 
[NBS15] Clusterwise multiblock PLS
Peer-reviewed international conference: CARME 2015, September 2015, pp. 58, Naples, Italy
Keywords: clusterwise regression, multiblock regression, PLS
Abstract:
Clusterwise (or typological) regression methods partition a data set into clusters, each characterized by its own coefficients in a regression model. They simultaneously cluster the observations and fit a local regression model to each cluster. The usual clusterwise linear regression linearly relates a single response variable to a set of independent variables. When the observations have an underlying (but unknown) group structure, a single model for the whole data set is neither realistic nor adequate. In such cases clusterwise regression fits the data much better. In practice, however, the standard clusterwise regression models proposed in the literature usually fail if, during the clustering process, the number of observations in a cluster becomes smaller than the number of independent variables. To circumvent this problem, clusterwise regression has been extended to PLS regression, because this method can easily handle highly correlated independent variables and settings with fewer observations than variables.
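The alternation between clustering and local model fitting described above can be sketched as follows. This is a minimal illustration, not the method of the paper: it assumes a single predictor, ordinary least squares instead of PLS, a random initial partition, and hypothetical function names (fit_line, sq_error, clusterwise_regression).

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # With too few distinct x values the slope is not identifiable;
    # this sketch falls back to a flat line (assumption).
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def sq_error(model, point):
    """Squared prediction error of one (intercept, slope) model on one point."""
    intercept, slope = model
    x, y = point
    return (y - intercept - slope * x) ** 2


def clusterwise_regression(points, k, n_iter=100, seed=0):
    """Alternate local fitting and reassignment until the partition is stable."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # random initial partition
    models = [(0.0, 0.0)] * k
    for _ in range(n_iter):
        # Step 1: refit one local model per non-empty cluster.
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                models[g] = fit_line(members)
        # Step 2: move each observation to the model that predicts it best.
        new_labels = [
            min(range(k), key=lambda g: sq_error(models[g], p)) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    # Criterion: sum of the cluster prediction errors.
    criterion = sum(sq_error(models[lab], p) for p, lab in zip(points, labels))
    return labels, models, criterion


# Toy data drawn from two different lines (made up for illustration).
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(float(x), 20.0 - x) for x in range(10)]
labels, models, criterion = clusterwise_regression(points, k=2)
print(models, criterion)
```

Neither step can increase the criterion, so the loop stops at a local optimum that depends on the initial partition. Several random starts are a common remedy.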
In this paper, we present clusterwise multiblock PLS: an extension of clusterwise PLS regression to multiple response variables and to independent variables organized in blocks. This new method provides a partition of the data by minimizing the sum of the cluster prediction errors. The partition combines the description of the independent variables with the prediction of the set of response variables. Each cluster of the partition is associated with its own PLS model (e.g., components, set of coefficients), which is then used to improve the overall fit of the prediction step. To do so, a new observation is first assigned to the cluster that minimizes a specific distance measure or maximizes the class membership probability. The prediction is then performed using the local model of that cluster. In addition, model averaging strategies, which combine the local cluster predictions through weighted averaging, can also be used.
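The two prediction strategies for a new observation can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the abstract does not name the distance measure or the weights, so the sketch uses the squared Euclidean distance to each cluster centroid and softmax-style weights, a single response instead of several, and hypothetical local linear models with made-up coefficients.

```python
import math

# Hypothetical fitted clusters: predictor centroid plus local linear model.
clusters = [
    {"centroid": [0.0, 0.0], "intercept": 1.0, "coef": [2.0, 0.0]},
    {"centroid": [4.0, 4.0], "intercept": -1.0, "coef": [0.0, 3.0]},
]


def sq_dist(u, v):
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def local_predict(cluster, x):
    """Prediction of one cluster's local linear model."""
    return cluster["intercept"] + sum(b * xi for b, xi in zip(cluster["coef"], x))


def predict_hard(clusters, x):
    """Assign x to the nearest cluster, then use that cluster's local model."""
    nearest = min(clusters, key=lambda c: sq_dist(x, c["centroid"]))
    return local_predict(nearest, x)


def predict_averaged(clusters, x, temperature=1.0):
    """Weighted average of all local predictions (assumed softmax weights)."""
    dists = [sq_dist(x, c["centroid"]) for c in clusters]
    d_min = min(dists)  # subtracted for numerical stability
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    total = sum(weights)
    return sum(w / total * local_predict(c, x) for w, c in zip(weights, clusters))


x_new = [0.5, 0.0]
print(predict_hard(clusters, x_new), predict_averaged(clusters, x_new))
```

Hard assignment uses one local model only, while the averaged version lets nearby clusters contribute when an observation lies between them. The temperature parameter (hypothetical) controls how quickly distant clusters lose influence.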
Because our strategy is general and rests on a clearly defined criterion to minimize, the proposed approach can be directly extended to other multiblock regression methods. The properties of clusterwise multiblock PLS will be evaluated in a simulation study and illustrated with a real data example.
Comments:
7th conference on Correspondence Analysis and Related Methods, 20-23 September
Team:
msdma
Collaboration:
ANSES

