# [ABS15] Generalizing partial least squares and correspondence analysis methods to predict categorical and heterogeneous data

**International peer-reviewed conference:**
CARME 2015,
September 2015,

pp.12,
Naples,
Italy,

**Keywords:** Partial Least Squares Regression (PLSR), Tucker Inter-Battery Analysis, Partial Least Squares Correspondence Analysis (PLSCA), Co-Inertia Analysis, Partial Least Squares Correlation (PLSC), Correspondence Analysis, Multiple Correspondence Analysis, Band of Burt

**Abstract:**
We present a generalization of partial least squares regression (PLSR), called Partial Least Squares Regression Correspondence Analysis (PLSRCA), tailored to the analysis of categorical data (as well as heterogeneous categorical and “bipolar” data). Just like standard PLSR, PLSRCA first computes a pair of latent variables, each a linear combination of the original variables, that have maximal covariance. The coefficients of these latent variables are obtained from the (generalized) singular value decomposition of the matrix Y’X, the product of the (properly centered and normalized) data matrices; this decomposition is equivalent to a correspondence analysis of Y’X, a matrix that Lebart et al. (2006) called a “Band of Burt”. The latent variables are obtained by projecting the original data matrices as supplementary rows and columns in the analysis of the “Band of Burt” data table. This part, called PLS-CA, generalizes Tucker inter-battery analysis to categorical and mixed data and instantiates the correspondence analysis component of PLSRCA. The latent variable from the first matrix X is then used (after an adequate normalization) to predict the second matrix Y. The effect of the first latent variable is then partialled out (i.e., “deflated”) from both matrices. This part instantiates the partial least squares regression component of PLSRCA. The process of 1) extracting latent variables, 2) predicting both matrices from the latent variable, and 3) deflation is repeated until a specified number of latent variables has been extracted or until the first matrix is completely decomposed.
We illustrate PLSRCA with genetic data and show how single nucleotide polymorphisms (SNPs) can be used to predict a set of variables measuring cognitive impairment in Alzheimer’s Disease.
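
The extract, predict, deflate loop described in the abstract can be sketched for the ordinary continuous-data case. The NumPy code below is a minimal illustration of standard PLSR and not the authors' PLSRCA: it uses a plain SVD of Y’X on column-centered data, whereas PLSRCA relies on a generalized SVD with correspondence-analysis centering and normalization that the abstract does not detail. The function name `pls_svd_deflation` and the unit-length normalization of the latent variable are assumptions made for illustration.

```python
import numpy as np

def pls_svd_deflation(X, Y, n_components):
    """Plain PLSR sketch (hypothetical helper, not PLSRCA).

    Returns the X latent variables (one per column) and the fitted,
    column-centered Y.
    """
    # Assumption: simple column centering stands in for the
    # "properly centered and normalized" matrices of the abstract.
    Xk = X - X.mean(axis=0)
    Yk = Y - Y.mean(axis=0)
    Y_hat = np.zeros_like(Yk)
    scores = []
    for _ in range(n_components):
        # 1) Extract: first singular triplet of Y'X gives the weights
        #    of the pair of latent variables with maximal covariance.
        _, s, Vt = np.linalg.svd(Yk.T @ Xk, full_matrices=False)
        if s[0] < 1e-12:
            break  # nothing left to extract
        t = Xk @ Vt[0]              # latent variable of X
        t = t / np.linalg.norm(t)   # assumed normalization: unit length
        # 2) Predict both matrices from the latent variable.
        p = Xk.T @ t                # loadings of X on t
        q = Yk.T @ t                # regression of Y on t
        # 3) Deflate: partial the latent variable out of both matrices.
        Xk = Xk - np.outer(t, p)
        Yk = Yk - np.outer(t, q)
        Y_hat += np.outer(t, q)
        scores.append(t)
    return np.column_stack(scores), Y_hat
```

Because each latent variable is removed from X before the next one is extracted, the returned latent variables are mutually orthogonal, and the loop stops early once the cross-product Y’X has no covariance left to explain.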

**Comments:**
7th Conference on Correspondence Analysis and Related Methods, 20-23 September