[SBG12] A generalisation of sparse PCA to multiple correspondence analysis

International conference without proceedings: ERCIM 2012, Oviedo, Spain
Abstract: Principal components analysis (PCA) for numerical variables and multiple correspondence analysis (MCA) for categorical variables are well-known dimension reduction techniques. PCA and MCA provide a small number of informative dimensions: the components. However, each component is a combination of all the original variables, which makes interpretation difficult. Factor rotation (varimax, quartimax, etc.) has a long history in factor analysis as a way of obtaining simple structure, i.e. combinations in which a large number of coefficients are close to zero, 1, or -1. Only recently have rotations been used in MCA. Sparse PCA and group sparse PCA are newer techniques that provide components which are combinations of only a few original variables: PCA is rewritten as a regression problem, and null loadings are obtained by imposing a lasso (or similar) constraint on the regression coefficients. When the data matrix has a natural block structure, group sparse PCA gives zero coefficients to entire blocks of variables. Since MCA is a special kind of PCA with blocks of indicator variables, we define sparse MCA as an extension of group sparse PCA. We present an application of sparse MCA to genetic data (640 SNPs with 3 categories, measured on 502 women) and a comparison between sparse and rotated components.
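
As a rough illustration of the block structure the abstract relies on, the sketch below one-hot encodes categorical variables into blocks of indicator columns and applies a group soft-thresholding step that shrinks a loading vector block by block, so that a weak block is set to zero as a whole. This is not the authors' sparse MCA algorithm: the function names `indicator_matrix` and `group_soft_threshold`, the toy data, the loading vector, and the threshold `lam` are all assumptions made for illustration, and the usual MCA weighting of the indicator columns is omitted.

```python
import numpy as np


def indicator_matrix(data, n_categories):
    """One-hot encode an (n, p) integer array of category codes.

    Returns the (n, sum(n_categories)) indicator matrix and one slice per
    variable marking the columns of its block. (Hypothetical helper.)
    """
    n, p = data.shape
    columns, blocks, start = [], [], 0
    for j in range(p):
        k = n_categories[j]
        columns.append(np.eye(k)[data[:, j]])  # (n, k) block of 0/1 indicators
        blocks.append(slice(start, start + k))
        start += k
    return np.hstack(columns), blocks


def group_soft_threshold(v, blocks, lam):
    """Proximal operator of the group lasso penalty lam * sum_b ||v_b||_2.

    Each block is scaled towards zero; a block whose Euclidean norm is at
    most lam is set to zero entirely. (Hypothetical helper.)
    """
    out = np.zeros_like(v, dtype=float)
    for b in blocks:
        norm = np.linalg.norm(v[b])
        if norm > lam:
            out[b] = (1.0 - lam / norm) * v[b]
    return out


# Toy data (assumed): 4 individuals, 3 categorical variables with 3 levels each.
data = np.array([[0, 1, 2],
                 [1, 1, 0],
                 [2, 0, 1],
                 [0, 2, 2]])
Z, blocks = indicator_matrix(data, [3, 3, 3])

# An arbitrary loading vector over the 9 indicator columns (assumed values).
loadings = np.array([3.0, 4.0, 0.0, 0.1, 0.2, 0.2, 1.0, 0.0, 0.0])
sparse_loadings = group_soft_threshold(loadings, blocks, lam=0.5)
print(sparse_loadings)
```

With these assumed values, the second block has norm 0.3, below the threshold of 0.5, so all three of its loadings are zeroed together. This is the sense in which a group penalty removes an entire categorical variable rather than individual categories.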

Team: msdma


@conference {
title="{A generalisation of sparse PCA to multiple correspondence analysis}",
author="G. Saporta and A. Bernard and C. Guinot",