
[BGS12] Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis

Peer-reviewed international conference: Compstat 2012, August 2012, pp. 99-106, Limassol, Cyprus.

Keywords: Sparse principal component analysis, group Lasso, group variable selection, dimension reduction

Abstract: Two new methods for selecting groups of variables have been developed for multiblock data: "Group Sparse Principal Component Analysis" (GSPCA) for continuous variables and "Sparse Multiple Correspondence Analysis" (SMCA) for categorical variables. GSPCA is a compromise between the Sparse PCA method of Zou, Hastie and Tibshirani and the "group Lasso" method of Yuan and Lin. PCA is formulated as a regression-type optimization problem, and the group Lasso constraints on the regression coefficients produce modified principal components with sparse loadings. This reduces the number of nonzero coefficients, i.e. the number of selected groups. SMCA is a straightforward extension of GSPCA to groups of indicator variables, with the chi-square metric. Two real examples are used to illustrate the methods. The first is a data set of 25 trace elements measured in three tissues of 48 crabs (25 blocks of 3 variables). The second is a data set of 502 women, aimed at identifying genes affecting skin aging, with more than 370,000 blocks, each block corresponding to a SNP (Single Nucleotide Polymorphism) coded into 3 categories.
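To make the regression-type formulation concrete, here is a minimal Python sketch of a single group-sparse loading vector. It is not the authors' implementation: it assumes one component only, a pure group Lasso penalty weighted by the square root of each block size (the Yuan and Lin convention, with no ridge term), and a proximal-gradient (ISTA) solver for the regression step. The names `group_soft_threshold` and `group_sparse_pc` are hypothetical. With S = X^T X for centred data X, it alternates two steps: for fixed a, minimise 0.5 (a - b)^T S (a - b) + lam * sum_g sqrt(p_g) ||b_g||; for fixed b, set a = S b / ||S b||.

```python
import math


def _matvec(S, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def _norm(x):
    return math.sqrt(sum(v * v for v in x))


def group_soft_threshold(b, groups, thresholds):
    """Proximal step of the group Lasso penalty (hypothetical helper): shrink the
    Euclidean norm of each block b_g by its threshold, zeroing the whole block
    when ||b_g|| <= threshold. This is what removes entire groups of variables."""
    out = list(b)
    for g, t in zip(groups, thresholds):
        norm = math.sqrt(sum(b[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * b[j]
    return out


def group_sparse_pc(S, groups, lam, outer=50, inner=200):
    """First group-sparse loading vector from the Gram matrix S = X^T X
    (illustrative sketch, single component, group Lasso penalty only)."""
    p = len(S)
    # Step size 1 / ||S||_F is safe because the Frobenius norm bounds the top eigenvalue.
    step = 1.0 / math.sqrt(sum(v * v for row in S for v in row))
    # Group Lasso weights: sqrt(block size) per group, scaled by the step size.
    thresholds = [step * lam * math.sqrt(len(g)) for g in groups]
    # Initialise a with the ordinary (non-sparse) leading eigenvector by power iteration.
    a = [1.0] * p
    for _ in range(200):
        Sa = _matvec(S, a)
        n_sa = _norm(Sa)
        a = [v / n_sa for v in Sa]
    b = list(a)
    for _ in range(outer):
        # b-step: ISTA on 0.5 * (a - b)^T S (a - b) + group penalty.
        for _ in range(inner):
            grad = _matvec(S, [bi - ai for bi, ai in zip(b, a)])
            z = [bi - step * gi for bi, gi in zip(b, grad)]
            b = group_soft_threshold(z, groups, thresholds)
        Sb = _matvec(S, b)
        n_sb = _norm(Sb)
        if n_sb == 0.0:
            # Every group was dropped: the penalty is too strong for any block to survive.
            return [0.0] * p
        # a-step: with one component, the orthogonal Procrustes update reduces to S b / ||S b||.
        a = [v / n_sb for v in Sb]
    # Report the loadings normalised to unit length.
    n_b = _norm(b)
    return [v / n_b for v in b]
```

For example, with S = diag(10, 10, 1, 1) and blocks [[0, 1], [2, 3]], `group_sparse_pc(S, [[0, 1], [2, 3]], lam=1.0)` keeps the first block with equal weights and sets the second block exactly to zero, while a large `lam` such as 100 drops every block. Raising `lam` therefore trades explained variance for fewer selected groups.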

Team: msdma

BibTeX

@inproceedings{BGS12,
title="{Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis}",
author="A. Bernard and C. Guinot and G. Saporta",
booktitle="{Compstat 2012}",
year=2012,
month="August",
pages="99--106",
address="Limassol, Cyprus",
}