[Sap13] A survey of some sparse methods for high-dimensional data
Invited talks: SADA'13, March 2013
Keywords: lasso, sparse regression, sparse PCA, sparse MCA
Abstract:
High-dimensional data means that the number of variables p is far larger than the number of observations n. This occurs in several fields, such as genomics and chemometrics. This didactic talk starts with a survey of various solutions in linear regression and then presents their extensions to unsupervised "sparse" methods for principal component analysis (PCA) and multiple correspondence analysis (MCA).
When p > n, the ordinary least squares (OLS) estimator does not exist for linear regression. Since this is a case of forced multicollinearity, one may use regularized techniques such as ridge regression, principal component regression, or PLS regression: these methods provide rather robust estimates, either through dimension reduction or through explicit or implicit constraints on the regression coefficients. The fact that all the predictors are kept is often considered an advantage.
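As a brief sketch of why this happens, assume a design matrix $X$ of size $n \times p$, a response vector $y$, and a ridge parameter $\lambda > 0$ (this notation is introduced here for illustration and is not taken from the talk):

```latex
\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y
\quad \text{requires } X^\top X \text{ to be invertible, but} \quad
\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le \min(n, p) = n < p ,
```

```latex
\hat{\beta}_{\mathrm{ridge}}(\lambda) = (X^\top X + \lambda I_p)^{-1} X^\top y ,
\qquad \lambda > 0 .
```

The $p \times p$ matrix $X^\top X$ is therefore singular when $p > n$, whereas $X^\top X + \lambda I_p$ is positive definite, so the ridge estimate is always defined and keeps all $p$ coefficients.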
However, when p >> n this becomes a drawback, since a combination of all the variables cannot be interpreted. Sparse combinations, i.e., those with a large number of zero coefficients, are preferred. The lasso, the elastic net, and sparse PLS perform regularization and variable selection simultaneously thanks to non-quadratic penalties such as L1 and SCAD. We will present variants such as the group lasso for the case where the variables are structured in blocks.
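To illustrate how an L1 penalty sets coefficients exactly to zero, here is a minimal pure-Python sketch of the lasso solved by cyclic coordinate descent. The objective scaling $(1/2n)\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$, the function names `soft_threshold` and `lasso_cd`, and the toy data are assumptions made for this example, not material from the talk.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # (1/n) * x_j' (partial residual without variable j)
            zj = 0.0   # (1/n) * x_j' x_j
            for i in range(n):
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                zj += X[i][j] ** 2
            rho /= n
            zj /= n
            # The L1 penalty enters only through the soft-thresholding step.
            beta[j] = soft_threshold(rho, lam) / zj if zj > 0 else 0.0
    return beta


# Hypothetical toy data with p = 5 variables and n = 3 observations (p > n);
# the response depends on the first variable only.
X = [
    [1.0, 0.5, -0.2, 0.1, 0.3],
    [-1.0, 0.2, 0.4, -0.3, 0.1],
    [0.5, -0.7, 0.1, 0.2, -0.4],
]
y = [2.0 * row[0] for row in X]

beta = lasso_cd(X, y, lam=0.1)
print("coefficients:", beta)
print("number of exact zeros:", sum(1 for b in beta if b == 0.0))
```

Any coordinate whose correlation with the partial residual falls below `lam` in absolute value is set exactly to zero, which is the mechanism by which the lasso performs variable selection; a quadratic (ridge) penalty only shrinks coefficients without zeroing them.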
In PCA, the singular value decomposition shows that if we regress the principal components onto the input variables, the vector of regression coefficients is equal to the factor loadings. It therefore suffices to adapt sparse regression techniques to obtain sparse versions of PCA and of PCA with groups of variables. We conclude with a presentation of a sparse version of multiple correspondence analysis.
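The regression property can be sketched as follows, assuming a column-centered data matrix $X$ with singular value decomposition $X = U D V^\top$, singular values $d_j$, loading vectors $v_j$ (columns of $V$), and principal components $z_j = X v_j = d_j u_j$; this notation and the ridge parameter $\lambda \ge 0$ are introduced here for illustration.

```latex
X^\top X = V D^2 V^\top , \qquad X^\top z_j = V D\, U^\top U D\, e_j = d_j^2\, v_j ,
```

```latex
\hat{\beta}_j(\lambda) = (X^\top X + \lambda I_p)^{-1} X^\top z_j
= \frac{d_j^2}{d_j^2 + \lambda}\, v_j .
```

With $\lambda = 0$ and $X$ of full column rank, the coefficient vector is exactly $v_j$; with $\lambda > 0$ (needed when $p > n$) it is proportional to $v_j$ and recovers the loadings after normalization. Adding an L1 penalty to this regression is what yields sparse loadings.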
Comments:
International Conference "Statistique Appliquée au Développement Africain". Cotonou, 5-8 March 2013
Team:
msdma