
[SLN19] Sparse Methods for Unsupervised Data Analysis

Invited conferences: SIDM 2019, June 2019, pp. 10, Beijing, China

Keywords: Sparse methods; Correspondence analysis; Sparse components; Penalized matrix decomposition

Abstract: Principal Components Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA) are among the most efficient techniques for visualizing and exploring numerical and categorical data in an unsupervised way. However, with high-dimensional data, linear combinations of hundreds or thousands of variables become very difficult to interpret. Sparse methods aim to obtain pseudo-components that are linear combinations of only a small number of variables, and thus to ease interpretation by highlighting only the most important features. This simplification comes at the cost of characteristic properties such as the orthogonality of the components and of the loadings, which explains why there are more than 20 variants of sparse PCA. In contrast, sparsifying correspondence analysis has received little or no attention in the literature, except for MCA. After a brief survey of sparse PCA, we focus on sparse variants of CA for large contingency tables such as document-term matrices. We use the fact that CA is both a PCA (or a weighted SVD) and a canonical analysis to develop a column-sparse (or row-sparse) CA and a doubly sparse CA for rows and columns.
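To make the "CA as a weighted SVD" remark concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the usual CA formulation as an SVD of the standardized residuals of the contingency table, and it borrows a generic rank-one soft-thresholding iteration in the spirit of penalized matrix decomposition to obtain a column-sparse axis. The function names, the penalty `lam`, and the toy counts are all hypothetical and chosen only for illustration.

```python
import numpy as np

def standardized_residuals(N):
    """Return S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} with its row and column masses.

    The ordinary SVD of S is the weighted SVD that underlies CA.
    """
    P = N / N.sum()          # correspondence matrix
    r = P.sum(axis=1)        # row masses
    c = P.sum(axis=0)        # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return S, r, c

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries below lam in size become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1(S, lam, n_iter=100):
    """Rank-one decomposition of S with a sparse right vector (column side).

    Assumption: an L1 penalty in Lagrangian form, handled by alternating
    updates started from the leading singular vectors of S.
    """
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    v = Vt[0].copy()
    u = S @ v
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v_new = soft_threshold(S.T @ u, lam)
        norm_v = np.linalg.norm(v_new)
        if norm_v == 0.0:    # penalty too strong: every column was zeroed out
            return u, v_new
        v = v_new / norm_v
        u = S @ v
        u /= np.linalg.norm(u)
    return u, v

# Toy 4 x 3 contingency table (made-up counts, for illustration only).
toy_counts = np.array([[20.0, 5.0, 2.0],
                       [3.0, 18.0, 4.0],
                       [2.0, 6.0, 25.0],
                       [10.0, 9.0, 8.0]])

S, r, c = standardized_residuals(toy_counts)
u, v = sparse_rank1(S, lam=0.1)
print(np.round(v, 3))        # column-side vector; zero entries are dropped columns
```

With `lam=0` the iteration stays at the leading singular vectors of `S`, that is, the first ordinary CA axis. Increasing `lam` sets more column entries to zero, and applying the same thresholding to `u` as well would give a doubly sparse variant under the same assumptions.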

Comments: The 4th International Symposium on Interval Data Modelling: Theory and Applications

BibTeX

@inproceedings{SLN19,
title="{Sparse Methods for Unsupervised Data Analysis}",
author="G. Saporta and R. Liu and N. Niang Keita and H. Wang",
booktitle="{SIDM 2019}",
year=2019,
month="June",
pages="10",
address="Beijing, China",
note="{The 4th International Symposium on Interval Data Modelling: Theory and Applications}",
}