[SAP03] Correspondence Analysis and Discrimination

International peer-reviewed conference: CARME2003, January 2003

Authors: G. Saporta

Abstract: The use of correspondence analysis for discrimination purposes goes back to the “prehistory” of data analysis (Fisher, 1940), where one looks for the optimal scaling of the categories of a variable X in order to predict a categorical variable Y. When there are several categorical predictors, a commonly used technique consists of a two-step analysis: multiple correspondence analysis is first performed on the set of predictors, followed by a discriminant analysis using the factor coordinates of the units as numerical predictors (Bouroche et al., 1977). However, in banking applications (credit scoring), logistic regression seems to be used more and more instead of discriminant analysis when predictors are categorical. One of the reasons advanced in favour of logistic regression is that it gives a probabilistic model, and it is often claimed among econometricians that its theoretical basis is more solid, but this is arguable. No doubt this tendency is also due to the flexibility of logistic regression software, which has been more developed than discriminant analysis software. However, it can easily be shown that discarding non-informative eigenvectors gives more robust results than direct logistic regression, since it is a regularisation technique similar to Principal Component Regression (Hastie et al., 2001). Moreover, correspondence analysis provides insight into the data, which is always useful. Since factor coordinates are derived without taking the response variable into account, one could think of adapting PLS regression. We will show that PLS is related, at least for the first PLS component, to barycentric discrimination (Celeux and Nakache, 1994; Verde and Palumbo, 1996). For two-class discrimination, we will also present a combination of logistic regression and correspondence analysis, as well as ridge regression, which are interesting alternatives. A comparison of all these methods will be illustrated on a real case study.
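
The two-step analysis mentioned in the abstract (multiple correspondence analysis on the predictors, then discriminant analysis on a reduced set of factor coordinates) could be sketched roughly as follows. This is only an illustrative sketch, not the author's implementation: the function names (`indicator_matrix`, `mca_row_coordinates`, `fisher_lda_fit`), the synthetic data, the choice of two retained components, and the use of a two-class Fisher discriminant with equal priors are all assumptions made for the example.

```python
import numpy as np

def indicator_matrix(X):
    """One-hot encode an (n, p) integer array of categorical predictors."""
    blocks = []
    for j in range(X.shape[1]):
        cats = np.unique(X[:, j])
        blocks.append((X[:, j][:, None] == cats[None, :]).astype(float))
    return np.hstack(blocks)

def mca_row_coordinates(X, n_components):
    """MCA computed as correspondence analysis of the indicator matrix (via SVD)."""
    Z = indicator_matrix(X)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates restricted to the retained axes;
    # dropping the remaining axes is the regularisation step.
    F = (U[:, :n_components] * s[:n_components]) / np.sqrt(r)[:, None]
    return F, s ** 2                     # factor scores, eigenvalues (inertias)

def fisher_lda_fit(F, y):
    """Two-class Fisher linear discriminant (equal priors assumed)."""
    F0, F1 = F[y == 0], F[y == 1]
    m0, m1 = F0.mean(axis=0), F1.mean(axis=0)
    # Pooled within-class covariance matrix
    W = ((F0 - m0).T @ (F0 - m0) + (F1 - m1).T @ (F1 - m1)) / (len(F) - 2)
    w = np.linalg.solve(W, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

# Synthetic data (assumption): three 3-category predictors, each equal to the
# class label plus independent binary noise.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
X = np.column_stack([y + rng.integers(0, 2, n) for _ in range(3)])

F, eig = mca_row_coordinates(X, n_components=2)   # step 1: MCA, keep 2 axes
w, b = fisher_lda_fit(F, y)                       # step 2: discriminant analysis
pred = (F @ w + b > 0).astype(int)
acc = (pred == y).mean()                          # in-sample accuracy
```

The factor scores `F` are computed without looking at `y`, which is the point the abstract raises when it suggests adapting PLS regression; the number of retained axes plays the role of the regularisation parameter, as in Principal Component Regression.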

Comments: Correspondence Analysis and Related Methods, Barcelona, 29 June - 2 July 2003


@inproceedings {
title="{Correspondence Analysis and Discrimination}",
author=" G. Saporta ",
note="{Correspondence Analysis and Related Methods, Barcelona, 29 June - 2 July 2003}",