[LSB06] Dealing with missing data in a k-means method - A simulation based approach
Peer-reviewed international conference: COMPSTAT 2006, August 2006, pp. 182, Italy
Keywords: k-means, Missing data, Multidimensional Scaling, Imputation methods
Abstract:
In this work we evaluate the effect of missing data on a k-means method used for partitioning variables. The partition method is as follows: we first compute a dissimilarity matrix between variables; multidimensional scaling ([BG05]) then provides components, and we use these components as input to a k-means method.
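The three-step partition method described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which dissimilarity, which variant of multidimensional scaling, or which k-means initialisation is used. Here we assume a correlation-based dissimilarity, classical (Torgerson) scaling, and a deterministic farthest-first initialisation; all function names are hypothetical.

```python
import numpy as np

def correlation_dissimilarity(X):
    """Dissimilarity between the columns (variables) of X.

    Assumption: d_ij = sqrt(2 * (1 - r_ij)); the abstract does not name the measure.
    """
    R = np.corrcoef(X, rowvar=False)
    return np.sqrt(np.clip(2.0 * (1.0 - R), 0.0, None))

def classical_mds(D, n_components=2):
    """Classical (Torgerson) scaling of a dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)

def farthest_first_init(points, k):
    """Deterministic initial centers (an assumption, chosen for reproducibility)."""
    idx = [0]
    for _ in range(1, k):
        diffs = points[:, None, :] - points[idx][None, :, :]
        d = np.linalg.norm(diffs, axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return points[idx].copy()

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def partition_variables(X, k, n_components=2):
    """Dissimilarity matrix -> MDS components -> k-means on the components."""
    D = correlation_dissimilarity(X)
    coords = classical_mds(D, n_components)
    return kmeans(coords, k)
```

Calling `partition_variables(X, k)` on an observations-by-variables array returns one cluster label per variable, so the rows of `X` are the observations and the objects being clustered are its columns.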
Data are generated with the aim of obtaining different types of partitions of twenty-five variables (the data follow a multivariate normal distribution). We then simulate missing data as in [Sil05], at different percentages.
We determine the new partitions in the presence of missing data using three methods: the listwise method, simple imputation methods, and a multiple imputation method.
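Two of these treatments are easy to illustrate. The sketch below assumes missing values are coded as NaN in an observations-by-variables array and uses column-mean imputation as one example of a simple imputation method; the abstract does not specify which simple methods were studied. Multiple imputation is not sketched here, and the function names are hypothetical.

```python
import numpy as np

def listwise_deletion(X):
    """Keep only the observations (rows) with no missing value."""
    return X[~np.isnan(X).any(axis=1)]

def mean_imputation(X):
    """Replace each missing value with the mean of the observed values of its variable."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])
complete_rows = listwise_deletion(X)   # the second row is dropped
imputed = mean_imputation(X)           # the NaN becomes 2.0, the mean of 1.0 and 3.0
```

Listwise deletion shrinks the sample, while mean imputation keeps every observation but reduces the variance of the imputed variable; both can change the dissimilarities between variables and hence the resulting partition.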
We compare the partitions obtained in the three situations with those obtained with the original complete data, using a Rand index as in [YS04] and an affinity coefficient.
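For reference, the (unadjusted) Rand index between two partitions of the same objects is the proportion of object pairs on which the partitions agree, that is, pairs placed together in both or apart in both. The sketch below is a minimal version assuming each partition is given as a list of cluster labels; whether [YS04] uses this form or an adjusted variant is not stated in the abstract, and the affinity coefficient is not shown.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Proportion of pairs treated the same way (together or apart) by both partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two objects are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The index depends only on which objects are grouped together, not on the label values, so `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` equals 1.0.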
We draw conclusions on the effect of missing data and of the imputation methods on this partition method under the established conditions.
Team:
msdma