[LSB06] Dealing with missing data in a k-means method - A simulation based approach

International peer-reviewed conference: COMPSTAT 2006, August 2006, pp.182, Italy

Keywords: k-means, Missing data, Multidimensional Scaling, Imputation methods

Abstract: In this work we evaluate the effect of missing data on a k-means method used for partitioning variables. The partitioning method is as follows: we start by computing a dissimilarity matrix between variables; multidimensional scaling ([BG05]) provides components, and we use these components as input to a k-means method. Data are generated with the aim of obtaining different types of partitions from twenty-five variables (the data follow a multinormal distribution). We then simulate missing data as in [Sil05], in different percentages. We determine the new partitions in the presence of missing data using three approaches: the listwise method, simple imputation methods, and a multiple imputation method. We compare the partitions obtained in the three situations with those obtained from the original complete data, using a Rand index as in [YS04] and an affinity coefficient. We conclude on the effect of missing data and of the imputation methods on this partitioning method under the established conditions.
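
The comparison step in the abstract (partition from complete data versus partition from incomplete data) can be illustrated with a minimal sketch of the plain, unadjusted Rand index. This is an assumption for illustration only: the abstract does not say which variant of the index [YS04] uses, and the function name `rand_index` and the two label lists below are hypothetical, not data from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unadjusted Rand index between two partitions of the same items.

    Each argument lists one cluster label per item (here, per variable).
    The index is the share of item pairs on which the two partitions
    agree: grouped together in both, or separated in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two items are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical labels for six variables (not results from the paper):
# one partition from complete data, one obtained after imputation.
complete = [0, 0, 0, 1, 1, 2]
imputed = [0, 0, 1, 1, 1, 2]
score = rand_index(complete, imputed)
print(f"Rand index: {score:.3f}")
```

A value of 1 means the two partitions are identical up to a renaming of the clusters; lower values mean that more pairs of variables are grouped differently once data are missing.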

Team: msdma


@inproceedings {
title="{Dealing with missing data in a k-means method - A simulation based approach}",
author=" A. Lorga da Silva and G. Saporta and H. Bacelar-Nicolau ",
booktitle="{COMPSTAT 2006}",
address=" Italy",