[LSB06] Dealing with missing data in a k-means method - A simulation based approach

Peer-reviewed international conference: COMPSTAT 2006, August 2006, pp. 182, Italy

Authors: A. Lorga da Silva, G. Saporta, H. Bacelar-Nicolau

Keywords: k-means, Missing data, Multidimensional Scaling, Imputation methods

Abstract: In this work we evaluate the effect of missing data on a k-means method used for partitioning variables. The partition method is the following: we start by computing a dissimilarity matrix between variables; multidimensional scaling ([BG05]) provides components, and we use these components as input to a k-means method. Data are generated with the aim of obtaining different types of partitions from twenty-five variables (the data have a multinormal distribution). We then simulate missing data as in [Sil05], in different percentages. We determine the new partitions in the presence of missing data using three approaches: the listwise method, simple imputation methods and a multiple imputation method. We compare the partitions obtained in these three situations with those obtained from the original complete data, using a Rand index as in [YS04] and an affinity coefficient. We conclude on the effect of missing data and of the imputation methods on this partition method under the established conditions.
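The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dissimilarity 1 - |correlation|, the block covariance structure, the sample size, the number of MDS components, the number of clusters, the 10% completely-at-random missingness and all function names are hypothetical choices made here. Only listwise deletion and mean imputation are shown; multiple imputation and the affinity coefficient are omitted.

```python
import numpy as np
from itertools import combinations

def variable_dissimilarity(X):
    # Assumed dissimilarity between variables (columns): 1 - |correlation|.
    D = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(D, 0.0)
    return D

def classical_mds(D, n_components):
    # Classical (Torgerson) scaling: double-center the squared dissimilarities.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)  # drop negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)

def kmeans(points, k, n_iter=100, seed=0):
    # Plain Lloyd iterations, initial centers drawn from the points.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def rand_index(a, b):
    # Share of pairs on which the two partitions agree (together or apart).
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def partition_variables(X, k=5, n_components=4):
    return kmeans(classical_mds(variable_dissimilarity(X), n_components), k)

# Assumed setup: 25 multinormal variables in 5 correlated blocks, 500 cases.
rng = np.random.default_rng(42)
cov = np.kron(np.eye(5), np.full((5, 5), 0.8)) + 0.2 * np.eye(25)
X = rng.multivariate_normal(np.zeros(25), cov, size=500)
reference = partition_variables(X)

# Assumed missingness: 10% of cells removed completely at random.
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.10] = np.nan

# Listwise method: keep only the complete cases.
X_listwise = X_miss[~np.isnan(X_miss).any(axis=1)]
# Simple imputation: replace each missing value by its column mean.
X_mean = np.where(np.isnan(X_miss), np.nanmean(X_miss, axis=0), X_miss)

ri_listwise = rand_index(reference, partition_variables(X_listwise))
ri_mean = rand_index(reference, partition_variables(X_mean))
print(f"Rand index vs complete data: listwise={ri_listwise:.3f}, mean={ri_mean:.3f}")
```

A Rand index of 1 means the partition obtained after handling the missing data is identical to the one obtained from the complete data; repeating the run over several missingness percentages and random seeds would give a simulation study in the spirit of the abstract.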

Team: msdma

BibTeX

@inproceedings {
	LSB06,
	title	=	"{Dealing with missing data in a k-means method - A simulation based approach}",
	author	=	"A. Lorga da Silva and G. Saporta and H. Bacelar-Nicolau",
	booktitle	=	"{COMPSTAT 2006}",
	year	=	2006,
	month	=	"August",
	pages	=	"182",
	address	=	"Italy",
}