[GS17] Variable selection for multiply-imputed data with penalized generalized estimating equations

International peer-reviewed journal: Computational Statistics & Data Analysis, vol. 110, pp. 103-114, 2017, (doi:

Keywords: Generalized estimating equations; LASSO; Longitudinal data; Missing data; Multiple imputation; Variable selection

Abstract: Generalized estimating equations (GEE) are useful tools for marginal regression analysis of longitudinal data. A large number of variables combined with missing data raises complex issues in a longitudinal context. In variable selection, for instance, penalized generalized estimating equations have not been systematically developed to integrate missing data. MI-PGEE (multiple imputation-penalized generalized estimating equations), an extension of the multiple imputation-least absolute shrinkage and selection operator (MI-LASSO), is presented. MI-PGEE allows missing data and within-subject correlation to be integrated into variable selection procedures. Missing data are handled by multiple imputation, and variable selection is performed with a group LASSO penalty. The estimated coefficients for the same variable across multiply-imputed datasets are treated as a group when applying penalized generalized estimating equations, which leads to a single model across the multiply-imputed datasets. To select the tuning parameter, a new BIC-like criterion is proposed. A simulation study shows the advantage of MI-PGEE over simple imputation PGEE. The usefulness of the new method is illustrated by an application to a subgroup of the placebo arm of the strontium ranelate efficacy in knee osteoarthritis trial.
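
The grouping idea described in the abstract can be illustrated with a minimal Python sketch. It is not the MI-PGEE algorithm itself: it only shows how a group LASSO penalty, applied to the coefficients of one variable across imputed datasets, either keeps or drops that variable in all datasets at once. The function names, the coefficient values, and the use of a group soft-thresholding step (the proximal operator of the group penalty) are assumptions made for illustration; the GEE fitting and the BIC-like tuning of the penalty are not reproduced here.

```python
import numpy as np

def group_penalty(B, lam):
    """Group LASSO penalty for a coefficient matrix B of shape (D, p).

    Row d holds the coefficients estimated on imputed dataset d, so
    column j gathers the D estimates of variable j (one group).
    """
    return lam * np.sqrt((B ** 2).sum(axis=0)).sum()

def group_soft_threshold(B, lam):
    """Shrink each column of B as a block (hypothetical illustration step).

    A column whose Euclidean norm is below lam is set to zero in every
    imputed dataset; otherwise the whole column is scaled down.
    """
    norms = np.sqrt((B ** 2).sum(axis=0))
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return B * scale

# Made-up coefficients: 3 imputed datasets (rows), 3 variables (columns).
B = np.array([[0.9,  0.05, -0.4],
              [1.1, -0.02, -0.5],
              [1.0,  0.03, -0.3]])

B_shrunk = group_soft_threshold(B, lam=0.2)

# A variable is selected if its group survives; the decision is the
# same for all imputed datasets by construction.
selected = np.any(B_shrunk != 0, axis=0)
print(selected)
```

Because the penalty acts on the norm of each column rather than on individual entries, the second variable, whose estimates are small in every imputed dataset, is removed from all of them together, which is what yields a single model across imputations.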


@article {
title="{Variable selection for multiply-imputed data with penalized generalized estimating equations}",
author="J. Geronimi and G. Saporta",
journal="Computational Statistics & Data Analysis",