MIMCA: multiple imputation for categorical variables with multiple correspondence analysis
Authors: Vincent Audigier, François Husson, Julie Josse
Affiliation: 1. Applied Mathematics Department, Agrocampus Ouest, Rennes Cedex, France
Abstract: We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating only a small number of parameters, owing to the dimensionality reduction property of MCA. It therefore allows the user to impute a wide range of data sets. In particular, a high number of categories per variable, a high number of variables, or a small number of individuals is not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time-consuming on high-dimensional data sets than the other multiple imputation methods.
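The imputation idea summarized in the abstract can be illustrated with a rough sketch. The code below is not the authors' algorithm: it assumes missing entries are coded as -1, replaces MCA with a plain weighted low-rank SVD of the indicator matrix (no MCA column metric and no regularization), and assumes that imputed categories are drawn from the clipped, renormalized low-rank reconstruction. All function names (`one_hot`, `low_rank_fill`, `draw_categories`, `mimca_sketch`) are hypothetical.

```python
import numpy as np

def one_hot(data, n_cats):
    """Indicator matrix; the block of a missing entry (coded -1) is NaN."""
    blocks = []
    for j, k in enumerate(n_cats):
        block = np.full((data.shape[0], k), np.nan)
        obs = data[:, j] >= 0
        block[obs] = np.eye(k)[data[obs, j]]
        blocks.append(block)
    return np.hstack(blocks)

def low_rank_fill(Z, weights, rank, n_iter=50):
    """Fill NaNs by iterating a row-weighted rank-`rank` SVD fit.
    Simplified stand-in for iterative MCA (assumption, not the paper's method)."""
    missing = np.isnan(Z)
    w = (weights / weights.sum())[:, None]
    denom = np.maximum(((~missing) * w).sum(axis=0), 1e-12)
    filled = np.where(missing, np.nansum(Z * w, axis=0) / denom, Z)
    for _ in range(n_iter):
        mu = (filled * w).sum(axis=0)
        centered = filled - mu
        # Principal axes are estimated on the weighted (bootstrapped) rows only.
        _, _, Vt = np.linalg.svd(centered * np.sqrt(w), full_matrices=False)
        axes = Vt[:rank]
        # Every row, including zero-weight rows, is projected onto those axes.
        fit = centered @ axes.T @ axes + mu
        filled = np.where(missing, fit, Z)
    return filled

def draw_categories(filled, data, n_cats, rng):
    """Draw each missing category from its clipped, renormalized fitted block."""
    out = data.copy()
    start = 0
    for j, k in enumerate(n_cats):
        probs = np.clip(filled[:, start:start + k], 1e-9, None)
        probs /= probs.sum(axis=1, keepdims=True)
        for i in np.flatnonzero(data[:, j] < 0):
            out[i, j] = rng.choice(k, p=probs[i])
        start += k
    return out

def mimca_sketch(data, n_cats, rank=2, n_imputations=5, seed=0):
    """Return a list of completed copies of `data`, one per bootstrap replicate."""
    rng = np.random.default_rng(seed)
    Z = one_hot(data, n_cats)
    n = data.shape[0]
    imputed = []
    for _ in range(n_imputations):
        # Non-parametric bootstrap expressed as multinomial row weights.
        weights = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        filled = low_rank_fill(Z, weights, rank)
        imputed.append(draw_categories(filled, data, n_cats, rng))
    return imputed
```

Calling `mimca_sketch(data, n_cats)` on an integer array returns several completed data sets. The analysis model would then be fitted to each one and the results pooled, as is usual in multiple imputation.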
This article is indexed in SpringerLink and other databases.