Practical Advice on How to Impute Continuous Data When the Ultimate Interest Centers on Dichotomized Outcomes Through Pre-Specified Thresholds
Authors: Hakan Demirtas
Institution: Division of Epidemiology and Biostatistics, University of Illinois at Chicago, Chicago, Illinois, USA (demirtas@uic.edu)
Abstract: Multiple imputation under the multivariate normality assumption has often been regarded as a viable model-based approach in dealing with incomplete continuous data in the last two decades. A situation where the measurements are taken on a continuous scale with an ultimate interest in dichotomized versions through discipline-specific thresholds is not uncommon in applied research, especially in medical and social sciences. In practice, researchers generally tend to impute missing values for continuous outcomes under a Gaussian imputation model, and then dichotomize them via commonly accepted cut-off points. An alternative strategy is creating multiply imputed data sets after dichotomization under a log-linear imputation model that uses a saturated multinomial structure. In this work, the performances of the two imputation methods were examined on a fairly wide range of simulated incomplete data sets that exhibit varying distributional characteristics such as skewness and multimodality. Behavior of efficiency and accuracy measures was explored to determine the extent to which the procedures work properly. The conclusion drawn is that dichotomization before carrying out a log-linear imputation should be the preferred approach except for a few special cases. I recommend that researchers use the atypical second strategy whenever the interest centers on binary quantities that are obtained through underlying continuous measurements. A possible explanation is that erratic/idiosyncratic aspects that are not accommodated by a Gaussian model are probably transformed into better-behaving discrete trends in this particular missing-data setting. This premise outweighs the assertion that continuous variables inherently carry more information, leading to a counter-intuitive, but potentially useful result for practitioners.
Keywords: Log-linear models; Multimodality; Multiple imputation; Multivariate normality; Skewness
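
The two strategies contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the author's simulation design. Everything in it is an assumption made for illustration: the data-generating model (one fully observed covariate `x`, one right-skewed outcome `y`), the 30% completely-at-random missingness, the cut-off values, and the function names. Both imputers are also simplified: they draw imputations from point estimates and do not draw the model parameters themselves, so this is not proper multiple imputation. The "log-linear" step is reduced to the saturated 2x2 multinomial on the dichotomized `x` and `y`.

```python
import random
import statistics


def simulate(n, seed):
    """Hypothetical data: x fully observed, y right-skewed, about 30% of y missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + rng.expovariate(1.0)  # exponential noise makes y skewed
        rows.append((x, y if rng.random() > 0.3 else None))
    return rows


def impute_then_dichotomize(rows, cutoff, m, rng):
    """Strategy 1: Gaussian regression imputation of y, then threshold at cutoff."""
    obs = [(x, y) for x, y in rows if y is not None]
    mx = statistics.fmean(x for x, _ in obs)
    my = statistics.fmean(y for _, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    slope = sum((x - mx) * (y - my) for x, y in obs) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in obs)
    resid_sd = (sse / (len(obs) - 2)) ** 0.5
    estimates = []
    for _ in range(m):  # m imputed data sets; the estimates are averaged
        flags = []
        for x, y in rows:
            if y is None:
                y = intercept + slope * x + rng.gauss(0.0, resid_sd)
            flags.append(y > cutoff)
        estimates.append(sum(flags) / len(flags))
    return statistics.fmean(estimates)


def dichotomize_then_impute(rows, cutoff, x_cutoff, m, rng):
    """Strategy 2: threshold first, then impute from a saturated 2x2 multinomial."""
    counts = {(xb, yb): 0 for xb in (0, 1) for yb in (0, 1)}
    for x, y in rows:
        if y is not None:
            counts[(int(x > x_cutoff), int(y > cutoff))] += 1
    # Conditional probability of the "high" outcome within each x category.
    p_high = {}
    for xb in (0, 1):
        total = counts[(xb, 0)] + counts[(xb, 1)]
        p_high[xb] = counts[(xb, 1)] / total if total else 0.5
    estimates = []
    for _ in range(m):
        flags = []
        for x, y in rows:
            if y is not None:
                flags.append(y > cutoff)
            else:
                flags.append(rng.random() < p_high[int(x > x_cutoff)])
        estimates.append(sum(flags) / len(flags))
    return statistics.fmean(estimates)


if __name__ == "__main__":
    data = simulate(2000, seed=1)
    print("impute, then dichotomize:",
          impute_then_dichotomize(data, 1.0, 5, random.Random(2)))
    print("dichotomize, then impute:",
          dichotomize_then_impute(data, 1.0, 0.0, 5, random.Random(3)))
```

Each function returns an estimate of the proportion of units whose outcome exceeds the cut-off. Comparing the two estimates against the proportion in the full data, before values are deleted, gives a rough feel for the comparison the abstract describes, though a single toy run says nothing about the paper's conclusions.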