Similar literature
1.
Caren Hasler, Yves Tillé. Statistics, 2016, 50(6): 1310-1331
Random imputation is an interesting class of imputation methods to handle item nonresponse because it tends to preserve the distribution of the imputed variable. However, such methods amplify the total variance of the estimators because values are imputed at random. This increase in variance is called imputation variance. In this paper, we propose a new random hot-deck imputation method that is based on the k-nearest neighbour methodology. It replaces the missing value of a unit with the observed value of a similar unit. Calibration and balanced sampling are applied to minimize the imputation variance. Moreover, our proposed method provides triple protection against nonresponse bias. This means that if at least one out of three specified models holds, then the resulting total estimator is unbiased. Finally, our approach allows the user to perform consistency edits and to impute simultaneously.
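
The core hot-deck step described in this abstract (replace a missing value with the observed value of a similar unit, drawn at random) might be sketched as follows. This is only a minimal illustration under stated assumptions: a single numeric auxiliary variable `x` observed for every unit, absolute difference as the distance, and uniform selection among the `k` nearest donors. It does not implement the calibration, balanced sampling, or consistency edits of the proposed method, and the function name `knn_random_hot_deck` is hypothetical.

```python
import numpy as np


def knn_random_hot_deck(x, y, k=3, seed=0):
    """Fill NaN entries of y with the observed y of a randomly chosen
    donor among the k respondents closest in x.

    Assumptions (not from the paper): x is fully observed, distance is
    |x_i - x_j|, and each of the k nearest donors is equally likely.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    imputed = y.copy()

    donors = np.flatnonzero(~np.isnan(y))  # indices of respondents
    if donors.size == 0:
        raise ValueError("no observed values available as donors")

    for i in np.flatnonzero(np.isnan(y)):  # indices of nonrespondents
        dist = np.abs(x[donors] - x[i])
        # Stable sort so that ties are broken by original order.
        nearest = donors[np.argsort(dist, kind="stable")[:k]]
        imputed[i] = y[rng.choice(nearest)]
    return imputed
```

With `k=1` the procedure reduces to deterministic nearest-neighbour imputation. Larger `k` adds randomness, which is the source of the imputation variance the abstract refers to.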

2.
In recent years an increase in nonresponse rates in major government and social surveys has been observed. It is thought that decreasing response rates and changes in nonresponse bias may affect, potentially severely, the quality of survey data. This paper discusses the problem of unit and item nonresponse in government surveys from an applied perspective and highlights some newer developments in this field with a focus on official statistics in the United Kingdom (UK). The main focus of the paper is on post-survey adjustment methods, in particular adjustment for item nonresponse. The use of various imputation and weighting methods is discussed in an example. The application also illustrates the close relationship between missing data and measurement error. JEL classification: C42, C81

3.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered.

4.
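Missing data are an important factor affecting the quality of questionnaire survey data, and imputing the missing values in a questionnaire can significantly improve the quality of the survey data. Questionnaire data are mostly categorical, classification algorithms from data mining are a common way to handle categorical attributes, and the random forest model is among the more accurate of these algorithms. This paper introduces the random forest model into the study of imputation for missing questionnaire data, proposes a random-forest-based imputation method for missing categorical data, and discusses the corresponding imputation steps for different missingness patterns. An empirical simulation comparison with other methods shows that the imputed values obtained by random forest imputation are more accurate and more reliable.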

5.
Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at item level and multivariate normal imputation at scale level. We investigated a hybrid approach which is a combination of MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. ML seemed to provide occasionally the best results in terms of bias, but hardly ever on MSE. All imputation methods at the scale level and logistic regression at item level hardly ever showed the best performance. The hybrid approach is similar to or better than its original MI. The PMM-hybrid approach at item level demonstrated the best MSE for most settings and in some cases also the smallest bias.

6.
Composite scores are useful in providing insights and trends about complex and multidimensional quality of care processes. However, missing data in subcomponents may hinder the overall reliability of a composite measure. In this study, strategies for handling missing data in Paediatric Admission Quality of Care (PAQC) score, an ordinal composite outcome, were explored through a simulation study. Specifically, the implications of the conventional method employed in addressing missing PAQC score subcomponents, consisting of scoring missing PAQC score components with a zero, and a multiple imputation (MI)-based strategy, were assessed. The latent normal joint modelling MI approach was used for the latter. Across simulation scenarios, MI of missing PAQC score elements at item level produced minimally biased estimates compared to the conventional method. Moreover, regression coefficients were more prone to bias compared to standard errors. Magnitude of bias was dependent on the proportion of missingness and the missing data generating mechanism. Therefore, incomplete composite outcome subcomponents should be handled carefully to alleviate potential for biased estimates and misleading inferences. Further research on other strategies of imputing at the component and composite outcome level, and imputing compatibly with the substantive model in this setting, is needed. Keywords: Composite outcome, multiple imputation, paediatrics, PAQC score, pneumonia

7.
Multiple imputation is widely accepted as the method of choice to address item nonresponse in surveys. Nowadays most statistical software packages include features to multiply impute missing values in a dataset. Nevertheless, the application to real data imposes many implementation problems. To define useful imputation models for a dataset that consists of categorical and possibly skewed continuous variables, contains skip patterns and all sorts of logical constraints is a challenging task. Besides, in most applications little attention is paid to the evaluation of the underlying assumptions behind the imputation models.

8.
In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi-parametric approaches are only valid under missing data completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI-GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI-GEE on incomplete longitudinal dichotomized outcome analysis is evaluated. For comparisons, likelihood-based linear mixed effects models (LMM) are used for incomplete longitudinal original continuous outcome analysis. Focusing on dichotomized outcome analysis, MI-GEE with original continuous missing data imputation procedure provides well controlled test sizes and more stable power estimates compared with any other GEE-based approaches. It is also shown that dichotomizing longitudinal continuous outcome will result in substantial loss of power compared with LMM. Copyright © 2009 John Wiley & Sons, Ltd.

9.
Imputation methods for missing data on a time-dependent variable within time-dependent Cox models are investigated in a simulation study. Quality of life (QoL) assessments were removed from the complete simulated datasets, which have a positive relationship between QoL and disease-free survival (DFS) and delayed chemotherapy and DFS, by missing at random and missing not at random (MNAR) mechanisms. Standard imputation methods were applied before analysis. Method performance was influenced by missing data mechanism, with one exception for simple imputation. The greatest bias occurred under MNAR and large effect sizes. It is important to carefully investigate the missing data mechanism.

10.
In multiple imputation (MI), the resulting estimates are consistent if the imputation model is correct. To specify the imputation model, it is recommended to combine two sets of variables: those that are related to the incomplete variable and those that are related to the missingness mechanism. Several possibilities exist, but it is not clear how they perform in practice. The method that simply groups all variables together into the imputation model and four other methods that are based on the propensity scores are presented. Two of them are new and have not been used in the context of MI. The performance of the methods is investigated by a simulation study under different missing at random mechanisms for different types of variables. We conclude that all methods, except for one method based on the propensity scores, perform well. It turns out that as long as the relevant variables are taken into the imputation model, the form of the imputation model has only a minor effect on the quality of the imputations.

11.
The objective of this research was to demonstrate a framework for drawing inference from sensitivity analyses of incomplete longitudinal clinical trial data via a re-analysis of data from a confirmatory clinical trial in depression. A likelihood-based approach that assumed missing at random (MAR) was the primary analysis. Robustness to departure from MAR was assessed by comparing the primary result to those from a series of analyses that employed varying missing not at random (MNAR) assumptions (selection models, pattern mixture models and shared parameter models) and to MAR methods that used inclusive models. The key sensitivity analysis used multiple imputation assuming that after dropout the trajectory of drug-treated patients was that of placebo treated patients with a similar outcome history (placebo multiple imputation). This result was used as the worst reasonable case to define the lower limit of plausible values for the treatment contrast. The endpoint contrast from the primary analysis was -2.79 (p = .013). In placebo multiple imputation, the result was -2.17. Results from the other sensitivity analyses ranged from -2.21 to -3.87 and were symmetrically distributed around the primary result. Hence, no clear evidence of bias from missing not at random data was found. In the worst reasonable case scenario, the treatment effect was 80% of the magnitude of the primary result. Therefore, it was concluded that a treatment effect existed. The structured sensitivity framework of using a worst reasonable case result based on a controlled imputation approach with transparent and debatable assumptions supplemented a series of plausible alternative models under varying assumptions was useful in this specific situation and holds promise as a generally useful framework. Copyright © 2012 John Wiley & Sons, Ltd.

12.
Inverse probability weighting (IPW) can deal with confounding in non-randomized studies. The inverse weights are probabilities of treatment assignment (propensity scores), estimated by regressing assignment on predictors. Problems arise if predictors can be missing. Solutions previously proposed include assuming assignment depends only on observed predictors and multiple imputation (MI) of missing predictors. For the MI approach, it was recommended that missingness indicators be used with the other predictors. We determine when the two MI approaches (with/without missingness indicators) yield consistent estimators and compare their efficiencies. We find that, although including indicators can reduce bias when predictors are missing not at random, it can induce bias when they are missing at random. We propose a consistent variance estimator and investigate performance of the simpler Rubin's Rules variance estimator. In simulations we find both estimators perform well. IPW is also used to correct bias when an analysis model is fitted to incomplete data by restricting to complete cases. Here, weights are inverse probabilities of being a complete case. We explain how the same MI methods can be used in this situation to deal with missing predictors in the weight model, and illustrate this approach using data from the National Child Development Survey.
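
For readers unfamiliar with the Rubin's Rules variance estimator mentioned in this abstract, the standard pooling step for a scalar estimate can be sketched as below. It shows only the textbook combination of per-imputation estimates and variances. It is not the consistent variance estimator the authors propose, and the function name `pool_rubin` is an assumption made for illustration.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m completed-data results for one scalar parameter.

    estimates: point estimate from each imputed dataset
    variances: estimated sampling variance from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = q.mean()            # pooled point estimate
    within = u.mean()           # average within-imputation variance
    between = q.var(ddof=1)     # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total
```

The `(1 + 1/m)` factor inflates the between-imputation component to account for using a finite number of imputations.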

13.
The analysis of incomplete contingency tables is a practical and an interesting problem. In this paper, we provide characterizations for the various missing mechanisms of a variable in terms of response and non-response odds for two and three dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for the above tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed based on simulation studies.

14.
This paper compares the performance of weighted generalized estimating equations (WGEEs), multiple imputation based on generalized estimating equations (MI-GEEs) and generalized linear mixed models (GLMMs) for analyzing incomplete longitudinal binary data when the underlying study is subject to dropout. The paper aims to explore the performance of the above methods in terms of handling dropouts that are missing at random (MAR). The methods are compared on simulated data. The longitudinal binary data are generated from a logistic regression model, under different sample sizes. The incomplete data are created for three different dropout rates. The methods are evaluated in terms of bias, precision and mean square error in the case where data are subject to MAR dropout. In conclusion, across the simulations performed, the MI-GEE method performed better in both small and large sample sizes. Evidently, this should not be seen as formal and definitive proof, but adds to the body of knowledge about the methods' relative performance. In addition, the methods are compared using data from a randomized clinical trial.

15.
The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The performance of a newly proposed imputation method based on generalized additive models for location, scale, and shape (GAMLSS) is investigated. Although imputation methods based on predictive mean matching are virtually unbiased, they suffer from mild to moderate under-coverage, even in the experiment where all variables are jointly normally distributed. The GAMLSS method features better coverage than currently available methods.

16.
In this article, we compare alternative missing imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other methods of missing imputation. We use real datasets on which to base the comparison between our approach and some general methods of missing imputation for various missing data mechanisms.

17.
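To study joint location and scale models for skewed data with missing values, this paper proposes, based on the characteristics of the distribution itself, an imputation method suited to joint modelling of skewed data with missing values: the modified random regression imputation method. The method markedly adjusts the skewness parameter of the model under missing data. Random simulations and a real-data example, with comparisons against regression imputation and random regression imputation, show that the proposed modified random regression imputation method is useful and effective.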

18.
Sequential regression multiple imputation has emerged as a popular approach for handling incomplete data with complex features. In this approach, imputations for each missing variable are produced based on a regression model using other variables as predictors in a cyclic manner. A normality assumption is frequently imposed for the error distributions in the conditional regression models for continuous variables, even though it rarely holds in real scenarios. We use a simulation study to investigate the performance of several sequential regression imputation methods when the error distribution is flat or heavy tailed. The methods evaluated include the sequential normal imputation and its several extensions which adjust for non-normal error terms. The results show that all methods perform well for estimating the marginal mean and proportion, as well as the regression coefficient when the error distribution is flat or moderately heavy tailed. When the error distribution is strongly heavy tailed, all methods retain their good performances for the mean and the adjusted methods have robust performances for the proportion; but all methods can have poor performances for the regression coefficient because they cannot accommodate the extreme values well. We caution against the mechanical use of sequential regression imputation without model checking and diagnostics.

19.
Tukey proposed a class of distributions, the g-and-h family (gh family), based on a transformation of a standard normal variable to accommodate different skewness and elongation in the distribution of variables arising in practical applications. It is easy to draw values from this distribution even though it is hard to explicitly state the probability density function. Given this flexibility, the gh family may be extremely useful in creating multiple imputations for missing data. This article demonstrates how this family, as well as its generalizations, can be used in the multiple imputation analysis of incomplete data. The focus of this article is on a scalar variable with missing values. In the absence of any additional information, data are missing completely at random, and hence the correct analysis is the complete-case analysis. Thus, the application of the gh multiple imputation to the scalar cases affords comparison with the correct analysis and with other model-based multiple imputation methods. Comparisons are made using simulated datasets and the data from a survey of adolescents ascertaining driving after drinking alcohol.
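
The remark that drawing from the g-and-h family is easy can be illustrated with the usual form of Tukey's transformation of a standard normal variable Z, namely X = A + B * ((exp(gZ) - 1) / g) * exp(hZ^2 / 2), with the g = 0 case taken as the limit Z * exp(hZ^2 / 2). The sketch below only draws values. It does not reproduce the article's imputation procedure, and the names `tukey_gh`, `draw_gh`, `loc` and `scale` are assumptions made for illustration.

```python
import numpy as np


def tukey_gh(z, g=0.0, h=0.0, loc=0.0, scale=1.0):
    """Apply Tukey's g-and-h transformation to standard normal values z.

    g controls skewness, h >= 0 controls tail heaviness (elongation).
    """
    z = np.asarray(z, dtype=float)
    if g == 0.0:
        core = z                    # limit of (exp(g z) - 1) / g as g -> 0
    else:
        core = np.expm1(g * z) / g
    return loc + scale * core * np.exp(h * z**2 / 2.0)


def draw_gh(n, g, h, loc=0.0, scale=1.0, seed=0):
    """Draw n values by transforming standard normal draws."""
    rng = np.random.default_rng(seed)
    return tukey_gh(rng.standard_normal(n), g=g, h=h, loc=loc, scale=scale)
```

Setting `g=0` and `h=0` returns the normal draws unchanged apart from location and scale, which gives a quick sanity check on the transformation.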

20.
Summary. Social surveys are usually affected by item and unit non-response. Since it is unlikely that a sample of respondents is a random sample, social scientists should take the missing data problem into account in their empirical analyses. Typically, survey methodologists try to simplify the work of data users by 'completing' the data, filling the missing variables through imputation. The aim of the paper is to give data users some guidelines on how to assess the effects of imputation on their microlevel analyses. We focus attention on the potential bias that is caused by imputation in the analysis of income variables, using the European Community Household Panel as an illustration.