首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Sequential regression multiple imputation has emerged as a popular approach for handling incomplete data with complex features. In this approach, imputations for each missing variable are produced based on a regression model using other variables as predictors in a cyclic manner. Normality assumption is frequently imposed for the error distributions in the conditional regression models for continuous variables, despite that it rarely holds in real scenarios. We use a simulation study to investigate the performance of several sequential regression imputation methods when the error distribution is flat or heavy tailed. The methods evaluated include the sequential normal imputation and its several extensions which adjust for non normal error terms. The results show that all methods perform well for estimating the marginal mean and proportion, as well as the regression coefficient when the error distribution is flat or moderately heavy tailed. When the error distribution is strongly heavy tailed, all methods retain their good performances for the mean and the adjusted methods have robust performances for the proportion; but all methods can have poor performances for the regression coefficient because they cannot accommodate the extreme values well. We caution against the mechanical use of sequential regression imputation without model checking and diagnostics.  相似文献   

2.
Multiple imputation (MI) is now a reference solution for handling missing data. The default method for MI is the Multivariate Normal Imputation (MNI) algorithm that is based on the multivariate normal distribution. In the presence of longitudinal ordinal missing data, where the Gaussian assumption is no longer valid, application of the MNI method is questionable. This simulation study compares the performance of the MNI and ordinal imputation regression model for incomplete longitudinal ordinal data for situations covering various numbers of categories of the ordinal outcome, time occasions, sample sizes, rates of missingness, well-balanced, and skewed data.  相似文献   

3.
Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at item level and multivariate normal imputation at scale level. We investigated a hybrid approach which is combination of MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. ML seemed to provide occasionally the best results in terms of bias, but hardly ever on MSE. All imputation methods at the scale level and logistic regression at item level hardly ever showed the best performance. The hybrid approach is similar or better than its original MI. The PMM-hybrid approach at item level demonstrated the best MSE for most settings and in some cases also the smallest bias.  相似文献   

4.
We have compared the efficacy of five imputation algorithms readily available in SAS for the quadratic discriminant function. Here, we have generated several different parametric-configuration training data with missing data, including monotone missing-at-random observations, and used a Monte Carlo simulation to examine the expected probabilities of misclassification for the two-class quadratic statistical discrimination problem under five different imputation methods. Specifically, we have compared the efficacy of the complete observation-only method and the mean substitution, regression, predictive mean matching, propensity score, and Markov Chain Monte Carlo (MCMC) imputation methods. We found that the MCMC and propensity score multiple imputation approaches are, in general, superior to the other imputation methods for the configurations and training-sample sizes we considered.  相似文献   

5.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have been recently used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in coronary artery disease data set and milk composition data set as well as a simulation study from cancer-neapolitan network are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods called Peter-Clark algorithm and greedy search-and-score have been applied. Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322] in which, each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310] which uses the information in the Markov blanket set of the variable to be imputed and finally, our new proposed method which applies the whole available knowledge of all variables of interest, consisting the Markov blanket and so the parent set, to impute a missing item. Results indicate the high quality of our new proposed method especially in the presence of high missingness percentages and more connected networks. Also the new method have shown to be more efficient than the MICE method for small sample sizes with high missing rates.  相似文献   

6.
Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.  相似文献   

7.
Consider estimation of a population mean of a response variable when the observations are missing at random with respect to the covariate. Two common approaches to imputing the missing values are the nonparametric regression weighting method and the Horvitz-Thompson (HT) inverse weighting approach. The regression approach includes the kernel regression imputation and the nearest neighbor imputation. The HT approach, employing inverse kernel-estimated weights, includes the basic estimator, the ratio estimator and the estimator using inverse kernel-weighted residuals. Asymptotic normality of the nearest neighbor imputation estimators is derived and compared to kernel regression imputation estimator under standard regularity conditions of the regression function and the missing pattern function. A comprehensive simulation study shows that the basic HT estimator is most sensitive to discontinuity in the missing data patterns, and the nearest neighbors estimators can be insensitive to missing data patterns unbalanced with respect to the distribution of the covariate. Empirical studies show that the nearest neighbor imputation method is most effective among these imputation methods for estimating a finite population mean and for classifying the species of the iris flower data.  相似文献   

8.
Fractional regression hot deck imputation (FRHDI) imputes multiple values for each instance of a missing dependent variable. The imputed values are equal to the predicted value plus multiple random residuals. Fractional weights enable variance estimation and preserve correlations. In some circumstances with some starting weight values, existing procedures for computing FRHDI weights can produce negative values. We discuss procedures for constructing non-negative adjusted fractional weights for FRHDI and study performance of the algorithm using simulation. The algorithm can be used effectively with FRDHI procedures for handling missing data in the context of a complex sample survey.  相似文献   

9.
Although the effect of missing data on regression estimates has received considerable attention, their effect on predictive performance has been neglected. We studied the performance of three missing data strategies—omission of records with missing values, replacement with a mean and imputation based on regression—on the predictive performance of logistic regression (LR), classification tree (CT) and neural network (NN) models in the presence of data missing completely at random (MCAR). Models were constructed using datasets of size 500 simulated from a joint distribution of binary and continuous predictors including nonlinearities, collinearity and interactions between variables. Though omission produced models that fit better on the data from which the models were developed, imputation was superior on average to omission for all models when evaluating the receiver operating characteristic (ROC) curve area, mean squared error (MSE), pooled variance across outcome categories and calibration X 2 on an independently generated test set. However, in about one-third of simulations, omission performed better. Performance was also more variable with omission including quite a few instances of extremely poor performance. Replacement and imputation generally produced similar results except with neural networks for which replacement, the strategy typically used in neural network algorithms, was inferior to imputation. Missing data affected simpler models much less than they did more complex models such as generalized additive models that focus on local structure For moderate sized datasets, logistic regressions that use simple nonlinear structures such as quadratic terms and piecewise linear splines appear to be at least as robust to randomly missing values as neural networks and classification trees.  相似文献   

10.
The multiple imputation technique has proven to be a useful tool in missing data analysis. We propose a Markov chain Monte Carlo method to conduct multiple imputation for incomplete correlated ordinal data using the multivariate probit model. We conduct a thorough simulation study to compare the performance of our proposed method with two available imputation methods – multivariate normal-based and chain equation methods for various missing data scenarios. For illustration, we present an application using the data from the smoking cessation treatment study for low-income community corrections smokers.  相似文献   

11.
研究缺失偏态数据下线性回归模型的参数估计问题,针对缺失偏态数据,为克服样本分布扭曲缺点和提高模型的回归系数、尺度参数和偏度参数的估计效果,提出了一种适合偏态数据下线性回归模型中缺失数据的修正回归插补方法.通过随机模拟和实例研究,并与均值插补、回归插补、随机回归插补方法比较,结果表明所提出的修正回归插补方法是有效可行的.  相似文献   

12.
Missing data are a prevalent and widespread data analytic issue and previous studies have performed simulations to compare the performance of missing data methods in various contexts and for various models; however, one such context that has yet to receive much attention in the literature is the handling of missing data with small samples, particularly when the missingness is arbitrary. Prior studies have either compared methods for small samples with monotone missingness commonly found in longitudinal studies or have investigated the performance of a single method to handle arbitrary missingness with small samples but studies have yet to compare the relative performance of commonly implemented missing data methods for small samples with arbitrary missingness. This study conducts a simulation study to compare and assess the small sample performance of maximum likelihood, listwise deletion, joint multiple imputation, and fully conditional specification multiple imputation for a single-level regression model with a continuous outcome. Results showed that, provided assumptions are met, joint multiple imputation unanimously performed best of the methods examined in the conditions under study.  相似文献   

13.
In this article, we compare alternative missing imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other methods of missing imputation. We use real datasets on which to base the comparison between our approach and some general methods of missing imputation for various missing data mechanisms.  相似文献   

14.
This paper examines a number of methods of handling missing outcomes in regressive logistic regression modelling of familial binary data, and compares them with an EM algorithm approach via a simulation study. The results indicate that a strategy based on imputation of missing values leads to biased estimates, and that a strategy of excluding incomplete families has a substantial effect on the variability of the parameter estimates. Recommendations are made which depend, amongst other factors, on the amount of missing data and on the availability of software.  相似文献   

15.
Missing data often complicate the analysis of scientific data. Multiple imputation is a general purpose technique for analysis of datasets with missing values. The approach is applicable to a variety of missing data patterns but often complicated by some restrictions like the type of variables to be imputed and the mechanism underlying the missing data. In this paper, the authors compare the performance of two multiple imputation methods, namely fully conditional specification and multivariate normal imputation in the presence of ordinal outcomes with monotone missing data patterns. Through a simulation study and an empirical example, the authors show that the two methods are indeed comparable meaning any of the two may be used when faced with scenarios, at least, as the ones presented here.  相似文献   

16.
In modern scientific research, multiblock missing data emerges with synthesizing information across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or heavily rely on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.  相似文献   

17.
By employing all the observed information and the optimal augmentation term, we propose an augmented inverse probability weighted fractional imputation method (AFI) to handle covariates missing at random in quantile regression. Compared with the existing completely case analysis, inverse probability weighting, multiple imputation and fractional imputation based on quantile regression model with missing covarites, we carry out simulation study to investigate its performance in estimation accuracy and efficiency, computational efficiency and estimation robustness. We also talk about the influence of imputation replicates in our AFI. Finally, we apply our methodology to part of the National Health and Nutrition Examination Survey data.  相似文献   

18.
This paper compares the performance of weighted generalized estimating equations (WGEEs), multiple imputation based on generalized estimating equations (MI-GEEs) and generalized linear mixed models (GLMMs) for analyzing incomplete longitudinal binary data when the underlying study is subject to dropout. The paper aims to explore the performance of the above methods in terms of handling dropouts that are missing at random (MAR). The methods are compared on simulated data. The longitudinal binary data are generated from a logistic regression model, under different sample sizes. The incomplete data are created for three different dropout rates. The methods are evaluated in terms of bias, precision and mean square error in case where data are subject to MAR dropout. In conclusion, across the simulations performed, the MI-GEE method performed better in both small and large sample sizes. Evidently, this should not be seen as formal and definitive proof, but adds to the body of knowledge about the methods’ relative performance. In addition, the methods are compared using data from a randomized clinical trial.  相似文献   

19.
The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The performance of a newly proposed imputation method based on generalized additive models for location, scale, and shape (GAMLSS) is investigated. Although imputation methods based on predictive mean matching are virtually unbiased, they suffer from mild to moderate under-coverage, even in the experiment where all variables are jointly normal distributed. The GAMLSS method features better coverage than currently available methods.  相似文献   

20.
This paper considers the estimation of coefficients in a linear regression model with missing observations in the independent variables and introduces a modification of the standard first order regression method for imputation of missing values. The modification provides stochastic values for imputation and, as an extension, makes use of the principle of weighted mixed regression. The proposed procedures are compared with two popular procedures—one which utilizes only the complete observations and the other which employs the standard first order regression imputation method for missing values. A simulation experiment to evaluate the gain in efficiency and to examine interesting issues like the impact of varying degree of multicollinearity in explanatory variables is proceeded. Some work on the case of discrete regressor variables is in progress and will be reported in a future article to follow.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号