首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Multiple imputation (MI) is an increasingly popular method for analysing incomplete multivariate data sets. One of the most crucial assumptions of this method relates to mechanism leading to missing data. Distinctness is typically assumed, which indicates a complete independence of mechanisms underlying missingness and data generation. In addition, missing at random or missing completely at random is assumed, which explicitly states under which conditions missingness is independent of observed data. Despite common use of MI under these assumptions, plausibility and sensitivity to these fundamental assumptions have not been well-investigated. In this work, we investigate the impact of non-distinctness and non-ignorability. In particular, non-ignorability is due to unobservable cluster-specific effects (e.g. random-effects). Through a comprehensive simulation study, we show that MI inferences suggest that nonignoriability due to non-distinctness do not immediately imply dismal performance while non-ignorability due to missing not at random leads to quite subpar performance.  相似文献   

2.
In multiple imputation (MI), the resulting estimates are consistent if the imputation model is correct. To specify the imputation model, it is recommended to combine two sets of variables: those that are related to the incomplete variable and those that are related to the missingness mechanism. Several possibilities exist, but it is not clear how they perform in practice. The method that simply groups all variables together into the imputation model and four other methods that are based on the propensity scores are presented. Two of them are new and have not been used in the context of MI. The performance of the methods is investigated by a simulation study under different missing at random mechanisms for different types of variables. We conclude that all methods, except for one method based on the propensity scores, perform well. It turns out that as long as the relevant variables are taken into the imputation model, the form of the imputation model has only a minor effect in the quality of the imputations.  相似文献   

3.
4.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered.  相似文献   

5.
Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model.  相似文献   

6.
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementation of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect.  相似文献   

7.
Missing data are a prevalent and widespread data analytic issue and previous studies have performed simulations to compare the performance of missing data methods in various contexts and for various models; however, one such context that has yet to receive much attention in the literature is the handling of missing data with small samples, particularly when the missingness is arbitrary. Prior studies have either compared methods for small samples with monotone missingness commonly found in longitudinal studies or have investigated the performance of a single method to handle arbitrary missingness with small samples but studies have yet to compare the relative performance of commonly implemented missing data methods for small samples with arbitrary missingness. This study conducts a simulation study to compare and assess the small sample performance of maximum likelihood, listwise deletion, joint multiple imputation, and fully conditional specification multiple imputation for a single-level regression model with a continuous outcome. Results showed that, provided assumptions are met, joint multiple imputation unanimously performed best of the methods examined in the conditions under study.  相似文献   

8.
A popular choice when analyzing ordinal data is to consider the cumulative proportional odds model to relate the marginal probabilities of the ordinal outcome to a set of covariates. However, application of this model relies on the condition of identical cumulative odds ratios across the cut-offs of the ordinal outcome; the well-known proportional odds assumption. This paper focuses on the assessment of this assumption while accounting for repeated and missing data. In this respect, we develop a statistical method built on multiple imputation (MI) based on generalized estimating equations that allows to test the proportionality assumption under the missing at random setting. The performance of the proposed method is evaluated for two MI algorithms for incomplete longitudinal ordinal data. The impact of both MI methods is compared with respect to the type I error rate and the power for situations covering various numbers of categories of the ordinal outcome, sample sizes, rates of missingness, well-balanced and skewed data. The comparison of both MI methods with the complete-case analysis is also provided. We illustrate the use of the proposed methods on a quality of life data from a cancer clinical trial.  相似文献   

9.
This study considers a fully-parametric but uncongenial multiple imputation (MI) inference to jointly analyze incomplete binary response variables observed in a correlated data settings. Multiple imputation model is specified as a fully-parametric model based on a multivariate extension of mixed-effects models. Dichotomized imputed datasets are then analyzed using joint GEE models where covariates are associated with the marginal mean of responses with response-specific regression coefficients and a Kronecker product is accommodated for cluster-specific correlation structure for a given response variable and correlation structure between multiple response variables. The validity of the proposed MI-based JGEE (MI-JGEE) approach is assessed through a Monte Carlo simulation study under different scenarios. The simulation results, which are evaluated in terms of bias, mean-squared error, and coverage rate, show that MI-JGEE has promising inferential properties even when the underlying multiple imputation is misspecified. Finally, Adolescent Alcohol Prevention Trial data are used for illustration.  相似文献   

10.
There has been increasing use of quality-of-life (QoL) instruments in drug development. Missing item values often occur in QoL data. A common approach to solve this problem is to impute the missing values before scoring. Several imputation procedures, such as imputing with the most correlated item and imputing with a row/column model or an item response model, have been proposed. We examine these procedures using data from two clinical trials, in which the original asthma quality-of-life questionnaire (AQLQ) and the miniAQLQ were used. We propose two modifications to existing procedures: truncating the imputed values to eliminate outliers and using the proportional odds model as the item response model for imputation. We also propose a novel imputation method based on a semi-parametric beta regression so that the imputed value is always in the correct range and illustrate how this approach can easily be implemented in commonly used statistical software. To compare these approaches, we deleted 5% of item values in the data according to three different missingness mechanisms, imputed them using these approaches and compared the imputed values with the true values. Our comparison showed that the row/column-model-based imputation with truncation generally performed better, whereas our new approach had better performance under a number scenarios.  相似文献   

11.
Multiple imputation (MI) is now a reference solution for handling missing data. The default method for MI is the Multivariate Normal Imputation (MNI) algorithm that is based on the multivariate normal distribution. In the presence of longitudinal ordinal missing data, where the Gaussian assumption is no longer valid, application of the MNI method is questionable. This simulation study compares the performance of the MNI and ordinal imputation regression model for incomplete longitudinal ordinal data for situations covering various numbers of categories of the ordinal outcome, time occasions, sample sizes, rates of missingness, well-balanced, and skewed data.  相似文献   

12.
Inverse probability weighting (IPW) can deal with confounding in non randomized studies. The inverse weights are probabilities of treatment assignment (propensity scores), estimated by regressing assignment on predictors. Problems arise if predictors can be missing. Solutions previously proposed include assuming assignment depends only on observed predictors and multiple imputation (MI) of missing predictors. For the MI approach, it was recommended that missingness indicators be used with the other predictors. We determine when the two MI approaches, (with/without missingness indicators) yield consistent estimators and compare their efficiencies.We find that, although including indicators can reduce bias when predictors are missing not at random, it can induce bias when they are missing at random. We propose a consistent variance estimator and investigate performance of the simpler Rubin’s Rules variance estimator. In simulations we find both estimators perform well. IPW is also used to correct bias when an analysis model is fitted to incomplete data by restricting to complete cases. Here, weights are inverse probabilities of being a complete case. We explain how the same MI methods can be used in this situation to deal with missing predictors in the weight model, and illustrate this approach using data from the National Child Development Survey.  相似文献   

13.
We study bias arising from rounding categorical variables following multivariate normal (MVN) imputation. This task has been well studied for binary variables, but not for more general categorical variables. Three methods that assign imputed values to categories based on fixed reference points are compared using 25 specific scenarios covering variables with k=3, …, 7 categories, and five distributional shapes, and for each k=3, …, 7, we examine the distribution of bias arising over 100,000 distributions drawn from a symmetric Dirichlet distribution. We observed, on both empirical and theoretical grounds, that one method (projected-distance-based rounding) is superior to the other two methods, and that the risk of invalid inference with the best method may be too high at sample sizes n≥150 at 50% missingness, n≥250 at 30% missingness and n≥1500 at 10% missingness. Therefore, these methods are generally unsatisfactory for rounding categorical variables (with up to seven categories) following MVN imputation.  相似文献   

14.
This paper presents missing data methods for repeated measures data in small samples. Most methods currently available are for large samples. In particular, no studies have compared the performance of multiple imputation methods to that of non-imputation incomplete analysis methods. We first develop a strategy for multiple imputations for repeated measures data under a cell-means model that is applicable for any multivariate data with small samples. Multiple imputation inference procedures are applied to the resulting multiply imputed complete data sets. Comparisons to other available non-imputation incomplete data methods is made via simulation studies to conclude that there is not much gain in using the computer intensive multiple imputation methods for small sample repeated measures data analysis in terms of the power of testing hypotheses of parameters of interest.  相似文献   

15.
Non-response (or missing data) is often encountered in large-scale surveys. To enable the behavioural analysis of these data sets, statistical treatments are commonly applied to complete or remove these data. However, the correctness of such procedures critically depends on the nature of the underlying missingness generation process. Clearly, the efficacy of applying either case deletion or imputation procedures rests on the unknown missingness generation mechanism. The contribution of this paper is twofold. The study is the first to propose a simple sequential method to attempt to identify the form of missingness. Second, the effectiveness of the tests is assessed by generating (experimentally) nine missing data sets by imposed MCAR, MAR and NMAR processes, with data removed.  相似文献   

16.
We performed a simulation study comparing the statistical properties of the estimated log odds ratio from propensity scores analyses of a binary response variable, in which missing baseline data had been imputed using a simple imputation scheme (Treatment Mean Imputation), compared with three ways of performing multiple imputation (MI) and with a Complete Case analysis. MI that included treatment (treated/untreated) and outcome (for our analyses, outcome was adverse event [yes/no]) in the imputer's model had the best statistical properties of the imputation schemes we studied. MI is feasible to use in situations where one has just a few outcomes to analyze. We also found that Treatment Mean Imputation performed quite well and is a reasonable alternative to MI in situations where it is not feasible to use MI. Treatment Mean Imputation performed better than MI methods that did not include both the treatment and outcome in the imputer's model. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

17.
It is quite a challenge to develop model‐free feature screening approaches for missing response problems because the existing standard missing data analysis methods cannot be applied directly to high dimensional case. This paper develops some novel methods by borrowing information of missingness indicators such that any feature screening procedures for ultrahigh‐dimensional covariates with full data can be applied to missing response case. The first method is the so‐called missing indicator imputation screening, which is developed by proving that the set of the active predictors of interest for the response is a subset of the active predictors for the product of the response and missingness indicator under some mild conditions. As an alternative, another method called Venn diagram‐based approach is also developed. The sure screening property is proven for both methods. It is shown that the complete case analysis can also keep the sure screening property of any feature screening approach with sure screening property.  相似文献   

18.
Mixed models are regularly used in the analysis of clustered data, but are only recently being used for imputation of missing data. In household surveys where multiple people are selected from each household, imputation of missing values should preserve the structure pertaining to people within households and should not artificially change the apparent intracluster correlation (ICC). This paper focuses on the use of multilevel models for imputation of missing data in household surveys. In particular, the performance of a best linear unbiased predictor for both stochastic and deterministic imputation using a linear mixed model is compared to imputation based on a single level linear model, both with and without information about household respondents. In this paper an evaluation is carried out in the context of imputing hourly wage rate in the Household, Income and Labour Dynamics of Australia Survey. Nonresponse is generated under various assumptions about the missingness mechanism for persons and households, and with low, moderate and high intra‐household correlation to assess the benefits of the multilevel imputation model under different conditions. The mixed model and single level model with information about the household respondent lead to clear improvements when the ICC is moderate or high, and when there is informative missingness.  相似文献   

19.
Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at item level and multivariate normal imputation at scale level. We investigated a hybrid approach which is combination of MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. ML seemed to provide occasionally the best results in terms of bias, but hardly ever on MSE. All imputation methods at the scale level and logistic regression at item level hardly ever showed the best performance. The hybrid approach is similar or better than its original MI. The PMM-hybrid approach at item level demonstrated the best MSE for most settings and in some cases also the smallest bias.  相似文献   

20.
Missing covariates data is a common issue in generalized linear models (GLMs). A model-based procedure arising from properly specifying joint models for both the partially observed covariates and the corresponding missing indicator variables represents a sound and flexible methodology, which lends itself to maximum likelihood estimation as the likelihood function is available in computable form. In this paper, a novel model-based methodology is proposed for the regression analysis of GLMs when the partially observed covariates are categorical. Pair-copula constructions are used as graphical tools in order to facilitate the specification of the high-dimensional probability distributions of the underlying missingness components. The model parameters are estimated by maximizing the weighted log-likelihood function by using an EM algorithm. In order to compare the performance of the proposed methodology with other well-established approaches, which include complete-cases and multiple imputation, several simulation experiments of Binomial, Poisson and Normal regressions are carried out under both missing at random and non-missing at random mechanisms scenarios. The methods are illustrated by modeling data from a stage III melanoma clinical trial. The results show that the methodology is rather robust and flexible, representing a competitive alternative to traditional techniques.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号