期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Strategies for handling missing data in longitudinal studies with questionnaires

Nazanin Nooraee Geert Molenberghs Johan Ormel 《Journal of Statistical Computation and Simulation》2018,88(17):3415-3436

Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at item level and multivariate normal imputation at scale level. We investigated a hybrid approach which is combination of MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. ML seemed to provide occasionally the best results in terms of bias, but hardly ever on MSE. All imputation methods at the scale level and logistic regression at item level hardly ever showed the best performance. The hybrid approach is similar or better than its original MI. The PMM-hybrid approach at item level demonstrated the best MSE for most settings and in some cases also the smallest bias. 相似文献

2.

Missing data techniques for multilevel data: implications of model misspecification

Anne C. Black Ofer Harel D. Betsy McCoach 《Journal of applied statistics》2011,38(9):1845-1865

When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation model misspecification on the quality of parameter estimates by employing multiple imputation under assumptions of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM to estimates from MI assuming a linear mixed model (MI/LMM) and maximum likelihood estimation to the same incomplete data sets. With 10% missing data (MD), techniques performed similarly for fixed-effects estimates, but variance components were biased with MI/NM. Effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates but performance was negatively affected by increased rates of MD. 相似文献

3.

The impact of dichotomization in longitudinal data analysis: a simulation study

Bongin Yoo 《Pharmaceutical statistics》2010,9(4):298-312

In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi‐parametric approaches are only valid under missing data completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI‐GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI‐GEE on incomplete longitudinal dichotomized outcome analysis is evaluated. For comparisons, likelihood‐based linear mixed effects models (LMM) are used for incomplete longitudinal original continuous outcome analysis. Focusing on dichotomized outcome analysis, MI‐GEE with original continuous missing data imputation procedure provides well controlled test sizes and more stable power estimates compared with any other GEE‐based approaches. It is also shown that dichotomizing longitudinal continuous outcome will result in substantial loss of power compared with LMM. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

4.

A note on dealing with missing standard errors in meta-analyses of continuous outcome measures in WinBUGS

Stevens JW 《Pharmaceutical statistics》2011,10(4):374-378

A meta-analysis of a continuous outcome measure may involve missing standard errors. This is not a problem depending on assumptions made about the population standard deviation. Multiple imputation can be used to impute missing values while allowing for uncertainty in the imputation. Markov chain Monte Carlo simulation is a multiple imputation technique for generating posterior predictive distributions for missing data. We present an example of imputing missing variances using WinBUGS. The example highlights the importance of checking model assumptions, whether for missing or observed data. 相似文献

5.

Navigating choices when applying multiple imputation in the presence of multi-level categorical interaction effects

《Statistical Methodology》2015

Multiple imputation (MI) is an appealing option for handling missing data. When implementing MI, however, users need to make important decisions to obtain estimates with good statistical properties. One such decision involves the choice of imputation model–the joint modeling (JM) versus fully conditional specification (FCS) approach. Another involves the choice of method to handle interactions. These include imputing the interaction term as any other variable (active imputation), or imputing the main effects and then deriving the interaction (passive imputation). Our study investigates the best approach to perform MI in the presence of interaction effects involving two categorical variables. Such effects warrant special attention as they involve multiple correlated parameters that are handled differently under JM and FCS modeling. Through an extensive simulation study, we compared active, passive and an improved passive approach under FCS, as JM precludes passive imputation. We additionally compared JM and FCS techniques using active imputation. Performance between active and passive imputation was comparable. The improved passive approach proved superior to the other two particularly when the number of parameters corresponding to the interaction was large. JM without rounding and FCS using active imputation were also mostly comparable, with JM outperforming FCS when the number of parameters was large. In a direct comparison of JM active and FCS improved passive, the latter was the clear winner. We recommend improved passive imputation under FCS along with sensitivity analyses to handle multi-level interaction terms. 相似文献

6.

Bias of factor loadings from questionnaire data with imputed scores

《Journal of Statistical Computation and Simulation》2012,82(1):13-23

This study investigated the bias of factor loadings obtained from incomplete questionnaire data with imputed scores. Three models were used to generate discrete ordered rating scale data typical of questionnaires, also known as Likert data. These methods were the multidimensional polytomous latent trait model, a normal ogive item response theory model, and the discretized normal model. Incomplete data due to nonresponse were simulated using either missing completely at random or not missing at random mechanisms. Subsequently, for each incomplete data matrix, four imputation methods were applied for imputing item scores. Based on a completely crossed six-factor design, it was concluded that in general, bias was small for all data simulation methods and all imputation methods, and under all nonresponse mechanisms. Imputation method, two-way-plus-error, had the smallest bias in the factor loadings. Bias based on the discretized normal model was greater than that based on the other two models. 相似文献

7.

Multiple imputation of censored survival data in the presence of missing covariates using restricted mean survival time

Gurprit Grover 《Journal of applied statistics》2015,42(4):817-827

Missing covariates data with censored outcomes put a challenge in the analysis of clinical data especially in small sample settings. Multiple imputation (MI) techniques are popularly used to impute missing covariates and the data are then analyzed through methods that can handle censoring. However, techniques based on MI are available to impute censored data also but they are not much in practice. In the present study, we applied a method based on multiple imputation by chained equations to impute missing values of covariates and also to impute censored outcomes using restricted survival time in small sample settings. The complete data were then analyzed using linear regression models. Simulation studies and a real example of CHD data show that the present method produced better estimates and lower standard errors when applied on the data having missing covariate values and censored outcomes than the analysis of the data having censored outcome but excluding cases with missing covariates or the analysis when cases with missing covariate values and censored outcomes were excluded from the data (complete case analysis). 相似文献

8.

Latent class based multiple imputation approach for missing categorical data

Mulugeta Gebregziabher Stacia M. DeSantis 《Journal of statistical planning and inference》2010

In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered. 相似文献

9.

Comparison of several imputation methods for missing baseline data in propensity scores analysis of binary outcome

Brenda J. Crowe Ilya A. Lipkovich Ouhong Wang 《Pharmaceutical statistics》2010,9(4):269-279

We performed a simulation study comparing the statistical properties of the estimated log odds ratio from propensity scores analyses of a binary response variable, in which missing baseline data had been imputed using a simple imputation scheme (Treatment Mean Imputation), compared with three ways of performing multiple imputation (MI) and with a Complete Case analysis. MI that included treatment (treated/untreated) and outcome (for our analyses, outcome was adverse event [yes/no]) in the imputer's model had the best statistical properties of the imputation schemes we studied. MI is feasible to use in situations where one has just a few outcomes to analyze. We also found that Treatment Mean Imputation performed quite well and is a reasonable alternative to MI in situations where it is not feasible to use MI. Treatment Mean Imputation performed better than MI methods that did not include both the treatment and outcome in the imputer's model. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

10.

A comparison of various software tools for dealing with missing data via imputation

《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementation of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect. 相似文献

11.

Nonparametric curve estimation with missing data: A general empirical process approach

Majid Mojirsheibani 《Journal of statistical planning and inference》2007

A general nonparametric imputation procedure, based on kernel regression, is proposed to estimate points as well as set- and function-indexed parameters when the data are missing at random (MAR). The proposed method works by imputing a specific function of a missing value (and not the missing value itself), where the form of this specific function is dictated by the parameter of interest. Both single and multiple imputations are considered. The associated empirical processes provide the right tool to study the uniform convergence properties of the resulting estimators. Our estimators include, as special cases, the imputation estimator of the mean, the estimator of the distribution function proposed by Cheng and Chu [1996. Kernel estimation of distribution functions and quantiles with missing data. Statist. Sinica 6, 63–78], imputation estimators of a marginal density, and imputation estimators of regression functions. 相似文献

12.

Addressing the problem of missing data in decision tree modeling

Saiedeh Haji-Maghsoudi Azam Rastegari Behshid Garrusi 《Journal of applied statistics》2018,45(3):547-557

Tree-based models (TBMs) can substitute missing data using the surrogate approach (SUR). The aim of this study is to compare the performance of statistical imputation against the performance of SUR in TBMs. Employing empirical data, a TBM was constructed. Thereafter, 10%, 20%, and 40% of variable values appeared as the first split was deleted, and imputed with and without the use of outcome variables in the imputation model (IMP? and IMP+). This was repeated one thousand times. Absolute relative bias above 0.10 was defined as sever (SARB). Subsequently, in a series of simulations, the following parameters were changed: the degree of correlation among variables, the number of variables truly associated with the outcome, and the missing rate. At a 10% missing rate, the proportion of times SARB was observed in either SUR or IMP? was two times higher than in IMP+ (28% versus 13%). When the missing rate was increased to 20%, all these proportions were approximately doubled. Irrespective of the missing rate, IMP+ was about 65% less likely to produce SARB than SUR. Results of IMP? and SUR were comparable up to a 20% missing rate. At a high missing rate, IMP? was 76% more likely to provide SARB estimates. Statistical imputation of missing data and the use of outcome variable in the imputation model is recommended, even in the content of TBM. 相似文献

13.

Joint GEEs for multivariate correlated data with incomplete binary outcomes

G. Inan R. Yucel 《Journal of applied statistics》2017,44(11):1920-1937

This study considers a fully-parametric but uncongenial multiple imputation (MI) inference to jointly analyze incomplete binary response variables observed in a correlated data settings. Multiple imputation model is specified as a fully-parametric model based on a multivariate extension of mixed-effects models. Dichotomized imputed datasets are then analyzed using joint GEE models where covariates are associated with the marginal mean of responses with response-specific regression coefficients and a Kronecker product is accommodated for cluster-specific correlation structure for a given response variable and correlation structure between multiple response variables. The validity of the proposed MI-based JGEE (MI-JGEE) approach is assessed through a Monte Carlo simulation study under different scenarios. The simulation results, which are evaluated in terms of bias, mean-squared error, and coverage rate, show that MI-JGEE has promising inferential properties even when the underlying multiple imputation is misspecified. Finally, Adolescent Alcohol Prevention Trial data are used for illustration. 相似文献

14.

Multiple imputation for gamma outcome variable using generalized linear model

Vinay K. Gupta Gurprit Grover 《Journal of Statistical Computation and Simulation》2017,87(10):1980-1988

We used a proper multiple imputation (MI) through Gibbs sampling approach to impute missing values of a gamma distributed outcome variable which were missing at random, using generalized linear model (GLM) with identity link function. The missing values of the outcome variable were multiply imputed using GLM and then the complete data sets obtained after MI were analysed through GLM again for the estimation purpose. We examined the performance of the proposed technique through a simulation study with the data sets having four moderate and large proportions of missing values, 10%, 20%, 30% and 50%. We also applied this technique on a real life data and compared the results with those obtained by applying GLM only on observed cases. The results showed that the proposed technique gave better results for moderate proportions of missing values. 相似文献

15.

Testing the proportional odds assumption in multiply imputed ordinal longitudinal data

A.F. Donneau M. Mauer P. Lambert E. Lesaffre A. Albert 《Journal of applied statistics》2015,42(10):2257-2279

A popular choice when analyzing ordinal data is to consider the cumulative proportional odds model to relate the marginal probabilities of the ordinal outcome to a set of covariates. However, application of this model relies on the condition of identical cumulative odds ratios across the cut-offs of the ordinal outcome; the well-known proportional odds assumption. This paper focuses on the assessment of this assumption while accounting for repeated and missing data. In this respect, we develop a statistical method built on multiple imputation (MI) based on generalized estimating equations that allows to test the proportionality assumption under the missing at random setting. The performance of the proposed method is evaluated for two MI algorithms for incomplete longitudinal ordinal data. The impact of both MI methods is compared with respect to the type I error rate and the power for situations covering various numbers of categories of the ordinal outcome, sample sizes, rates of missingness, well-balanced and skewed data. The comparison of both MI methods with the complete-case analysis is also provided. We illustrate the use of the proposed methods on a quality of life data from a cancer clinical trial. 相似文献

16.

Imputation techniques for incomplete data in quadratic discriminant analysis

《Journal of Statistical Computation and Simulation》2012,82(6):863-877

We have compared the efficacy of five imputation algorithms readily available in SAS for the quadratic discriminant function. Here, we have generated several different parametric-configuration training data with missing data, including monotone missing-at-random observations, and used a Monte Carlo simulation to examine the expected probabilities of misclassification for the two-class quadratic statistical discrimination problem under five different imputation methods. Specifically, we have compared the efficacy of the complete observation-only method and the mean substitution, regression, predictive mean matching, propensity score, and Markov Chain Monte Carlo (MCMC) imputation methods. We found that the MCMC and propensity score multiple imputation approaches are, in general, superior to the other imputation methods for the configurations and training-sample sizes we considered. 相似文献

17.

Missing data methods for arbitrary missingness with small samples

Daniel McNeish 《Journal of applied statistics》2017,44(1):24-39

Missing data are a prevalent and widespread data analytic issue and previous studies have performed simulations to compare the performance of missing data methods in various contexts and for various models; however, one such context that has yet to receive much attention in the literature is the handling of missing data with small samples, particularly when the missingness is arbitrary. Prior studies have either compared methods for small samples with monotone missingness commonly found in longitudinal studies or have investigated the performance of a single method to handle arbitrary missingness with small samples but studies have yet to compare the relative performance of commonly implemented missing data methods for small samples with arbitrary missingness. This study conducts a simulation study to compare and assess the small sample performance of maximum likelihood, listwise deletion, joint multiple imputation, and fully conditional specification multiple imputation for a single-level regression model with a continuous outcome. Results showed that, provided assumptions are met, joint multiple imputation unanimously performed best of the methods examined in the conditions under study. 相似文献

18.

GRAPHICAL SENSITIVITY ANALYSIS WITH DIFFERENT METHODS OF IMPUTATION FOR A TRIAL WITH PROBABLE NON‐IGNORABLE MISSING DATA

M. Weatherall R.M. Pickering Scott Harris 《Australian & New Zealand Journal of Statistics》2009,51(4):397-413

Graphical sensitivity analyses have recently been recommended for clinical trials with non‐ignorable missing outcome. We demonstrate an adaptation of this methodology for a continuous outcome of a trial of three cognitive‐behavioural therapies for mild depression in primary care, in which one arm had unexpectedly high levels of missing data. Fixed‐value and multiple imputations from a normal distribution (assuming either varying mean and fixed standard deviation, or fixed mean and varying standard deviation) were used to obtain contour plots of the contrast estimates with their P‐values superimposed, their confidence intervals, and the root mean square errors. Imputation was based either on the outcome value alone, or on change from baseline. The plots showed fixed‐value imputation to be more sensitive than imputing from a normal distribution, but the normally distributed imputations were subject to sampling noise. The contours of the sensitivity plots were close to linear in appearance, with the slope approximately equal to the ratio of the proportions of subjects with missing data in each trial arm. 相似文献

19.

A Simulation Study Comparing Multiple Imputation Methods for Incomplete Longitudinal Ordinal Data

A. F. Donneau M. Mauer G. Molenberghs A. Albert 《统计学通讯:模拟与计算》2015,44(5):1311-1338

Multiple imputation (MI) is now a reference solution for handling missing data. The default method for MI is the Multivariate Normal Imputation (MNI) algorithm that is based on the multivariate normal distribution. In the presence of longitudinal ordinal missing data, where the Gaussian assumption is no longer valid, application of the MNI method is questionable. This simulation study compares the performance of the MNI and ordinal imputation regression model for incomplete longitudinal ordinal data for situations covering various numbers of categories of the ordinal outcome, time occasions, sample sizes, rates of missingness, well-balanced, and skewed data. 相似文献

20.

Estimation in Regressive Logistic Regression Analyses of Familial Data with Missing Outcomes

Patrick E.B. FitzGerald & Matthew W. Knuiman 《Australian & New Zealand Journal of Statistics》1998,40(3):305-316

This paper examines a number of methods of handling missing outcomes in regressive logistic regression modelling of familial binary data, and compares them with an EM algorithm approach via a simulation study. The results indicate that a strategy based on imputation of missing values leads to biased estimates, and that a strategy of excluding incomplete families has a substantial effect on the variability of the parameter estimates. Recommendations are made which depend, amongst other factors, on the amount of missing data and on the availability of software. 相似文献