Similar literature
 20 similar articles found (search time: 140 ms)
1.
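于力超 (Yu Lichao), 金勇进 (Jin Yongjin). 《统计研究》 (Statistical Research), 2018, 35(11): 93-104
Large-scale sample surveys mostly use complex sampling designs, which yield survey data sets with a hierarchical nested structure. Missing data are unavoidable in such data sets, yet imputation strategies for hierarchically structured data with missing values have so far received little study. This paper applies the Gibbs algorithm to the multiple imputation of hierarchical data sets with missing values and studies two approaches: imputation under a fixed-effects model and imputation under a random-effects model. Through theoretical derivation and numerical simulation, it compares the imputation performance of the methods in terms of the unbiasedness and efficiency of the resulting parameter estimates, under different intraclass correlation coefficients, cluster sizes, and proportions of missing data, and gives recommendations for choosing an imputation model. The results show that parameter estimates are more accurate when a random-effects model is used as the imputation model, whereas a fixed-effects imputation model is comparatively simple to apply. A fixed-effects imputation model can be used in situations such as a small proportion of missing data, a large intraclass correlation coefficient, or a large cluster size; otherwise a random-effects imputation model is recommended.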

2.
Missing data are a prevalent data analytic issue, and previous studies have performed simulations to compare the performance of missing data methods in various contexts and for various models. One context that has yet to receive much attention in the literature, however, is the handling of missing data with small samples, particularly when the missingness is arbitrary. Prior studies have either compared methods for small samples with the monotone missingness commonly found in longitudinal studies or investigated the performance of a single method for handling arbitrary missingness with small samples; studies have yet to compare the relative performance of commonly implemented missing data methods for small samples with arbitrary missingness. This study conducts a simulation to compare and assess the small sample performance of maximum likelihood, listwise deletion, joint multiple imputation, and fully conditional specification multiple imputation for a single-level regression model with a continuous outcome. Results showed that, provided assumptions are met, joint multiple imputation consistently performed best of the methods examined in the conditions under study.

3.
Important empirical information on household behavior and finances is obtained from surveys, and these data are used heavily by researchers and central banks and in policy consulting. However, various interdependent factors that can be controlled only to a limited extent lead to unit and item nonresponse, and missing data on certain items is a frequent source of difficulties in statistical practice. More than ever, it is important to explore techniques for the imputation of large survey data. This paper presents the theoretical underpinnings of a Markov chain Monte Carlo multiple imputation procedure and outlines important technical aspects of the application of MCMC-type algorithms to large socio-economic data sets. In an illustrative application it is found that MCMC algorithms have good convergence properties even on large data sets with complex patterns of missingness, and that the use of a rich set of covariates in the imputation models has a substantial effect on the distributions of key financial variables.

4.
Missing data and, more generally, imperfections in implementing a study design are an endemic problem in large scale studies involving human subjects. We present an analysis of an experiment in the interaction between general practitioners and their patients, in which the issue of missing data is addressed by a sensitivity analysis using multiple imputation. Instead of specifying a model for missingness we explore certain extreme ways of departing from the assumption of data missing at random and establish the largest extent of such departures which would still fail to supplant the evidence about the studied effect. An important advantage of the approach is that the algorithm intended for the complete data, to fit generalized linear models with random effects, is used without any alteration.

5.
Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence-related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. In short, the method imputes missing data for models with latent variables under latent-dependent missingness without specifying a full joint model.

6.
Summary.  Social data often contain missing information. The problem is inevitably severe when analysing historical data. Conventionally, researchers analyse complete records only. Listwise deletion not only reduces the effective sample size but also may result in biased estimation, depending on the missingness mechanism. We analyse household types by using population registers from ancient China (618–907 AD) by comparing a simple classification, a latent class model of the complete data and a latent class model of the complete and partially missing data assuming four types of ignorable and non-ignorable missingness mechanisms. The findings show that either a frequency classification or a latent class analysis using the complete records only yielded biased estimates and incorrect conclusions in the presence of partially missing data of a non-ignorable mechanism. Although simply assuming ignorable or non-ignorable missing data produced consistently similarly higher estimates of the proportion of complex households, a specification of the relationship between the latent variable and the degree of missingness by a row effect uniform association model helped to capture the missingness mechanism better and improved the model fit.

7.
Non-response (or missing data) is often encountered in large-scale surveys. To enable the behavioural analysis of these data sets, statistical treatments are commonly applied to complete or remove these data. However, the correctness of such procedures critically depends on the nature of the underlying missingness generation process. Clearly, the efficacy of applying either case deletion or imputation procedures rests on the unknown missingness generation mechanism. The contribution of this paper is twofold. First, the study is the first to propose a simple sequential method to attempt to identify the form of missingness. Second, the effectiveness of the tests is assessed by generating (experimentally) nine missing data sets by imposed MCAR, MAR and NMAR processes, with data removed.
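The three missingness processes named in this abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the function name `make_missing`, the median-based rule, and the `rate` parameter are all assumptions chosen only to show how MCAR, MAR and NMAR differ in what the removal probability depends on.

```python
import random

def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows [(x, y), ...] with some y values set to None.

    Hypothetical illustration (not from the paper):
    MCAR: every y has the same chance of being removed.
    MAR:  the chance depends only on the always-observed x.
    NMAR: the chance depends on the y value that is being removed.
    """
    rng = random.Random(seed)
    x_med = sorted(x for x, _ in rows)[len(rows) // 2]
    y_med = sorted(y for _, y in rows)[len(rows) // 2]
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = 2 * rate if x > x_med else 0.0
        elif mechanism == "NMAR":
            p = 2 * rate if y > y_med else 0.0
        else:
            raise ValueError("mechanism must be MCAR, MAR or NMAR")
        out.append((x, None if rng.random() < p else y))
    return out

# Example: impose each mechanism on the same simulated data set.
_rng = random.Random(1)
rows = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(200)]
incomplete = {m: make_missing(rows, m) for m in ("MCAR", "MAR", "NMAR")}
```

Under the MAR rule here, the pattern of missing y values can be explained entirely by x, whereas under NMAR it cannot be explained without the removed values themselves, which is why the mechanism matters for deletion and imputation.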

8.
In longitudinal studies data are collected on the same set of units on more than one occasion. In medical studies it is very common to have mixed Poisson and continuous longitudinal data. In such studies, for different reasons, some intended measurements might not be available, resulting in a missing data setting. When the probability of missingness is related to the missing values, the missingness mechanism is termed nonrandom. The stochastic expectation-maximization (SEM) algorithm and the parametric fractional imputation (PFI) method are developed to handle nonrandom missingness in mixed discrete and continuous longitudinal data assuming different covariance structures for the continuous outcome. The proposed techniques are evaluated using simulation studies. Also, the proposed techniques are applied to the interstitial cystitis data base (ICDB) data.

9.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered.

10.
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.

11.
When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation model misspecification on the quality of parameter estimates by employing multiple imputation under assumptions of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM to estimates from MI assuming a linear mixed model (MI/LMM) and from maximum likelihood estimation applied to the same incomplete data sets. With 10% missing data (MD), techniques performed similarly for fixed-effects estimates, but variance components were biased with MI/NM. Effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates, but performance was negatively affected by increased rates of MD.

12.
Summary. Missing observations are a common problem that complicates the analysis of clustered data. In the Connecticut child surveys of childhood psychopathology, it was possible to identify reasons why outcomes were not observed. Of note, some of these causes of missingness may be assumed to be ignorable, whereas others may be non-ignorable. We consider logistic regression models for incomplete bivariate binary outcomes and propose mixture models that permit estimation assuming that there are two distinct types of missingness mechanisms: one that is ignorable; the other non-ignorable. A feature of the mixture modelling approach is that additional analyses to assess the sensitivity to assumptions about the missingness are relatively straightforward to incorporate. The methods were developed for analysing data from the Connecticut child surveys, where there are missing informant reports of child psychopathology and different reasons for missingness can be distinguished.

13.
In modern scientific research, multiblock missing data emerge when synthesizing information across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or heavily rely on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.

14.
Multiple imputation (MI) is now a reference solution for handling missing data. The default method for MI is the Multivariate Normal Imputation (MNI) algorithm that is based on the multivariate normal distribution. In the presence of longitudinal ordinal missing data, where the Gaussian assumption is no longer valid, application of the MNI method is questionable. This simulation study compares the performance of the MNI and ordinal imputation regression model for incomplete longitudinal ordinal data for situations covering various numbers of categories of the ordinal outcome, time occasions, sample sizes, rates of missingness, well-balanced, and skewed data.

15.
The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with Metropolis-within-Gibbs scheme is used to effectively draw the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.

16.
Summary.  In longitudinal studies, missingness of data is often an unavoidable problem. Estimators from the linear mixed effects model assume that missing data are missing at random. However, estimators are biased when this assumption is not met. In the paper, theoretical results for the asymptotic bias are established under non-ignorable drop-out, drop-in and other missing data patterns. The asymptotic bias is large when the drop-out subjects have only one or no observation, especially for slope-related parameters of the linear mixed effects model. In the drop-in case, intercept-related parameter estimators show substantial asymptotic bias when subjects enter late in the study. Eight other missing data patterns are considered and these produce asymptotic biases of a variety of magnitudes.

17.
In this article, a finite mixture model of hurdle Poisson distribution with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of model parameters and mixing proportions. Specifically, missing data are assumed to be missing not at random (MNAR)/non-ignorable missing (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm efficiency, a stochastic step is incorporated into the E-step based on data augmentation, whereas the M-step is solved by the method of conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components with missing values. The considered model is a general model framework and it captures the important characteristics of count data analysis such as zero inflation/deflation, heterogeneity as well as missingness, providing us with more insight into the data feature and allowing for dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed to replace the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.

18.
Although the effect of missing data on regression estimates has received considerable attention, their effect on predictive performance has been neglected. We studied the performance of three missing data strategies (omission of records with missing values, replacement with a mean, and imputation based on regression) on the predictive performance of logistic regression (LR), classification tree (CT) and neural network (NN) models in the presence of data missing completely at random (MCAR). Models were constructed using datasets of size 500 simulated from a joint distribution of binary and continuous predictors including nonlinearities, collinearity and interactions between variables. Though omission produced models that fit better on the data from which the models were developed, imputation was superior on average to omission for all models when evaluating the receiver operating characteristic (ROC) curve area, mean squared error (MSE), pooled variance across outcome categories and calibration χ² on an independently generated test set. However, in about one-third of simulations, omission performed better. Performance was also more variable with omission, including quite a few instances of extremely poor performance. Replacement and imputation generally produced similar results except with neural networks, for which replacement, the strategy typically used in neural network algorithms, was inferior to imputation. Missing data affected simpler models much less than they did more complex models such as generalized additive models that focus on local structure. For moderate sized datasets, logistic regressions that use simple nonlinear structures such as quadratic terms and piecewise linear splines appear to be at least as robust to randomly missing values as neural networks and classification trees.
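The three strategies compared in this abstract (omission, replacement with a mean, and regression-based imputation) can be sketched on a toy data set. This is only an illustration under stated assumptions: the function names are hypothetical, each record is a pair whose second value may be missing (`None`), and the regression imputation is a simple least-squares line fitted to the complete records, which is not necessarily the imputation model used in the study.

```python
from statistics import mean

def omit_records(rows):
    """Omission: drop every record whose second value is missing."""
    return [r for r in rows if r[1] is not None]

def replace_with_mean(rows):
    """Replacement: fill each missing value with the observed mean."""
    m = mean(b for _, b in rows if b is not None)
    return [(a, m if b is None else b) for a, b in rows]

def impute_by_regression(rows):
    """Imputation: predict each missing value from the first variable,
    using a least-squares line fitted to the complete records."""
    complete = omit_records(rows)
    a_bar = mean(a for a, _ in complete)
    b_bar = mean(b for _, b in complete)
    sxx = sum((a - a_bar) ** 2 for a, _ in complete)
    sxy = sum((a - a_bar) * (b - b_bar) for a, b in complete)
    slope = sxy / sxx
    intercept = b_bar - slope * a_bar
    return [(a, intercept + slope * a if b is None else b) for a, b in rows]

# Toy data: the second value of the third record is missing.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (4.0, 8.0)]
dropped = omit_records(rows)
mean_filled = replace_with_mean(rows)
reg_filled = impute_by_regression(rows)
```

In this toy example the complete records lie on a line, so regression imputation recovers a value on that line while mean replacement ignores the relationship with the first variable; omission simply shrinks the data set.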

19.
Missing covariate data are a common issue in generalized linear models (GLMs). A model-based procedure arising from properly specifying joint models for both the partially observed covariates and the corresponding missing indicator variables represents a sound and flexible methodology, which lends itself to maximum likelihood estimation as the likelihood function is available in computable form. In this paper, a novel model-based methodology is proposed for the regression analysis of GLMs when the partially observed covariates are categorical. Pair-copula constructions are used as graphical tools in order to facilitate the specification of the high-dimensional probability distributions of the underlying missingness components. The model parameters are estimated by maximizing the weighted log-likelihood function by using an EM algorithm. In order to compare the performance of the proposed methodology with other well-established approaches, which include complete-case analysis and multiple imputation, several simulation experiments of Binomial, Poisson and Normal regressions are carried out under both missing at random and missing not at random mechanisms. The methods are illustrated by modeling data from a stage III melanoma clinical trial. The results show that the methodology is rather robust and flexible, representing a competitive alternative to traditional techniques.

20.
k-POD: A Method for k-Means Clustering of Missing Data (total citations: 1; self-citations: 0; citations by others: 1)
The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.

[Received November 2014. Revised August 2015.]
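The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to extend k-means to incomplete data, in the spirit described above: missing entries are filled with the current centroid of the cluster each row is assigned to, and ordinary k-means steps are repeated on the filled matrix. The function name `kpod`, the `init` parameter, the column-mean starting fill, and the fixed iteration count are assumptions for illustration and may differ from the authors' method.

```python
def kpod(data, k, init=None, iters=20):
    """Cluster rows that may contain None; returns (labels, centroids).

    Hypothetical sketch: assumes every column has at least one observed
    value. `init` lists the row indices used as starting centroids.
    """
    n, d = len(data), len(data[0])
    # Start by filling each missing entry with its column's observed mean.
    col_mean = []
    for j in range(d):
        obs = [row[j] for row in data if row[j] is not None]
        col_mean.append(sum(obs) / len(obs))
    filled = [[col_mean[j] if v is None else v for j, v in enumerate(row)]
              for row in data]
    start = list(range(k)) if init is None else init
    centroids = [list(filled[i]) for i in start]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2
                                  for p, q in zip(row, centroids[c])))
            for row in filled
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [filled[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
        # Refill step: missing entries take their cluster centroid's value.
        for i, row in enumerate(data):
            for j, v in enumerate(row):
                if v is None:
                    filled[i][j] = centroids[labels[i]][j]
    return labels, centroids

# Two well-separated groups, each with partially observed rows.
data = [[0.0, 0.1], [0.2, None], [None, 0.0],
        [10.0, 10.1], [10.2, None], [None, 9.9]]
labels, centroids = kpod(data, k=2, init=[0, 3])
```

No rows are deleted and no separate imputation model is fitted; the fill values are a by-product of the clustering itself. As with ordinary k-means, the result depends on the starting centroids.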
