Multiple imputation (MI) is now a reference solution for handling missing data. The default method for MI is the Multivariate Normal Imputation (MNI) algorithm that is based on the multivariate normal distribution. In the presence of longitudinal ordinal missing data, where the Gaussian assumption is no longer valid, application of the MNI method is questionable. This simulation study compares the performance of the MNI and ordinal imputation regression model for incomplete longitudinal ordinal data for situations covering various numbers of categories of the ordinal outcome, time occasions, sample sizes, rates of missingness, well-balanced, and skewed data.

It is now standard practice to replace missing data in longitudinal surveys with imputed values, but there is still much uncertainty about the best approach to adopt. Using data from a real survey, we compared different strategies combining multiple imputation and the chained equations method, the two main objectives being (1) to explore the impact of the explanatory variables in the chained regression equations and (2) to study the effect of imputation on causality between successive waves of the survey. Results were very stable from one simulation to another, and no systematic bias appeared. The critical points of the method lay in the proper choice of covariates and in respecting the temporal relation between variables.

Summary. Multiple imputation is now a well-established technique for analysing data sets where some units have incomplete observations. Provided that the imputation model is correct, the resulting estimates are consistent. An alternative, weighting by the inverse probability of observing complete data on a unit, is conceptually simple and involves fewer modelling assumptions, but it is known to be both inefficient (relative to a fully parametric approach) and sensitive to the choice of weighting model. Over the last decade, there has been a considerable body of theoretical work to improve the performance of inverse probability weighting, leading to the development of 'doubly robust' or 'doubly protected' estimators. We present an intuitive review of these developments and contrast these estimators with multiple imputation from both a theoretical and a practical viewpoint.
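The contrast between inverse probability weighting and a doubly robust estimator can be sketched for the simplest target, a population mean. This is a minimal illustration, not the estimators reviewed in the abstract above: the function names are hypothetical, and the response probabilities (`propensity`) and outcome-model predictions (`outcome_pred`) are assumed to be supplied by the caller rather than estimated here.

```python
import numpy as np

def ipw_mean(y, observed, propensity):
    """Inverse probability weighted estimate of the mean of y.

    y may contain NaN where observed == 0; those entries are never used.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    # Replace unobserved outcomes by 0 so that NaN does not propagate.
    y_filled = np.where(observed == 1, y, 0.0)
    return float(np.mean(observed * y_filled / propensity))

def aipw_mean(y, observed, propensity, outcome_pred):
    """Augmented IPW (doubly robust) estimate of the mean of y.

    Adds a correction built from outcome-model predictions; the estimate
    stays consistent if either the propensities or the predictions are right.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    outcome_pred = np.asarray(outcome_pred, dtype=float)
    y_filled = np.where(observed == 1, y, 0.0)
    weighted = observed * y_filled / propensity
    augmentation = (observed - propensity) / propensity * outcome_pred
    return float(np.mean(weighted - augmentation))
```

In a toy case where the outcome predictions equal the true outcomes, `aipw_mean` returns the full-data mean even when the supplied propensities are wrong, which is the "double protection" the abstract refers to.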

In the presence of missing values, researchers may be interested in the rates of missing information. The rates of missing information (a) are important for assessing how the missing information contributes to inferential uncertainty about Q, the population quantity of interest, (b) are an important component in deciding the number of imputations, and (c) can be used to test model uncertainty and model fit. In this article I derive the asymptotic distribution of the rates of missing information in two scenarios: conventional multiple imputation (MI) and two-stage MI. I show numerically that the proposed asymptotic distribution agrees with the simulated one. I also suggest the number of imputations needed to obtain reliable estimates of the missing information rate for each method, based on the asymptotic distribution.
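For orientation, one common large-sample estimate of the rate of missing information under conventional MI comes from Rubin's combining rules: the between-imputation share of the total variance. The sketch below assumes that definition and uses a hypothetical function name; it does not reproduce the asymptotic distribution derived in the article or its two-stage variant.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates of Q, one per imputed data set
    variances: their completed-data variance estimates
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                   # pooled point estimate
    u_bar = u.mean()                   # within-imputation variance
    b = q.var(ddof=1)                  # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b    # total variance
    lam = (1.0 + 1.0 / m) * b / t      # estimated rate of missing information
    return {"estimate": float(q_bar), "total_variance": float(t),
            "missing_info_rate": float(lam)}
```

With few imputations the between-imputation variance is itself noisy, which is why the number of imputations matters for a reliable rate estimate.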

This article develops a functional form of the generalized Poisson regression model that parametrically nests the Poisson and the two well-known generalized Poisson regression models (GP-1 and GP-2). The proposed model is applied to Malaysian motor insurance claim count data.


Missing data are commonly encountered in self-reported measurements and questionnaires. It is crucial to treat missing values using an appropriate method to avoid bias and loss of power. Various imputation methods exist, but it is not always clear which is preferable for data with non-normal variables. In this paper, we compared four imputation methods: mean imputation, quantile imputation, multiple imputation, and quantile regression multiple imputation (QRMI), using both simulated data and real data from a study of factors affecting self-efficacy in breast cancer survivors. The results showed an advantage for multiple imputation, especially QRMI, when the data are not normal.

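邰凌楠 et al. 《统计研究》 (Statistical Research), 2018, 35(9): 115-128
Missing data are pervasive in applied research. Under the missing-at-random assumption, and from the perspective of model inference, this paper proposes a new efficient estimation method for linear quantile regression models with missing data: the inverse probability multiple weighting (IPMW) estimator. Building on the inverse probability weighting (IPW) estimator, the method combines propensity score matching with the idea of model averaging and obtains the final parameter estimates by weighting the results of repeated estimations. The method applies when the response variable is independent and identically distributed or independent but not identically distributed, and it covers the vast majority of missing-data scenarios. Theoretical derivation and simulation studies show that the IPMW estimator inherits the advantages of the IPW estimator while being more robust. Finally, the method is applied to micro-level survey data with missing values to study the factors influencing the consumption level of the middle-income group in economically developed quasi-first-tier cities. Comparing the estimates and confidence bands of the two methods shows that the IPMW estimator has a smaller standard deviation and yields more robust estimates.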

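Data augmentation (DA) imputation is one of the most commonly used MCMC multiple imputation methods. Using simulation, this study examines the coefficient estimates of linear regression models based on DA imputation and analyzes how the statistical properties of the estimates are affected by the nonresponse mechanism, the nonresponse rate, and the number of imputations. The simulation results show that, under a missing-completely-at-random nonresponse mechanism, choosing a small number of imputations often gives good regression coefficient estimates; under a missing-at-random mechanism, choosing a larger number of imputations as the nonresponse rate increases tends to give better estimates; and under a not-missing-at-random mechanism, choosing a larger number of imputations does not always give better estimates.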

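于力超, 金勇进. 《统计研究》 (Statistical Research), 2018, 35(11): 93-104
Large-scale sample surveys mostly use complex sampling designs, which produce survey data sets with a hierarchical nested structure, and missing data are unavoidable in them. Imputation strategies for hierarchically structured data sets with missing values have so far received little study. This paper applies the Gibbs algorithm to multiple imputation of hierarchical data sets with missing values, examining imputation under a fixed effects model and under a random effects model. Through theoretical derivation and numerical simulation, under different intraclass correlation coefficients, cluster sizes, and proportions of missing data, it compares the imputation performance of the methods in terms of the unbiasedness and efficiency of the parameter estimates and gives recommendations for choosing the imputation model. The results show that a random effects imputation model gives more accurate parameter estimates, whereas a fixed effects imputation model is simpler to implement. A fixed effects imputation model can be used when the proportion of missing data is small, the intraclass correlation coefficient is large, and the cluster size is large; otherwise a random effects imputation model is recommended.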

This article presents findings from a case study of different approaches to the treatment of missing data. Simulations based on data from the Los Angeles Mammography Promotion in Churches Program (LAMP) led the authors to the following cautionary conclusions about the treatment of missing data: (1) Automated selection of the imputation model in the use of full Bayesian multiple imputation can lead to unexpected bias in coefficients of substantive models. (2) Under conditions that occur in actual data, casewise deletion can perform less well than we were led to expect by the existing literature. (3) Relatively unsophisticated imputations, such as mean imputation and conditional mean imputation, performed better than the technical literature led us to expect. (4) To underscore points (1), (2), and (3), the article concludes that imputation models are substantive models, and require the same caution with respect to specificity and calculability. The research reported here was partially supported by National Institutes of Health, National Cancer Institute, R01 CA65879 (SAF). We thank Nicholas Wolfinger, Naihua Duan, John Adams, John Fox, and the anonymous referees for their thoughtful comments on earlier drafts. The responsibility for any remaining errors is ours alone. Benjamin Stein was exceptionally helpful in orchestrating the simulations at the labs of UCLA Social Science Computing. Michael Mitchell of the UCLA Academic Technology Services Statistical Consulting Group artfully created Fig. 1 using the Stata graphics language; we are most grateful.

When analyzing data with missing values, a commonly used method is the inverse probability weighting (IPW) method, which reweights estimating equations with propensity scores. The popularity of the IPW method is due to its simplicity. However, it is often criticized for being inefficient because most of the information from the incomplete observations is not used. Alternatively, the regression method is known to be efficient but is not robust to misspecification of the regression function. In this article, we propose a novel way of optimally combining the propensity score function and the regression model. The resulting estimating equation is robust against misspecification of either the propensity score or the regression function and is locally semiparametric efficient. We demonstrate analytically situations where our method leads to a more efficient estimator than some of its competitors. In a simulation study, we show that the new method compares favorably with its competitors in finite samples. Supplementary materials for this article are available online.

Suppose that we have a nonparametric regression model $Y = m(X) + \varepsilon$ with $X \in \mathbb{R}^p$, where X is a random design variable that is observed completely, Y is the response variable, and some Y-values are missing at random. Based on the “complete” data sets for Y after nonparametric regression imputation and inverse probability weighted imputation, two estimators of the regression function $m(x_0)$ for fixed $x_0 \in \mathbb{R}^p$ are proposed. Asymptotic normality of the two estimators is established and used to construct normal approximation-based confidence intervals for $m(x_0)$. We also construct an empirical likelihood (EL) statistic for $m(x_0)$ with limiting distribution $\chi^2_1$, which is used to construct an EL confidence interval for $m(x_0)$.

This article discusses regression analysis of current status data, which occur in many fields including cross-sectional studies, demographical investigations, and tumorigenicity experiments (Keiding, 1991; Sun, 2006). For the problem, we focus on the situation where the survival time of interest can be described by the additive hazards model and a multiple imputation approach is presented for inference. A major advantage of the approach is its simplicity and it can be easily implemented by using the existing software packages for right-censored failure time data. Extensive simulation studies are conducted and indicate that the approach performs well for practical situations and is comparable to the existing methods. The methodology is applied to a set of current status data arising from a tumorigenicity experiment and the model checking is discussed.

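Multiple Imputation of Missing Values in Income Variables Based on Chained Equations
刘凤芹. 《统计研究》 (Statistical Research), 2009, 26(1): 71-77
In econometric analysis, missing values in income variables are a common problem that is hard to handle. Traditional methods often lead to systematic bias in the analysis results. This paper proposes handling missing values in income variables by multiple imputation based on chained equations. The method is applied to a real data set, its properties are discussed through analysis of the imputed data sets, and it is compared with other multiple imputation methods. The results show that multiple imputation based on chained equations can correct the systematic bias of inference to some extent and gives appropriate standard error estimates.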

In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the interleukin-6 genes is considered.

Useful properties of a general-purpose imputation method for numerical data are suggested and discussed in the context of several large government surveys. Imputation based on predictive mean matching is proposed as a useful extension of methods in existing practice, and versions of the method are presented for unit nonresponse and item nonresponse with a general pattern of missingness. Extensions of the method to provide multiple imputations are also considered. Pros and cons of weighting adjustments are discussed, and weighting-based analogs to predictive mean matching are outlined.
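The core idea of predictive mean matching can be sketched as follows: fit a regression on the respondents, predict a mean for every unit, and fill each missing value with the observed value of a donor whose predicted mean is close. This is a simplified single-imputation sketch under stated assumptions, not the versions presented in the article: the function name `pmm_impute`, the ordinary least squares model, and the choice of drawing one donor at random from the `k` nearest are all illustrative.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Fill NaN entries of y by predictive mean matching on covariates x."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)

    # Least squares fit on respondents only, with an intercept column.
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    predicted = design @ beta

    donor_pred = predicted[observed]
    donor_y = y[observed]
    k = min(k, donor_y.size)

    completed = y.copy()
    for i in np.flatnonzero(~observed):
        # Indices of the k donors whose predicted means are closest.
        nearest = np.argsort(np.abs(donor_pred - predicted[i]))[:k]
        # Impute an actually observed value, never the model prediction.
        completed[i] = donor_y[rng.choice(nearest)]
    return completed
```

Because every imputed value is a real observed value, the completed data stay within the range of the respondents. Extending this to proper multiple imputations would also require drawing the regression parameters afresh for each imputed data set, which the sketch omits.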

Summary. The paper develops a data augmentation method to estimate the distribution function of a variable, which is partially observed, under a non-ignorable missing data mechanism, and where surrogate data are available. An application to the estimation of hourly pay distributions using UK Labour Force Survey data provides the main motivation. In addition to considering a standard parametric data augmentation method, we consider the use of hot deck imputation methods as part of the data augmentation procedure to improve the robustness of the method. The method proposed is compared with standard methods that are based on an ignorable missing data mechanism, both in a simulation study and in the Labour Force Survey application. The focus is on reducing bias in point estimation, but variance estimation using multiple imputation is also considered briefly.

In this paper, we study linear regression analysis when some of the censoring indicators are missing at random. We define a regression calibration estimator, an imputation estimator, and an inverse probability weighted estimator for the regression coefficient vector based on the weighted least squares approach due to Stute (1993), and prove that all the estimators are asymptotically normal. A simulation study was conducted to evaluate the finite-sample properties of the proposed estimators, and a real data example is provided to illustrate our methods.

Parametric model-based regression imputation is commonly applied to missing-data problems, but is sensitive to misspecification of the imputation model. Little and An (2004) proposed a semiparametric approach called penalized spline propensity prediction (PSPP), where the variable with missing values is modeled by a penalized spline (P-Spline) of the response propensity score, which is the logit of the estimated probability of being missing given the observed variables. Variables other than the response propensity are included parametrically in the imputation model. However, they only considered point estimation based on single imputation with PSPP. We consider here three approaches to standard error estimation incorporating the uncertainty due to nonresponse: (a) standard errors based on the asymptotic variance of the PSPP estimator, ignoring sampling error in estimating the response propensity; (b) standard errors based on the bootstrap method; and (c) multiple imputation-based standard errors using draws from the joint posterior predictive distribution of missing values under the PSPP model. Simulation studies suggest that the bootstrap and multiple imputation approaches yield good inferences under a range of simulation conditions, with multiple imputation showing some evidence of closer to nominal confidence interval coverage when the sample size is small.

Recently, a least absolute deviations (LAD) estimator for median regression models with doubly censored data was proposed and the asymptotic normality of the estimator was established. However, it is difficult to make inference on the regression parameter vector, because the asymptotic covariance matrices are hard to estimate reliably since they involve conditional densities of the error terms. In this article, three methods, based on the bootstrap, random weighting, and empirical likelihood, respectively, none of which requires density estimation, are proposed for making inference for doubly censored median regression models. Simulations are also conducted to assess the performance of the proposed methods.

