Similar Literature
 Found 20 similar documents; search took 15 ms
1.
Doubly robust (DR) estimators of the mean with missing data are compared. An estimator is DR if it remains consistent when either the regression of the missing variable on the observed variables or the missing data mechanism is correctly specified. One method is to include the inverse of the propensity score as a linear term in the imputation model [D. Firth and K.E. Bennett, Robust models in probability sampling, J. R. Statist. Soc. Ser. B. 60 (1998), pp. 3–21; D.O. Scharfstein, A. Rotnitzky, and J.M. Robins, Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion), J. Am. Statist. Assoc. 94 (1999), pp. 1096–1146; H. Bang and J.M. Robins, Doubly robust estimation in missing data and causal inference models, Biometrics 61 (2005), pp. 962–972]. Another method is to calibrate the predictions from a parametric model by adding a mean of the weighted residuals [J.M. Robins, A. Rotnitzky, and L.P. Zhao, Estimation of regression coefficients when some regressors are not always observed, J. Am. Statist. Assoc. 89 (1994), pp. 846–866; D.O. Scharfstein, A. Rotnitzky, and J.M. Robins, Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion), J. Am. Statist. Assoc. 94 (1999), pp. 1096–1146]. The penalized spline propensity prediction (PSPP) model incorporates the propensity score into the model nonparametrically [R.J.A. Little and H. An, Robust likelihood-based analysis of multivariate data with missing values, Statist. Sin. 14 (2004), pp. 949–968; G. Zhang and R.J. Little, Extensions of the penalized spline propensity prediction method of imputation, Biometrics, 65(3) (2008), pp. 911–918]. All these methods have consistency properties under misspecification of regression models, but their comparative efficiency and confidence coverage in finite samples have received little attention.
In this paper, we compare the root mean square error (RMSE), width of confidence interval and non-coverage rate of these methods under various mean and response propensity functions. We study the effects of sample size and robustness to model misspecification. The PSPP method yields estimates with smaller RMSE and narrower confidence intervals than the other methods in most situations. It also yields estimates with confidence coverage close to the 95% nominal level, provided the sample size is not too small.
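The residual-calibration form of a DR estimator described above can be illustrated with a minimal Python sketch. It is not the code of the paper: the function name `aipw_mean`, the toy data, and the "wrong" working models are all hypothetical, and the fitted propensities and predictions are assumed to be supplied by the user. The estimate is the mean of the outcome-model predictions plus the mean of the inverse-probability-weighted residuals of the respondents.

```python
def aipw_mean(y, r, pi_hat, m_hat):
    """Augmented inverse-probability-weighted estimate of E[Y].

    y      : outcomes (ignored where r == 0, so they may be None there)
    r      : response indicators (1 = observed, 0 = missing)
    pi_hat : working response probabilities P(R = 1 | X)
    m_hat  : working outcome-model predictions of E[Y | X]
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, r, pi_hat, m_hat):
        # Prediction, corrected by the weighted residual for respondents only.
        residual = (yi - mi) if ri == 1 else 0.0
        total += mi + residual / pi
    return total / len(r)


# Hypothetical noise-free toy data: Y = 2 + 4X, so the full-data mean is 4.
# Half of the X = 0 units respond; every X = 1 unit responds.
x = [0, 0, 0, 0, 1, 1, 1, 1]
r = [1, 1, 0, 0, 1, 1, 1, 1]
y = [2.0, 2.0, None, None, 6.0, 6.0, 6.0, 6.0]

true_pi = [0.5 if xi == 0 else 1.0 for xi in x]   # correct response model
wrong_pi = [0.8] * len(x)                         # misspecified response model
true_m = [2.0 + 4.0 * xi for xi in x]             # correct outcome model
wrong_m = [0.0] * len(x)                          # misspecified outcome model

est_outcome_ok = aipw_mean(y, r, wrong_pi, true_m)
est_propensity_ok = aipw_mean(y, r, true_pi, wrong_m)
est_both_wrong = aipw_mean(y, r, wrong_pi, wrong_m)
```

In this contrived example the estimate recovers the full-data mean whenever one of the two working models is correct and drifts away when both are wrong. The recovery is exact only because the toy data are noise-free and the observed response counts match the response probabilities; in real samples the property holds asymptotically.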

2.
Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing multiple models for the outcome and/or the response probability, and are consistent if at least one of the multiple models is correctly specified. We propose a robust quasi-randomization-based model approach to bring more protection against model misspecification than the existing DR and MR estimators, where any multiple semiparametric, nonparametric or machine learning models can be used for the outcome variable. The proposed estimator achieves unbiasedness by using a subsampling Rao–Blackwell method, given cell-homogeneous response, regardless of any working models for the outcome. An unbiased variance estimation formula is proposed, which does not use any replicate jackknife or bootstrap methods. A simulation study shows that our proposed method outperforms the existing multiply robust estimators.

3.
Biao Zhang, Statistics, 2016, 50(5): 1173-1194
Missing covariate data occurs often in regression analysis. We study methods for estimating the regression coefficients in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866] on regression analyses with missing covariates, in which they pioneered the use of two working models, the working propensity score model and the working conditional score model. A recent approach to missing covariate data analysis is the empirical likelihood method of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503], which effectively combines unbiased estimating equations. In this paper, we consider an alternative likelihood approach based on the full likelihood of the observed data. This full likelihood-based method enables us to generate estimators for the vector of the regression coefficients that are (a) asymptotically equivalent to those of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the working propensity score model is correctly specified, and (b) doubly robust, like the augmented inverse probability weighting (AIPW) estimators of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Thus, the proposed full likelihood-based estimators improve on the efficiency of the AIPW estimators when the working propensity score model is correct but the working conditional score model is possibly incorrect, and also improve on the empirical likelihood estimators of Qin, Zhang and Leung [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the reverse is true, that is, the working conditional score model is correct but the working propensity score model is possibly incorrect. In addition, we consider a regression method for estimation of the regression coefficients when the working conditional score model is correctly specified; the asymptotic variance of the resulting estimator is no greater than the semiparametric variance bound characterized by the theory of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Finally, we compare the finite-sample performance of various estimators in a simulation study.

4.
Estimation of the average treatment effect is crucial in causal inference for evaluating treatments or interventions in biostatistics, epidemiology, econometrics, and sociology. However, existing estimators require that either a propensity score model, an outcome vector model, or both be correctly specified, which is difficult to verify in practice. In this paper, we allow multiple models for both the propensity score and the outcome, and then construct a weighting estimator based on observed data by using two-sample empirical likelihood. The resulting estimator is consistent if any one of those multiple models is correctly specified, and thus provides multiple protection for consistency. Moreover, the proposed estimator can attain the semiparametric efficiency bound when one propensity score model and one outcome vector model are correctly specified, without requiring knowledge of which models are correct. Simulations are performed to evaluate the finite sample performance of the proposed estimators. As an application, we analyze the data collected from the AIDS Clinical Trials Group Protocol 175.

5.
This article considers Robins's marginal and nested structural models in the cross‐sectional setting and develops likelihood and regression estimators. First, a nonparametric likelihood method is proposed by retaining a finite subset of all inherent and modelling constraints on the joint distributions of potential outcomes and covariates under a correctly specified propensity score model. A profile likelihood is derived by maximizing the nonparametric likelihood over these joint distributions subject to the retained constraints. The maximum likelihood estimator is intrinsically efficient based on the retained constraints and weakly locally efficient. Second, two regression estimators, named hat and tilde, are derived as first‐order approximations to the likelihood estimator under the propensity score model. The tilde regression estimator is intrinsically and weakly locally efficient and doubly robust. The methods are illustrated by data analysis for an observational study on right heart catheterization. The Canadian Journal of Statistics 38: 609–632; 2010 © 2010 Statistical Society of Canada

6.
Combining information from multiple samples is often needed in biomedical and economic studies, but differences between these samples must be appropriately taken into account in the analysis of the combined data. We study the estimation for moment restriction models with data combined from two samples under an ignorability-type assumption while allowing for different marginal distributions of variables common to both samples. Suppose that an outcome regression (OR) model and a propensity score (PS) model are specified. By leveraging semi-parametric efficiency theory, we derive an augmented inverse probability-weighted (AIPW) estimator that is locally efficient and doubly robust with respect to these models. Furthermore, we develop calibrated regression and likelihood estimators that are not only locally efficient and doubly robust but also intrinsically efficient in achieving smaller variances than the AIPW estimator when the PS model is correctly specified but the OR model may be misspecified. As an important application, we study the two-sample instrumental variable problem and derive the corresponding estimators while allowing for incompatible distributions of variables common to the two samples. Finally, we provide a simulation study and an econometric application on public housing projects to demonstrate the superior performance of our improved estimators. The Canadian Journal of Statistics 48: 259–284; 2020 © 2019 Statistical Society of Canada

7.
Since the publication of the seminal paper by Cox (1972), the proportional hazards model has become very popular in regression analysis for right censored data. In observational studies, treatment assignment may depend on observed covariates. If these confounding variables are not accounted for properly, the inference based on the Cox proportional hazards model may perform poorly. As shown in Rosenbaum and Rubin (1983), under the strongly ignorable treatment assignment assumption, conditioning on the propensity score yields valid causal effect estimates. Therefore we incorporate the propensity score into the Cox model for causal inference with survival data. We derive the asymptotic property of the maximum partial likelihood estimator when the model is correctly specified. Simulation results show that our method performs quite well for observational data. The approach is applied to a real dataset on the time of readmission of trauma patients. We also derive the asymptotic property of the maximum partial likelihood estimator with a robust variance estimator, when the model is incorrectly specified.

8.
The authors consider a double robust estimation of the regression parameter defined by an estimating equation in a surrogate outcome set‐up. Under a correct specification of the propensity score, the proposed estimator has the smallest trace of the asymptotic covariance matrix whether or not the “working outcome regression model” involved is correctly specified, and it is particularly meaningful when that model is incorrectly specified. Simulations are conducted to examine the finite sample performance of the proposed procedure. Data on obesity and high blood pressure are analyzed for illustration. The Canadian Journal of Statistics 38: 633–646; 2010 © 2010 Statistical Society of Canada

9.
Survival functions are often estimated by nonparametric estimators such as the Kaplan‐Meier estimator. For valid estimation, proper adjustment for confounding factors is needed when treatment assignment may depend on confounding factors. Inverse probability weighting is a commonly used approach, especially when there is a large number of potential confounders to adjust for. Direct adjustment may also be used if the relationship between the time‐to‐event and all confounders can be modeled. However, either approach requires a correctly specified model for the relationship between confounders and treatment allocation or between confounders and the time‐to‐event. We propose a pseudo‐observation–based doubly robust estimator, which is valid when either the treatment allocation model or the time‐to‐event model is correctly specified and is generally more efficient than the inverse probability weighting approach. The approach can be easily implemented using standard software. A simulation study was conducted to evaluate this approach under a number of scenarios, and the results are presented and discussed. The results confirm robustness and efficiency of the proposed approach. A real data example is also provided for illustration.

10.
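Tai Lingnan (邰凌楠) et al., Statistical Research (统计研究), 2018, 35(9): 115-128
Missing data are ubiquitous in applied research. Under the missing-at-random assumption, and from the perspective of model inference, this paper proposes a new and effective estimation method for linear quantile regression models with missing data: inverse probability multiple weighting (IPMW) estimation. Building on inverse probability weighting (IPW) estimation, the method combines propensity score matching with the idea of model averaging, repeats the estimation multiple times, and determines the final parameter estimates by weighting. The method applies whether the response variable is independent and identically distributed or independent but not identically distributed, and it covers the vast majority of missing-data scenarios. Theoretical derivation and simulation studies show that the IPMW estimator inherits the advantages of the IPW estimator while being more robust. Finally, the method is applied to micro-level survey data containing missing values to study the factors that influence the consumption level of the middle-income group in economically developed quasi-first-tier cities. A comparison of the estimates and confidence bands from the two methods shows that the IPMW estimator has a smaller standard deviation and gives more robust results.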

11.
This paper is concerned with a model averaging procedure for varying-coefficient partially linear models with missing responses. The profile least-squares estimation process and inverse probability weighted method are employed to estimate regression coefficients of the partially restricted models, in which the propensity score is estimated by the covariate balancing propensity score method. The estimators of the linear parameters are shown to be asymptotically normal. Then we develop the focused information criterion, formulate the frequentist model averaging estimators and construct the corresponding confidence intervals. Some simulation studies are conducted to examine the finite sample performance of the proposed methods. We find that the covariate balancing propensity score improves the performance of the inverse probability weighted estimator. We also demonstrate the superiority of the proposed model averaging estimators over those of existing strategies in terms of mean squared error and coverage probability. Finally, our approach is further applied to a real data example.

12.
In this article we consider estimation of causal parameters in a marginal structural model for the discrete intensity of the treatment specific counting process (e.g. hazard of a treatment specific survival time) based on longitudinal observational data on treatment, covariates and survival. We define three estimators: the inverse probability of treatment weighted (IPTW) estimator, the maximum likelihood estimator (MLE), and a double robust (DR) estimator. The DR estimator is obtained by following a general methodology for constructing double robust estimating functions in censored data models as described in van der Laan and Robins (Unified Methods for Censored Longitudinal Data and Causality, 2002). The double-robust estimator is consistent and asymptotically linear when either the treatment mechanism or the partial likelihood of the observed data is consistently estimated. We illustrate the superiority of the DR estimator relative to the IPTW and ML estimators in a simulation study. The proposed methodology is also applied to estimate the causal effect of exercise on physical functioning in a longitudinal study of seniors in Sonoma County.

13.
In this paper, we propose an empirical likelihood-based weighted estimator of the regression parameter in a quantile regression model with nonignorable missing covariates. The proposed estimator is computationally simple and achieves semiparametric efficiency if the probability of missingness on the fully observed variables is correctly specified. The efficiency gain of the proposed estimator over the complete-case-analysis estimator is quantified theoretically and illustrated via simulation and a real data application.

14.
This article examines methods to efficiently estimate the mean response in a linear model with an unknown error distribution under the assumption that the responses are missing at random. We show how the asymptotic variance is affected by the estimator of the regression parameter, and by the imputation method. To estimate the regression parameter, the ordinary least squares is efficient only if the error distribution happens to be normal. If the errors are not normal, then we propose a one step improvement estimator or a maximum empirical likelihood estimator to efficiently estimate the parameter. To investigate the imputation’s impact on the estimation of the mean response, we compare the listwise deletion method and the propensity score method (which do not use imputation at all), and two imputation methods. We demonstrate that listwise deletion and the propensity score method are inefficient. Partial imputation, where only the missing responses are imputed, is compared to full imputation, where both missing and non-missing responses are imputed. Our results reveal that, in general, full imputation is better than partial imputation. However, when the regression parameter is estimated very poorly, the partial imputation will outperform full imputation. The efficient estimator for the mean response is the full imputation estimator that utilizes an efficient estimator of the parameter.
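The three estimators of the mean response named above (listwise deletion, partial imputation, full imputation) can be written out in a short Python sketch. Everything here is an illustrative assumption rather than the article's procedure: a single covariate, hypothetical function names `ols_fit` and `mean_estimates`, made-up data, and coefficients fitted by ordinary least squares on the complete cases.

```python
def ols_fit(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def mean_estimates(x, y, r):
    """Listwise, partial-imputation and full-imputation estimates of E[Y].

    r[i] == 1 marks an observed response; y[i] is ignored where r[i] == 0.
    """
    obs_x = [xi for xi, ri in zip(x, r) if ri == 1]
    obs_y = [yi for yi, ri in zip(y, r) if ri == 1]
    a, b = ols_fit(obs_x, obs_y)          # fitted on complete cases only
    n = len(x)
    listwise = sum(obs_y) / len(obs_y)    # ignores the incomplete cases
    # Partial imputation: keep observed responses, impute only the missing.
    partial = sum(yi if ri == 1 else a + b * xi
                  for xi, yi, ri in zip(x, y, r)) / n
    # Full imputation: replace every response by its fitted value.
    full = sum(a + b * xi for xi in x) / n
    return {"listwise": listwise, "partial": partial, "full": full}


# Hypothetical data; the responses at x = 2 and x = 4 are missing.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.9, None, 7.1, None, 11.0]
r = [1, 1, 0, 1, 0, 1]
estimates = mean_estimates(x, y, r)
```

One detail of this particular sketch: because least squares with an intercept leaves residuals that sum to zero over the complete cases, the partial and full imputation estimates coincide here up to rounding. The two differ once the coefficients come from another estimator, which is the situation the comparison in the abstract concerns.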

15.
Mixed models are powerful tools for the analysis of clustered data and many extensions of the classical linear mixed model with normally distributed response have been established. As with all parametric (P) models, correctness of the assumed model is critical for the validity of the ensuing inference. An incorrectly specified P means model may be improved by using a local, or nonparametric (NP), model. Two local models are proposed by a pointwise weighting of the marginal and conditional variance–covariance matrices. However, NP models tend to fit to irregularities in the data and may provide fits with high variance. Model robust regression techniques estimate mean response as a convex combination of a P and a NP model fit to the data. It is a semiparametric method by which incomplete or incorrectly specified P models can be improved by adding an appropriate amount of the NP fit. We compare the approximate integrated mean square error of the P, NP, and mixed model robust methods via a simulation study and apply these methods to two real data sets: the monthly wind speed data from countries in Ireland and the engine speed data.

16.
Suppose we observe an ergodic Markov chain on the real line, with a parametric model for the autoregression function, i.e. the conditional mean of the transition distribution. If one specifies, in addition, a parametric model for the conditional variance, one can define a simple estimator for the parameter, the maximum quasi-likelihood estimator. It is robust against misspecification of the conditional variance, but not efficient. We construct an estimator which is adaptive in the sense that it is efficient if the conditional variance is misspecified, and asymptotically as good as the maximum quasi-likelihood estimator if the conditional variance is correctly specified. The adaptive estimator is a weighted nonlinear least-squares estimator, with weights given by predictors for the conditional variance.

17.
Patient dropout is a common problem in studies that collect repeated binary measurements. Generalized estimating equations (GEE) are often used to analyze such data. The dropout mechanism may be plausibly missing at random (MAR), i.e. unrelated to future measurements given covariates and past measurements. In this case, various authors have recommended weighted GEE with weights based on an assumed dropout model, or an imputation approach, or a doubly robust approach based on weighting and imputation. These approaches provide asymptotically unbiased inference, provided the dropout or imputation model (as appropriate) is correctly specified. Other authors have suggested that, provided the working correlation structure is correctly specified, GEE using an improved estimator of the correlation parameters (‘modified GEE’) show minimal bias. These modified GEE have not been thoroughly examined. In this paper, we study the asymptotic bias under MAR dropout of these modified GEE, the standard GEE, and also GEE using the true correlation. We demonstrate that all three methods are biased in general. The modified GEE may be preferred to the standard GEE and are subject to only minimal bias in many MAR scenarios but in others are substantially biased. Hence, we recommend the modified GEE be used with caution.

18.
Propensity score matching has a long-standing tradition for handling confounding in causal inference, but it requires stringent model assumptions. In this article, we propose a novel double score matching (DSM) method that utilizes both the propensity score and the prognostic score. To gain protection against possible model misspecification, we posit multiple candidate models for each score. We show that the debiasing DSM estimator achieves the multiple robustness property in that it is consistent if any one of the score models is correctly specified. We characterize the asymptotic distribution for the DSM estimator requiring only one correct model specification based on the martingale representations of the matching estimators and theory for local normal experiments. We also provide a two-stage replication method for variance estimation and extend DSM for quantile estimation. Simulation demonstrates that DSM outperforms single-score matching and prevailing multiply robust weighting estimators in the presence of extreme propensity scores.

19.
Suppose we are interested in estimating the average causal effect (ACE) for the population mean from an observational study. Because of its simplicity and ease of interpretation, stratification by a propensity score (PS) is widely used to adjust for the influence of confounding factors in estimation of the ACE. Appropriateness of the estimation by PS stratification relies on correct specification of the PS. We propose an estimator based on stratification with multiple PS models by clustering techniques instead of model selection. If one of them is correctly specified, the proposed estimator removes bias and thus is more robust than the standard PS stratification.
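For orientation, the standard single-model PS stratification that this abstract takes as its baseline can be sketched in Python as follows. This is not the proposed multiple-model clustering estimator; the function name `stratified_ace`, the equal-size strata, and the toy data are hypothetical choices made for illustration, and the propensity scores are assumed to be already estimated.

```python
def stratified_ace(y, t, ps, n_strata=5):
    """Average causal effect by stratifying on a single propensity score.

    y  : outcomes
    t  : treatment indicators (1 = treated, 0 = control)
    ps : estimated propensity scores P(T = 1 | X)
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: ps[i])   # units sorted by PS
    estimate = 0.0
    for k in range(n_strata):
        # Equal-size strata formed from the sorted units.
        idx = order[k * n // n_strata:(k + 1) * n // n_strata]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if not treated or not control:
            raise ValueError("stratum %d lacks treated or control units" % k)
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        estimate += diff * len(idx) / n             # weight by stratum size
    return estimate


# Hypothetical data with a constant treatment effect of 2 in both strata.
ps = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
t = [1, 0, 0, 0, 1, 1, 1, 0]
y = [3.0, 1.0, 1.0, 1.0, 7.0, 7.0, 7.0, 5.0]

ace = stratified_ace(y, t, ps, n_strata=2)
naive = (sum(yi for yi, ti in zip(y, t) if ti == 1) / sum(t)
         - sum(yi for yi, ti in zip(y, t) if ti == 0) / (len(t) - sum(t)))
```

In the toy data the unadjusted difference in means is confounded because treated units concentrate in the high-PS stratum, whereas the stratified estimate recovers the within-stratum effect. Raising an error for a stratum without both arms is one possible design choice; merging adjacent strata would be another.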

20.
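Wu Hao (吴浩), Peng Fei (彭非), Statistical Research (统计研究), 2020, 37(4): 114-128
The propensity score is an important tool for estimating the average treatment effect. In observational studies, however, imbalance in the covariate distributions of the treatment and control groups often produces extreme propensity scores, that is, scores very close to 0 or 1. This brings the strong ignorability assumption of causal inference close to violation and leads to large bias and variance in the estimated average treatment effect. Li et al. (2018a) proposed the covariate balancing weighting method, which, under the unconfoundedness assumption, addresses the impact of extreme propensity scores by achieving weighted balance of the covariate distributions. Building on this work, the present paper proposes a robust and efficient estimation method based on covariate balancing weighting and introduces the super learner algorithm to improve the robustness of the model in empirical applications. It further extends this method to a robust and efficient estimator based on covariate balancing weighting that, in theory, does not depend on assumptions about the outcome regression model or the propensity score model. Monte Carlo simulations show that both proposed methods retain very small bias and variance even when the outcome regression model and the propensity score model are both misspecified. In the empirical section, the two methods are applied to right heart catheterization data, and the results indicate that right heart catheterization increases patient mortality by about 6.3%.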
