首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We discuss the use of latent variable models with observed covariates for computing response propensities for sample respondents. A response propensity score is often used to weight item and unit responders to account for item and unit non-response and to obtain adjusted means and proportions. In the context of attitude scaling, we discuss computing response propensity scores by using latent variable models for binary or nominal polytomous manifest items with covariates. Our models allow the response propensity scores to be found for several different items without refitting. They allow any pattern of missing responses for the items. If one prefers, it is possible to estimate population proportions directly from the latent variable models, so avoiding the use of propensity scores. Artificial data sets and a real data set extracted from the 1996 British Social Attitudes Survey are used to compare the various methods proposed.  相似文献   

2.
ABSTRACT

To estimate causal treatment effects, we propose a new matching approach based on the reduced covariates obtained from sufficient dimension reduction. Compared with the original covariates and the propensity score, which are commonly used for matching in the literature, the reduced covariates are nonparametrically estimable and are effective in imputing the missing potential outcomes, under a mild assumption on the low-dimensional structure of the data. Under the ignorability assumption, the consistency of the proposed approach requires a weaker common support condition. In addition, researchers are allowed to employ different reduced covariates to find matched subjects for different treatment groups. We develop relevant asymptotic results and conduct simulation studies as well as real data analysis to illustrate the usefulness of the proposed approach.  相似文献   

3.
Biao Zhang 《Statistics》2016,50(5):1173-1194
Missing covariate data occurs often in regression analysis. We study methods for estimating the regression coefficients in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866] on regression analyses with missing covariates, in which they pioneered the use of two working models, the working propensity score model and the working conditional score model. A recent approach to missing covariate data analysis is the empirical likelihood method of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503], which effectively combines unbiased estimating equations. In this paper, we consider an alternative likelihood approach based on the full likelihood of the observed data. This full likelihood-based method enables us to generate estimators for the vector of the regression coefficients that are (a) asymptotically equivalent to those of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the working propensity score model is correctly specified, and (b) doubly robust, like the augmented inverse probability weighting (AIPW) estimators of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Am Statist Assoc. 1994;89:846–866]. Thus, the proposed full likelihood-based estimators improve on the efficiency of the AIPW estimators when the working propensity score model is correct but the working conditional score model is possibly incorrect, and also improve on the empirical likelihood estimators of Qin, Zhang and Leung [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the reverse is true, that is, the working conditional score model is correct but the working propensity score model is possibly incorrect. In addition, we consider a regression method for estimation of the regression coefficients when the working conditional score model is correctly specified; the asymptotic variance of the resulting estimator is no greater than the semiparametric variance bound characterized by the theory of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Finally, we compare the finite-sample performance of various estimators in a simulation study.  相似文献   

4.
Several survival regression models have been developed to assess the effects of covariates on failure times. In various settings, including surveys, clinical trials and epidemiological studies, missing data may often occur due to incomplete covariate data. Most existing methods for lifetime data are based on the assumption of missing at random (MAR) covariates. However, in many substantive applications, it is important to assess the sensitivity of key model inferences to the MAR assumption. The index of sensitivity to non-ignorability (ISNI) is a local sensitivity tool to measure the potential sensitivity of key model parameters to small departures from the ignorability assumption, needless of estimating a complicated non-ignorable model. We extend this sensitivity index to evaluate the impact of a covariate that is potentially missing, not at random in survival analysis, using parametric survival models. The approach will be applied to investigate the impact of missing tumor grade on post-surgical mortality outcomes in individuals with pancreas-head cancer in the Surveillance, Epidemiology, and End Results data set. For patients suffering from cancer, tumor grade is an important risk factor. Many individuals in these data with pancreas-head cancer have missing tumor grade information. Our ISNI analysis shows that the magnitude of effect for most covariates (with significant effect on the survival time distribution), specifically surgery and tumor grade as some important risk factors in cancer studies, highly depends on the missing mechanism assumption of the tumor grade. Also a simulation study is conducted to evaluate the performance of the proposed index in detecting sensitivity of key model parameters.  相似文献   

5.
We consider data with a continuous outcome that is missing at random and a fully observed set of covariates. We compare by simulation a variety of doubly-robust (DR) estimators for estimating the mean of the outcome. An estimator is DR if it is consistent when either the regression model for the mean function or the propensity to respond is correctly specified. Performance of different methods is compared in terms of root mean squared error of the estimates and width and coverage of confidence intervals or posterior credibility intervals in repeated samples. Overall, the DR methods tended to yield better inference than the incorrect model when either the propensity or mean model is correctly specified, but were less successful for small sample sizes, where the asymptotic DR property is less consequential. Two methods tended to outperform the other DR methods: penalized spline of propensity prediction [Little RJA, An H. Robust likelihood-based analysis of multivariate data with missing values. Statist Sinica. 2004;14:949–968] and the robust method proposed in [Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96:723–734].  相似文献   

6.
Conditional power calculations are frequently used to guide the decision whether or not to stop a trial for futility or to modify planned sample size. These ignore the information in short‐term endpoints and baseline covariates, and thereby do not make fully efficient use of the information in the data. We therefore propose an interim decision procedure based on the conditional power approach which exploits the information contained in baseline covariates and short‐term endpoints. We will realize this by considering the estimation of the treatment effect at the interim analysis as a missing data problem. This problem is addressed by employing specific prediction models for the long‐term endpoint which enable the incorporation of baseline covariates and multiple short‐term endpoints. We show that the proposed procedure leads to an efficiency gain and a reduced sample size, without compromising the Type I error rate of the procedure, even when the adopted prediction models are misspecified. In particular, implementing our proposal in the conditional power approach enables earlier decisions relative to standard approaches, whilst controlling the probability of an incorrect decision. This time gain results in a lower expected number of recruited patients in case of stopping for futility, such that fewer patients receive the futile regimen. We explain how these methods can be used in adaptive designs with unblinded sample size re‐assessment based on the inverse normal P‐value combination method to control Type I error. We support the proposal by Monte Carlo simulations based on data from a real clinical trial.  相似文献   

7.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approach is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weigh check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve nice robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ ‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.  相似文献   

8.
Inverse probability weighting (IPW) can deal with confounding in non randomized studies. The inverse weights are probabilities of treatment assignment (propensity scores), estimated by regressing assignment on predictors. Problems arise if predictors can be missing. Solutions previously proposed include assuming assignment depends only on observed predictors and multiple imputation (MI) of missing predictors. For the MI approach, it was recommended that missingness indicators be used with the other predictors. We determine when the two MI approaches, (with/without missingness indicators) yield consistent estimators and compare their efficiencies.We find that, although including indicators can reduce bias when predictors are missing not at random, it can induce bias when they are missing at random. We propose a consistent variance estimator and investigate performance of the simpler Rubin’s Rules variance estimator. In simulations we find both estimators perform well. IPW is also used to correct bias when an analysis model is fitted to incomplete data by restricting to complete cases. Here, weights are inverse probabilities of being a complete case. We explain how the same MI methods can be used in this situation to deal with missing predictors in the weight model, and illustrate this approach using data from the National Child Development Survey.  相似文献   

9.
We consider statistical inference of unknown parameters in estimating equations (EEs) when some covariates have nonignorably missing values, which is quite common in practice but has rarely been discussed in the literature. When an instrument, a fully observed covariate vector that helps identifying parameters under nonignorable missingness, is available, the conditional distribution of the missing covariates given other covariates can be estimated by the pseudolikelihood method of Zhao and Shao [(2015), ‘Semiparametric pseudo likelihoods in generalised linear models with nonignorable missing data’, Journal of the American Statistical Association, 110, 1577–1590)] and be used to construct unbiased EEs. These modified EEs then constitute a basis for valid inference by empirical likelihood. Our method is applicable to a wide range of EEs used in practice. It is semiparametric since no parametric model for the propensity of missing covariate data is assumed. Asymptotic properties of the proposed estimator and the empirical likelihood ratio test statistic are derived. Some simulation results and a real data analysis are presented for illustration.  相似文献   

10.
In longitudinal data, missing observations occur commonly with incomplete responses and covariates. Missing data can have a ‘missing not at random’ mechanism, a non‐monotone missing pattern, and moreover response and covariates can be missing not simultaneously. To avoid complexities in both modelling and computation, a two‐stage estimation method and a pairwise‐likelihood method are proposed. The two‐stage estimation method enjoys simplicities in computation, but incurs more severe efficiency loss. On the other hand, the pairwise approach leads to estimators with better efficiency, but can be cumbersome in computation. In this paper, we develop a compromise method using a hybrid pairwise‐likelihood framework. Our proposed approach has better efficiency than the two‐stage method, but its computational cost is still reasonable compared to the pairwise approach. The performance of the methods is evaluated empirically by means of simulation studies. Our methods are used to analyse longitudinal data obtained from the National Population Health Study.  相似文献   

11.
This article proposes a Bayesian approach, which can simultaneously obtain the Bayesian estimates of unknown parameters and random effects, to analyze nonlinear reproductive dispersion mixed models (NRDMMs) for longitudinal data with nonignorable missing covariates and responses. The logistic regression model is employed to model the missing data mechanisms for missing covariates and responses. A hybrid sampling procedure combining the Gibber sampler and the Metropolis-Hastings algorithm is presented to draw observations from the conditional distributions. Because missing data mechanism is not testable, we develop the logarithm of the pseudo-marginal likelihood, deviance information criterion, the Bayes factor, and the pseudo-Bayes factor to compare several competing missing data mechanism models in the current considered NRDMMs with nonignorable missing covaraites and responses. Three simulation studies and a real example taken from the paediatric AIDS clinical trial group ACTG are used to illustrate the proposed methodologies. Empirical results show that our proposed methods are effective in selecting missing data mechanism models.  相似文献   

12.
刘展等 《统计研究》2021,38(11):130-140
随着大数据与互联网技术的迅猛发展,网络调查的应用越来越广泛。本文提出网络调查样本的随机森林倾向得分模型推断方法,通过构建若干棵分类决策树组成随机森林,对网络调查样本单元的倾向得分进行估计,从而实现对总体的推断。模拟分析和实证研究结果表明:基于随机森林倾向得分模型的总体均值估计的相对偏差、方差与均方误差均比基于Logistic倾向得分模型的总体均值估计的相对偏差、方差与均方误差小,提出的方法估计效果更好。  相似文献   

13.
Summary One of the most salient data problems empirical researchers face is the lack of informative responses in survey data. This contribution briefly surveys the literature on item nonresponse behavior and its determinants before it describes four approaches to address item nonresponse problems: Casewise deletion of observations, weighting, imputation, and model-based procedures. We describe the basic approaches, their strengths and weaknesses and illustrate some of their effects using a simulation study. The paper concludes with some recommendations for the applied researcher. We are grateful to an anonymous referee who provided helpful comments. Also we like to thank Donald B. Rubin for helpful comments and always motivating discussions as well as Ralf Münnich for inspiring discussions about raking procedures.  相似文献   

14.
It is quite a challenge to develop model‐free feature screening approaches for missing response problems because the existing standard missing data analysis methods cannot be applied directly to high dimensional case. This paper develops some novel methods by borrowing information of missingness indicators such that any feature screening procedures for ultrahigh‐dimensional covariates with full data can be applied to missing response case. The first method is the so‐called missing indicator imputation screening, which is developed by proving that the set of the active predictors of interest for the response is a subset of the active predictors for the product of the response and missingness indicator under some mild conditions. As an alternative, another method called Venn diagram‐based approach is also developed. The sure screening property is proven for both methods. It is shown that the complete case analysis can also keep the sure screening property of any feature screening approach with sure screening property.  相似文献   

15.
This paper discusses regression analysis of current status or case I interval‐censored failure time data arising from the additive hazards model. In this situation, some covariates could be missing because of various reasons, but there may exist some auxiliary information about the missing covariates. To address the problem, we propose an estimated partial likelihood approach for estimation of regression parameters, which makes use of the available auxiliary information. The method can be easily implemented, and the asymptotic properties of the resulting estimates are established. To assess the finite sample performance of the proposed method, an extensive simulation study is conducted and indicates that the method works well.  相似文献   

16.
We propose a profile conditional likelihood approach to handle missing covariates in the general semiparametric transformation regression model. The method estimates the marginal survival function by the Kaplan-Meier estimator, and then estimates the parameters of the survival model and the covariate distribution from a conditional likelihood, substituting the Kaplan-Meier estimator for the marginal survival function in the conditional likelihood. This method is simpler than full maximum likelihood approaches, and yields consistent and asymptotically normally distributed estimator of the regression parameter when censoring is independent of the covariates. The estimator demonstrates very high relative efficiency in simulations. When compared with complete-case analysis, the proposed estimator can be more efficient when the missing data are missing completely at random and can correct bias when the missing data are missing at random. The potential application of the proposed method to the generalized probit model with missing continuous covariates is also outlined.  相似文献   

17.
Missing data are common in many experiments, including surveys, clinical trials, epidemiological studies, and environmental studies. Unconstrained likelihood inferences for generalized linear models (GLMs) with nonignorable missing covariates have been studied extensively in the literature. However, parameter orderings or constraints may occur naturally in practice, and thus the efficiency of a statistical method may be improved by incorporating parameter constraints into the likelihood function. In this paper, we consider constrained inference for analysing GLMs with nonignorable missing covariates under linear inequality constraints on the model parameters. Specifically, constrained maximum likelihood (ML) estimation is based on the gradient projection expectation maximization approach. Further, we investigate the asymptotic null distribution of the constrained likelihood ratio test (LRT). Simulations study the empirical properties of the constrained ML estimators and LRTs, which demonstrate improved precision of these constrained techniques. An application to contaminant levels in an environmental study is also presented.  相似文献   

18.
We propose an easy to derive and simple to compute approximate least squares or maximum likelihood estimator for nonlinear errors-in-variables models that does not require the knowledge of the conditional density of the latent variables given the observables. Specific examples and Monte Carlo studies demonstrate that the bias of this approximate estimator is small even when the magnitude of the variance of measurement errors to the variance of measured covariates is large. Cheng Hsiao and Qing Wang's work was supported in part by National Science Foundation grant SeS91-22481 and SBR94-09540. Liqun Wang gratefully acknowledges the financial support from Swiss National Science Foundation. We wish to thank Professor H. Schneeweiss and a referee for helpful comments and suggestions.  相似文献   

19.
In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been a little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means.  相似文献   

20.
Longitudinal data often contain missing observations, and it is in general difficult to justify particular missing data mechanisms, whether random or not, that may be hard to distinguish. The authors describe a likelihood‐based approach to estimating both the mean response and association parameters for longitudinal binary data with drop‐outs. They specify marginal and dependence structures as regression models which link the responses to the covariates. They illustrate their approach using a data set from the Waterloo Smoking Prevention Project They also report the results of simulation studies carried out to assess the performance of their technique under various circumstances.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号