1.

Efficient inverse probability weighting method for quantile regression with nonignorable missing data

Pu-Ying Zhao De-Peng Jiang 《Statistics》2017,51(2):363-386

Quantile regression (QR) is a popular approach to estimate functional relations between variables for all portions of a probability distribution. Parameter estimation in QR with missing data is one of the most challenging issues in statistics. Regression quantiles can be substantially biased when observations are subject to missingness. We study several inverse probability weighting (IPW) estimators for parameters in QR when covariates or responses are subject to missing not at random. Maximum likelihood and semiparametric likelihood methods are employed to estimate the respondent probability function. To achieve nice efficiency properties, we develop an empirical likelihood (EL) approach to QR with the auxiliary information from the calibration constraints. The proposed methods are less sensitive to misspecified missing mechanisms. Asymptotic properties of the proposed IPW estimators are shown under general settings. The efficiency gain of EL-based IPW estimator is quantified theoretically. Simulation studies and a data set on the work limitation of injured workers from Canada are used to illustrate our proposed methodologies.
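
As a rough illustration of the IPW idea described in this abstract, a generic weighted quantile regression objective can be written as below. The notation is an assumption introduced here, not taken from the paper: \delta_i indicates that record i is fully observed, \pi(z_i;\hat{\gamma}) is a fitted respondent probability model, and \rho_\tau is the standard check loss. The paper's own estimators, including the EL-based version, may differ in detail.

```latex
\hat{\beta}_\tau
  = \arg\min_{\beta} \sum_{i=1}^{n}
    \frac{\delta_i}{\pi(z_i;\hat{\gamma})}\,
    \rho_\tau\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_\tau(u) = u\,\{\tau - I(u < 0)\}
```

Each complete record is up-weighted by the inverse of its estimated probability of being observed, so that the complete cases stand in for the full sample.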

2.

A Class of Weighted Estimating Equations for Semiparametric Transformation Models with Missing Covariates

《Scandinavian Journal of Statistics》2018,45(1):87-109

In survival analysis, covariate measurements often contain missing observations; ignoring this feature can lead to invalid inference. We propose a class of weighted estimating equations for right‐censored data with missing covariates under semiparametric transformation models. Time‐specific and subject‐specific weights are accommodated in the formulation of the weighted estimating equations. We establish unified results for estimating missingness probabilities that cover both parametric and non‐parametric modelling schemes. To improve estimation efficiency, the weighted estimating equations are augmented by a new set of unbiased estimating equations. The resultant estimator has the so‐called ‘double robustness’ property and is optimal within a class of consistent estimators.

3.

Handling missing data by deleting completely observed records

Myunghee Cho Paik Cuiling Wang 《Journal of statistical planning and inference》2009

When data are missing, analyzing records that are completely observed may cause bias or inefficiency. Existing approaches in handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769–782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computation and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.

4.

Linear Increments with Non‐monotone Missing Data and Measurement Error

Shaun R. Seaman Daniel Farewell Ian R. White 《Scandinavian Journal of Statistics》2016,43(4):996-1018

Linear increments (LI) are used to analyse repeated outcome data with missing values. Previously, two LI methods have been proposed, one allowing non‐monotone missingness but not independent measurement error and one allowing independent measurement error but only monotone missingness. In both, it was suggested that the expected increment could depend on current outcome. We show that LI can allow non‐monotone missingness and either independent measurement error of unknown variance or dependence of expected increment on current outcome but not both. A popular alternative to LI is a multivariate normal model ignoring the missingness pattern. This gives consistent estimation when data are normally distributed and missing at random (MAR). We clarify the relation between MAR and the assumptions of LI and show that for continuous outcomes multivariate normal estimators are also consistent under (non‐MAR and non‐normal) assumptions not much stronger than those of LI. Moreover, when missingness is non‐monotone, they are typically more efficient.

5.

Inverse probability weighted estimators for single-index models with missing covariates

Tingting Li Hu Yang 《Communications in Statistics - Theory and Methods》2013,42(5):1199-1214

Abstract

In this article, we consider the inverse probability weighted estimators for a single-index model with missing covariates when the selection probabilities are known or unknown. It is shown that the estimator for the index parameter using estimated selection probabilities has a smaller asymptotic variance than that using the true selection probabilities, and is thus more efficient. Therefore, the important Horvitz-Thompson property is verified for the index parameter in the single-index model. However, this difference disappears for the estimators of the link function. Some numerical examples and a real data application are also presented to illustrate the performance of the estimators.

6.

Probability density estimation with data missing at random when covariables are present

Qihua Wang 《Journal of statistical planning and inference》2008

This paper addresses the problem of probability density estimation in the presence of covariates when data are missing at random (MAR). The inverse probability weighted method is used to define a nonparametric and a semiparametric weighted probability density estimator. A regression calibration technique is also used to define an imputed estimator. It is shown that all the estimators are asymptotically normal with the same asymptotic variance as that of the inverse probability weighted estimator with known selection probability function and weights. Also, we establish the mean squared error (MSE) bounds and obtain the MSE convergence rates. A simulation is carried out to assess the proposed estimators in terms of the bias and standard error.
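
The weighting idea in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the estimator from the paper: the function name `ipw_kde`, the Gaussian kernel, the fixed bandwidth, and the assumption that the selection probabilities `prob` are already known or fitted are all choices made here for illustration.

```python
import numpy as np


def ipw_kde(y, observed, prob, grid, bandwidth):
    """Inverse-probability-weighted Gaussian kernel density estimate.

    y         : responses; entries for unobserved cases may be NaN
    observed  : boolean indicator, True where y is observed
    prob      : selection probability for each case (assumed known or fitted)
    grid      : points at which to evaluate the density
    bandwidth : kernel bandwidth (hypothetical fixed choice)
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    prob = np.asarray(prob, dtype=float)
    grid = np.asarray(grid, dtype=float)

    n = y.size                          # full sample size, including missing cases
    weights = 1.0 / prob[observed]      # each observed case is weighted by 1 / pi_i
    # Standardized distances between every grid point and every observed response.
    u = (grid[:, None] - y[observed][None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return (kernel * weights).sum(axis=1) / (n * bandwidth)
```

When every case is observed with probability one, the weights are all 1 and the function reduces to an ordinary kernel density estimate.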

7.

A Weighting Approach for GEE Analysis with Missing Data

Cuiling Wang Myunghee Cho Paik 《Communications in Statistics - Theory and Methods》2013,42(13):2397-2411

We propose a new weighting (WT) method to handle missing categorical outcomes in longitudinal data analysis using generalized estimating equations (GEE). The proposed WT provides a valid GEE estimator when the data are missing at random (MAR), has more stable weights, and shows an efficiency advantage over the inverse probability weighting method in the presence of small observation probabilities. The WT estimator is similar to the stabilized weighting (SWT) estimator under mild conditions, but it is more stable and efficient than SWT when the associations of the outcome with the observation probabilities and the covariate are strong.

8.

Influence Function Based Variance Estimation and Missing Data Issues in Case-Cohort Studies

Mark Steven D. Katki Hormuzd 《Lifetime data analysis》2001,7(4):331-344

Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance, and be completely at random; or may occur as part of the sampling design, and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.

9.

Estimation of Stratified Mark‐Specific Proportional Hazards Models with Missing Marks

YANQING SUN PETER B. GILBERT 《Scandinavian Journal of Statistics》2012,39(1):34-52

Abstract. An objective of randomized placebo‐controlled preventive HIV vaccine efficacy trials is to assess the relationship between the vaccine effect to prevent infection and the genetic distance of the exposing HIV to the HIV strain represented in the vaccine construct. Motivated by this objective, recently a mark‐specific proportional hazards (PH) model with a continuum of competing risks has been studied, where the genetic distance of the transmitting strain is the continuous ‘mark’ defined and observable only in failures. A high percentage of genetic marks of interest may be missing for a variety of reasons, predominantly because of rapid evolution of HIV sequences after transmission before a blood sample is drawn from which HIV sequences are measured. This research investigates the stratified mark‐specific PH model with missing marks where the baseline functions may vary with strata. We develop two consistent estimation approaches, the first based on the inverse probability weighted complete‐case (IPW) technique, and the second based on augmenting the IPW estimator by incorporating auxiliary information predictive of the mark. We investigate the asymptotic properties and finite‐sample performance of the two estimators, and show that the augmented IPW estimator, which satisfies a double robustness property, is more efficient.

10.

Logistic regression analysis of randomized response data with missing covariates

S.H. Hsieh S.M. Lee P.S. Shen 《Journal of statistical planning and inference》2010

Randomized response is an interview technique designed to eliminate response bias when sensitive questions are asked. In this paper, we present a logistic regression model on randomized response data when the covariates on some subjects are missing at random. In particular, we propose Horvitz and Thompson (1952)-type weighted estimators by using different estimates of the selection probabilities. We present large sample theory for the proposed estimators and show that they are more efficient than the estimator using the true selection probabilities. Simulation results support the theoretical analysis. We also illustrate the approach using data from a survey of cable TV.

11.

Empirical Likelihood Inference for Longitudinal Data with Missing Response Variables and Error-Prone Covariates

Tao Zhang 《Communications in Statistics - Theory and Methods》2013,42(18):3230-3244

We consider statistical inference for longitudinal partially linear models when the response variable is sometimes missing with missingness probability depending on the covariate that is measured with error. The block empirical likelihood procedure is used to estimate the regression coefficients and residual adjusted block empirical likelihood is employed for the baseline function. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, our proposed method does not require consistent estimators for the asymptotic variance and bias. An application to a longitudinal study is used to illustrate the procedure developed here. A simulation study is also reported.

12.

Joint generalized estimating equations for multivariate longitudinal binary outcomes with missing data: an application to acquired immune deficiency syndrome data

Stuart R. Lipsitz Garrett M. Fitzmaurice Joseph G. Ibrahim Debajyoti Sinha Michael Parzen Steven Lipshultz 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(1):3-20

Summary. In a large, prospective longitudinal study designed to monitor cardiac abnormalities in children born to women who are infected with the human immunodeficiency virus, instead of a single outcome variable, there are multiple binary outcomes (e.g. abnormal heart rate, abnormal blood pressure and abnormal heart wall thickness) considered as joint measures of heart function over time. In the presence of missing responses at some time points, longitudinal marginal models for these multiple outcomes can be estimated by using generalized estimating equations (GEEs), and consistent estimates can be obtained under the assumption of a missingness completely at random mechanism. When the missing data mechanism is missingness at random, i.e. the probability of missing a particular outcome at a time point depends on observed values of that outcome and the remaining outcomes at other time points, we propose joint estimation of the marginal models by using a single modified GEE based on an EM-type algorithm. The method proposed is motivated by the longitudinal study of cardiac abnormalities in children who were born to women infected with the human immunodeficiency virus, and analyses of these data are presented to illustrate the application of the method. Further, in an asymptotic study of bias, we show that, under a missingness at random mechanism in which missingness depends on all observed outcome variables, our joint estimation via the modified GEE produces almost unbiased estimates, provided that the correlation model has been correctly specified, whereas estimates from standard GEEs can lead to substantial bias.

13.

Inverse probability multiple weighting estimation for quantile regression with missing data and its application

Lingnan Tai (邰凌楠) et al. 《Statistical Research》(统计研究) 2018,35(9):115-128

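Missing data are pervasive in applied research. Under the missing at random assumption, and from the perspective of model inference, this paper proposes a new efficient estimation method for linear quantile regression models with missing data: inverse probability multiple weighting (IPMW) estimation. Building on inverse probability weighting (IPW) estimation, the method combines propensity score matching with the idea of model averaging, and obtains the final parameter estimates by weighting the results of multiple estimations. The method applies when the response variable is independent and identically distributed or independent but not identically distributed, and it suits the great majority of missing-data scenarios. Theoretical derivation and simulation studies show that the IPMW estimator retains the advantages of the IPW estimator while being more robust. Finally, the method is applied to micro-level survey data with missing values to study the factors influencing the consumption level of the middle-income group in economically more developed quasi-first-tier cities. Comparing the estimates and confidence bands of the two methods shows that the IPMW estimator has a smaller standard deviation and gives more robust estimates.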

14.

Comparison Between Two Partial Likelihood Approaches for the Competing Risks Model with Missing Cause of Failure

Lu K Tsiatis AA 《Lifetime data analysis》2005,11(1):29-40

In many clinical studies where time to failure is of primary interest, patients may fail or die from one of many causes where failure time can be right censored. In some circumstances, it might also be the case that patients are known to die but the cause of death information is not available for some patients. Under the assumption that cause of death is missing at random, we compare the Goetghebeur and Ryan (1995, Biometrika, 82, 821–833) partial likelihood approach with the Dewanji (1992, Biometrika, 79, 855–857) partial likelihood approach. We show that the estimator for the regression coefficients based on the Dewanji partial likelihood is not only consistent and asymptotically normal, but also semiparametric efficient. While the Goetghebeur and Ryan estimator is more robust than the Dewanji partial likelihood estimator against misspecification of proportional baseline hazards, the Dewanji partial likelihood estimator allows the probability of missing cause of failure to depend on covariate information without the need to model the missingness mechanism. Tests for proportional baseline hazards are also suggested and a robust variance estimator is derived.

15.

Empirical likelihood-based inference in nonlinear regression models with missing responses at random

Nian-Sheng Tang Pu-Ying Zhao 《Statistics》2013,47(6):1141-1159

This paper investigates the estimations of regression parameters and response mean in nonlinear regression models in the presence of missing response variables that are missing with missingness probabilities depending on covariates. We propose four empirical likelihood (EL)-based estimators for the regression parameters and the response mean. The resulting estimators are shown to be consistent and asymptotically normal under some general assumptions. To construct the confidence regions for the regression parameters as well as the response mean, we develop four EL ratio statistics, which are proven to have the χ² distribution asymptotically. Simulation studies and an artificial data set are used to illustrate the proposed methodologies. Empirical results show that the EL method behaves better than the normal approximation method and that the coverage probabilities and average lengths depend on the selection probability function.

16.

Proportional hazards regression in the presence of missing study eligibility information

Qing Pan Douglas E. Schaubel 《Lifetime data analysis》2014,20(3):424-443

We consider the study of censored survival times in the situation where the available data consist of both eligible and ineligible subjects, and information distinguishing the two groups is sometimes missing. A complete-case analysis in this context would use only subjects known to be eligible, resulting in inefficient and potentially biased estimators. We propose a two-step procedure which resembles the EM algorithm but is computationally much faster. In the first step, one estimates the conditional expectation of the missing eligibility indicators given the observed data using a logistic regression based on the complete cases (i.e., subjects with non-missing eligibility indicator). In the second step, maximum likelihood estimators are obtained from a weighted Cox proportional hazards model, with the weights being either observed eligibility indicators or estimated conditional expectations thereof. Under ignorable missingness, the estimators from the second step are proven to be consistent and asymptotically normal, with explicit variance estimators. We demonstrate through simulation that the proposed methods perform well for moderate sized samples and are robust in the presence of eligibility indicators that are missing not at random. The proposed procedure is more efficient and more robust than the complete case analysis and, unlike the EM algorithm, does not require time-consuming iteration. Although the proposed methods are applicable generally, they would be most useful for large data sets (e.g., administrative data), for which the computational savings outweigh the price one has to pay for making various approximations in avoiding iteration. We apply the proposed methods to national kidney transplant registry data.

17.

A comparison of various software tools for dealing with missing data via imputation

《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect.

18.

Marginal regression models with a time to event outcome and discrete multiple source predictors

Litman HJ Horton NJ Murphy JM Laird NM 《Lifetime data analysis》2006,12(3):249-265

Information from multiple informants is frequently used to assess psychopathology. We consider marginal regression models with multiple informants as discrete predictors and a time to event outcome. We fit these models to data from the Stirling County Study; specifically, the models predict mortality from self report of psychiatric disorders and also predict mortality from physician report of psychiatric disorders. Previously, Horton et al. found little relationship between self and physician reports of psychopathology, but that the relationship of self report of psychopathology with mortality was similar to that of physician report of psychopathology with mortality. Generalized estimating equations (GEE) have been used to fit marginal models with multiple informant covariates; here we develop a maximum likelihood (ML) approach and show how it relates to the GEE approach. In a simple setting using a saturated model, the ML approach can be constructed to provide estimates that match those found using GEE. We extend the ML technique to consider multiple informant predictors with missingness and compare the method to using inverse probability weighted (IPW) GEE. Our simulation study illustrates that IPW GEE loses little efficiency compared with ML in the presence of monotone missingness. Our example data has non-monotone missingness; in this case, ML offers a modest decrease in variance compared with IPW GEE, particularly for estimating covariates in the marginal models. In more general settings, e.g., categorical predictors and piecewise exponential models, the likelihood parameters from the ML technique do not have the same interpretation as the GEE. Thus, the GEE is recommended to fit marginal models for its flexibility, ease of interpretation and comparable efficiency to ML in the presence of missing data.

19.

Using Inverse Probability Weighting Estimators to Evaluate Various Propensity Scores When Treatment Switching Exists

Chunhao Tu Woon Yuen Koh 《Communications in Statistics - Simulation and Computation》2016,45(6):2182-2190

In this paper, we conduct a Monte Carlo simulation study to evaluate three propensity score (PS) scenarios for estimating an average treatment effect (ATE) in observational studies when treatment switching exists: (a) ignoring treatment switching in subjects (UPS), (b) removing subjects with treatment switching (RPS), and (c) adjusting for treatment switching effect (APS) with two inverse probability weighting estimators, IPW1 and IPW2. We evaluate these six estimators in terms of bias, mean squared error (MSE), empirical standard error (ESE), and coverage probability (CP) under various simulation scenarios. Simulation results show that the IPW2 estimator with RPS has relatively good performance.
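
The abstract does not define IPW1 and IPW2. The sketch below assumes, purely for illustration, that they correspond to the two common forms of the IPW estimator of the ATE: one with unnormalized weights and one whose weights are normalized to sum to one within each treatment arm. The function names and this mapping are assumptions made here, and the propensity scores `ps` are taken as already estimated.

```python
import numpy as np


def ate_ipw_unnormalized(y, t, ps):
    """IPW estimate of the ATE with raw inverse-probability weights."""
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    treated_mean = np.mean(t * y / ps)
    control_mean = np.mean((1.0 - t) * y / (1.0 - ps))
    return treated_mean - control_mean


def ate_ipw_normalized(y, t, ps):
    """IPW estimate of the ATE with weights normalized within each arm."""
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    w1 = t / ps                   # weights for treated subjects
    w0 = (1.0 - t) / (1.0 - ps)   # weights for control subjects
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```

One design difference is visible directly in the code: adding a constant to every outcome leaves the normalized estimate unchanged, while the unnormalized estimate can shift when the weights do not sum to the sample size.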

20.

The weighted least square based estimators with censoring indicators missing at random

Xiayan Li Qihua Wang 《Journal of statistical planning and inference》2012

In this paper, we study linear regression analysis when some of the censoring indicators are missing at random. We define a regression calibration estimate, an imputation estimate and an inverse probability weighted estimate for the regression coefficient vector based on the weighted least squares approach due to Stute (1993), and prove that all the estimators are asymptotically normal. A simulation study was conducted to evaluate the finite-sample properties of the proposed estimators, and a real data example is provided to illustrate our methods.