期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient Robust Estimation for Linear Models with Missing Response at Random

《Scandinavian Journal of Statistics》2018,45(2):366-381

Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approach is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weigh check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve nice robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an IC_Q‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets. 相似文献

2.

基于数据缺失率和缺失模式的多重插补误差研究

彭海艳李意芝孟利军《统计与决策》2022,(1)

文章通过多重插补方法对不同缺失率和缺失模式的多变量缺失样本进行插补,研究了多重插补误差与缺失率和缺失模式的依赖关系。结果表明,当缺失率为0~15%时,多重插补误差与缺失率呈线性关系;当缺失率大于15%时,两者呈偏离线性关系。多重插补误差与缺失模式的方差均值比呈正相关性,当方差均值比越大时,误差也越大。相似文献

3.

A comparison of various software tools for dealing with missing data via imputation

《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementation of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect. 相似文献

4.

PARAMETRIC FRACTIONAL IMPUTATION FOR NON‐IGNORABLE CATEGORICAL MISSING DATA WITH FOLLOW‐UP

Ji Young Kim 《Australian & New Zealand Journal of Statistics》2012,54(2):239-250

Incomplete data subject to non‐ignorable non‐response are often encountered in practice and have a non‐identifiability problem. A follow‐up sample is randomly selected from the set of non‐respondents to avoid the non‐identifiability problem and get complete responses. Glynn, Laird, & Rubin analyzed non‐ignorable missing data with a follow‐up sample under a pattern mixture model. In this article, maximum likelihood estimation of parameters of the categorical missing data is considered with a follow‐up sample under a selection model. To estimate the parameters with non‐ignorable missing data, the EM algorithm with weighting, proposed by Ibrahim, is used. That is, in the E‐step, the weighted mean is calculated using the fractional weights for imputed data. Variances are estimated using the approximated jacknife method. Simulation results are presented to compare the proposed method with previously presented methods. 相似文献

5.

Implementation of pattern‐mixture models in randomized clinical trials

下载免费PDF全文

P. Bunouf G. Molenberghs 《Pharmaceutical statistics》2016,15(6):494-506

相似文献

6.

Hybrid pairwise‐likelihood estimation methods for incomplete longitudinal binary data

下载免费PDF全文

Haocheng Li Yukun Zhang 《Australian & New Zealand Journal of Statistics》2018,60(3):394-404

In longitudinal data, missing observations occur commonly with incomplete responses and covariates. Missing data can have a ‘missing not at random’ mechanism, a non‐monotone missing pattern, and moreover response and covariates can be missing not simultaneously. To avoid complexities in both modelling and computation, a two‐stage estimation method and a pairwise‐likelihood method are proposed. The two‐stage estimation method enjoys simplicities in computation, but incurs more severe efficiency loss. On the other hand, the pairwise approach leads to estimators with better efficiency, but can be cumbersome in computation. In this paper, we develop a compromise method using a hybrid pairwise‐likelihood framework. Our proposed approach has better efficiency than the two‐stage method, but its computational cost is still reasonable compared to the pairwise approach. The performance of the methods is evaluated empirically by means of simulation studies. Our methods are used to analyse longitudinal data obtained from the National Population Health Study. 相似文献

7.

缺失偏态数据下线性回归模型的统计推断 总被引：1，自引：2，他引：1

吴刘仓张家茂邱贻涛《统计与信息论坛》2013,28(9):22-26

研究缺失偏态数据下线性回归模型的参数估计问题,针对缺失偏态数据,为克服样本分布扭曲缺点和提高模型的回归系数、尺度参数和偏度参数的估计效果,提出了一种适合偏态数据下线性回归模型中缺失数据的修正回归插补方法.通过随机模拟和实例研究,并与均值插补、回归插补、随机回归插补方法比较,结果表明所提出的修正回归插补方法是有效可行的. 相似文献

8.

Asymptotic Inference with Incomplete Data

Keiji Takai Yutaka Kano 《统计学通讯:理论与方法》2013,42(17):3174-3190

This article discusses asymptotic theory for the maximum likelihood estimator based on incomplete data. Although much literature has implicitly assumed the basic properties of the estimator, such as consistency and asymptotic normality, it is hard to find their precise and comprehensive proofs. In this article, we first show that under MAR an estimator based on the likelihood function ignoring the missing-data mechanism is strongly consistent. The estimator is then shown to be asymptotically normal. When the data are NMAR and when the data are MAR without parameter distinctness, the consistency and the asymptotic normality are shown. Several examples are provided. 相似文献

9.

Empirical Likelihood Confidence Intervals for Response Mean with Data Missing at Random

LIUGEN XUE 《Scandinavian Journal of Statistics》2009,36(4):671-685

Abstract. A kernel regression imputation method for missing response data is developed. A class of bias-corrected empirical log-likelihood ratios for the response mean is defined. It is shown that any member of our class of ratios is asymptotically chi-squared, and the corresponding empirical likelihood confidence interval for the response mean is constructed. Our ratios share some of the desired features of the existing methods: they are self-scale invariant and no plug-in estimators for the adjustment factor and asymptotic variance are needed; when estimating the non-parametric function in the model, undersmoothing to ensure root- n consistency of the estimator for the parameter is avoided. Since the range of bandwidths contains the optimal bandwidth for estimating the regression function, the existing data-driven algorithm is valid for selecting an optimal bandwidth. We also study the normal approximation-based method. A simulation study is undertaken to compare the empirical likelihood with the normal approximation method in terms of coverage accuracies and average lengths of confidence intervals. 相似文献

10.

Short notes on maximum likelihood inference for control‐based pattern‐mixture models

Yongqiang Tang 《Pharmaceutical statistics》2015,14(5):395-399

A likelihood‐based analytical approach has been proposed for the control‐based pattern‐mixture model and its extension. In this note, we derive equivalent but simpler analytical expressions for the treatment effect and its variance for these control‐based pattern mixture models. Our formulae are easier to use and interpret. An application of our formulae to an antidepressant trial is provided, in which the likelihood‐based analysis is compared with the multiple imputation approach. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献

11.

A Sample Selection Model with Skew‐normal Distribution

下载免费PDF全文

Emmanuel O. Ogundimu Jane L. Hutton 《Scandinavian Journal of Statistics》2016,43(1):172-190

Non‐random sampling is a source of bias in empirical research. It is common for the outcomes of interest (e.g. wage distribution) to be skewed in the source population. Sometimes, the outcomes are further subjected to sample selection, which is a type of missing data, resulting in partial observability. Thus, methods based on complete cases for skew data are inadequate for the analysis of such data and a general sample selection model is required. Heckman proposed a full maximum likelihood estimation method under the normality assumption for sample selection problems, and parametric and non‐parametric extensions have been proposed. We generalize Heckman selection model to allow for underlying skew‐normal distributions. Finite‐sample performance of the maximum likelihood estimator of the model is studied via simulation. Applications illustrate the strength of the model in capturing spurious skewness in bounded scores, and in modelling data where logarithm transformation could not mitigate the effect of inherent skewness in the outcome variable. 相似文献

12.

Empirical Likelihood Inference for Longitudinal Data with Missing Response Variables and Error-Prone Covariates

Tao Zhang 《统计学通讯:理论与方法》2013,42(18):3230-3244

We consider statistical inference for longitudinal partially linear models when the response variable is sometimes missing with missingness probability depending on the covariate that is measured with error. The block empirical likelihood procedure is used to estimate the regression coefficients and residual adjusted block empirical likelihood is employed for the baseline function. This leads us to prove a nonparametric version of Wilk's theorem. Compared with methods based on normal approximations, our proposed method does not require a consistent estimators for the asymptotic variance and bias. An application to a longitudinal study is used to illustrate the procedure developed here. A simulation study is also reported. 相似文献

13.

Estimation of Stratified Mark‐Specific Proportional Hazards Models with Missing Marks

YANQING SUN PETER B. GILBERT 《Scandinavian Journal of Statistics》2012,39(1):34-52

Abstract. An objective of randomized placebo‐controlled preventive HIV vaccine efficacy trials is to assess the relationship between the vaccine effect to prevent infection and the genetic distance of the exposing HIV to the HIV strain represented in the vaccine construct. Motivated by this objective, recently a mark‐specific proportional hazards (PH) model with a continuum of competing risks has been studied, where the genetic distance of the transmitting strain is the continuous ‘mark’ defined and observable only in failures. A high percentage of genetic marks of interest may be missing for a variety of reasons, predominantly because rapid evolution of HIV sequences after transmission before a blood sample is drawn from which HIV sequences are measured. This research investigates the stratified mark‐specific PH model with missing marks where the baseline functions may vary with strata. We develop two consistent estimation approaches, the first based on the inverse probability weighted complete‐case (IPW) technique, and the second based on augmenting the IPW estimator by incorporating auxiliary information predictive of the mark. We investigate the asymptotic properties and finite‐sample performance of the two estimators, and show that the augmented IPW estimator, which satisfies a double robustness property, is more efficient. 相似文献

14.

On the asymptotic non‐equivalence of efficient‐GMM and MEL estimators in models with missing data

Xuerong Chen Yan Chen Alan T.K. Wan Yong Zhou 《Scandinavian Journal of Statistics》2019,46(2):361-388

The generalized method of moments (GMM) and empirical likelihood (EL) are popular methods for combining sample and auxiliary information. These methods are used in very diverse fields of research, where competing theories often suggest variables satisfying different moment conditions. Results in the literature have shown that the efficient‐GMM (GMM_E) and maximum empirical likelihood (MEL) estimators have the same asymptotic distribution to order n^?1/2 and that both estimators are asymptotically semiparametric efficient. In this paper, we demonstrate that when data are missing at random from the sample, the utilization of some well‐known missing‐data handling approaches proposed in the literature can yield GMM_E and MEL estimators with nonidentical properties; in particular, it is shown that the GMM_E estimator is semiparametric efficient under all the missing‐data handling approaches considered but that the MEL estimator is not always efficient. A thorough examination of the reason for the nonequivalence of the two estimators is presented. A particularly strong feature of our analysis is that we do not assume smoothness in the underlying moment conditions. Our results are thus relevant to situations involving nonsmooth estimating functions, including quantile and rank regressions, robust estimation, the estimation of receiver operating characteristic (ROC) curves, and so on. 相似文献

15.

Non‐Parametric Change‐Point Tests for Long‐Range Dependent Data

HEROLD DEHLING AENEAS ROOCH MURAD S. TAQQU 《Scandinavian Journal of Statistics》2013,40(1):153-173

Abstract. We propose a non‐parametric change‐point test for long‐range dependent data, which is based on the Wilcoxon two‐sample test. We derive the asymptotic distribution of the test statistic under the null hypothesis that no change occurred. In a simulation study, we compare the power of our test with the power of a test which is based on differences of means. The results of the simulation study show that in the case of Gaussian data, our test has only slightly smaller power minus.3pt than the ‘difference‐of‐means’ test. For heavy‐tailed data, our test outperforms the ‘difference‐of‐means’ test. 相似文献

16.

An Extended Single‐index Model with Missing Response at Random

Qihua Wang Tao Zhang Wolfgang Karl Härdle 《Scandinavian Journal of Statistics》2016,43(4):1140-1152

An extended single‐index model is considered when responses are missing at random. A three‐step estimation procedure is developed to define an estimator for the single‐index parameter vector by a joint estimating equation. The proposed estimator is shown to be asymptotically normal. An algorithm for computing this estimator is proposed. This algorithm only involves one‐dimensional nonparametric smoothers, thereby avoiding the data sparsity problem caused by high model dimensionality. Some simulation studies are conducted to investigate the finite sample performances of the proposed estimators. 相似文献

17.

Missing data in clinical trials: control‐based mean imputation and sensitivity analysis

下载免费PDF全文

Devan V. Mehrotra Fang Liu Thomas Permutt 《Pharmaceutical statistics》2017,16(5):378-392

In some randomized (drug versus placebo) clinical trials, the estimand of interest is the between‐treatment difference in population means of a clinical endpoint that is free from the confounding effects of “rescue” medication (e.g., HbA1c change from baseline at 24 weeks that would be observed without rescue medication regardless of whether or when the assigned treatment was discontinued). In such settings, a missing data problem arises if some patients prematurely discontinue from the trial or initiate rescue medication while in the trial, the latter necessitating the discarding of post‐rescue data. We caution that the commonly used mixed‐effects model repeated measures analysis with the embedded missing at random assumption can deliver an exaggerated estimate of the aforementioned estimand of interest. This happens, in part, due to implicit imputation of an overly optimistic mean for “dropouts” (i.e., patients with missing endpoint data of interest) in the drug arm. We propose an alternative approach in which the missing mean for the drug arm dropouts is explicitly replaced with either the estimated mean of the entire endpoint distribution under placebo (primary analysis) or a sequence of increasingly more conservative means within a tipping point framework (sensitivity analysis); patient‐level imputation is not required. A supplemental “dropout = failure” analysis is considered in which a common poor outcome is imputed for all dropouts followed by a between‐treatment comparison using quantile regression. All analyses address the same estimand and can adjust for baseline covariates. Three examples and simulation results are used to support our recommendations. 相似文献

18.

缺失偏态数据下联合位置与尺度模型的统计推断

李玲雪吴刘仓詹金龙《统计与信息论坛》2014,(3):15-21

为了研究缺失偏态数据下的联合位置与尺度模型,基于分布自身的特点,提出了一种适合缺失偏态数据下联合建模的插补方法———修正随机回归插补方法,该方法对缺失数据下模型偏度参数的调整十分显著。通过随机模拟和实例研究,并与回归插补和随机回归插补方法进行比较,结果表明,所提出的修正随机回归插补方法是有用和有效的。相似文献

19.

A Model Validation Procedure when Covariate Data are Missing at Random

LEI JIN SUOJIN WANG 《Scandinavian Journal of Statistics》2010,37(3):403-421

Abstract. In the presence of missing covariates, standard model validation procedures may result in misleading conclusions. By building generalized score statistics on augmented inverse probability weighted complete‐case estimating equations, we develop a new model validation procedure to assess the adequacy of a prescribed analysis model when covariate data are missing at random. The asymptotic distribution and local alternative efficiency for the test are investigated. Under certain conditions, our approach provides not only valid but also asymptotically optimal results. A simulation study for both linear and logistic regression illustrates the applicability and finite sample performance of the methodology. Our method is also employed to analyse a coronary artery disease diagnostic dataset. 相似文献

20.

Efficient estimation for case‐cohort studies

Bin Nan 《Revue canadienne de statistique》2004,32(4):403-419

The author considers time‐to‐event data from case‐cohort designs. As existing methods are either inefficient or based on restrictive assumptions concerning the censoring mechanism, he proposes a semi‐parametrically efficient estimator under the usual assumptions for Cox regression models. The estimator in question is obtained by a one‐step Newton‐Raphson approximation that solves the efficient score equations with initial value obtained from an existing method. The author proves that the estimator is consistent, asymptotically efficient and normally distributed in the limit. He also resorts to simulations to show that the proposed estimator performs well in finite samples and that it considerably improves the efficiency of existing pseudo‐likelihood estimators when a correlate of the missing covariate is available. Although he focuses on the situation where covariates are discrete, the author also explores how the method can be applied to models with continuous covariates. 相似文献