首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approach is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weigh check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve nice robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ ‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.  相似文献   

2.
We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximative posterior for the model parameters based on the variational Bayes (VB) method. Compared to the EM algorithm, the VB method has the advantage that the variance for the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.  相似文献   

3.
Missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missing is the most difficult missing data problem where the missing of a response depends on its own value. In statistical literature, unlike the ignorable missing data problem, not many papers on non-ignorable missing data are available except for the full parametric model based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen (1988)’s empirical likelihood method we can obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response which are shown to be asymptotically normal. Moreover the likelihood ratio statistic can be used to test whether the missing of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of a real AIDS trial data shows that the missing of CD4 counts around two years are non-ignorable and the sample mean based on observed data only is biased.  相似文献   

4.
We regard the simple linear calibration problem where only the response y of the regression line y = β0 + β1 t is observed with errors. The experimental conditions t are observed without error. For the errors of the observations y we assume that there may be some gross errors providing outlying observations. This situation can be modeled by a conditionally contaminated regression model. In this model the classical calibration estimator based on the least squares estimator has an unbounded asymptotic bias. Therefore we introduce calibration estimators based on robust one-step-M-estimators which have a bounded asymptotic bias. For this class of estimators we discuss two problems: The optimal estimators and their corresponding optimal designs. We derive the locally optimal solutions and show that the maximin efficient designs for non-robust estimation and robust estimation coincide.  相似文献   

5.
In the parametric regression model, the covariate missing problem under missing at random is considered. It is often desirable to use flexible parametric or semiparametric models for the covariate distribution, which can reduce a potential misspecification problem. Recently, a completely nonparametric approach was developed by [H.Y. Chen, Nonparametric and semiparametric models for missing covariates in parameter regression, J. Amer. Statist. Assoc. 99 (2004), pp. 1176–1189; Z. Zhang and H.E. Rockette, On maximum likelihood estimation in parametric regression with missing covariates, J. Statist. Plann. Inference 47 (2005), pp. 206–223]. Although it does not require a model for the covariate distribution or the missing data mechanism, the proposed method assumes that the covariate distribution is supported only by observed values. Consequently, their estimator is a restricted maximum likelihood estimator (MLE) rather than the global MLE. In this article, we show the restricted semiparametric MLE could be very misleading in some cases. We discuss why this problem occurs and suggest an algorithm to obtain the global MLE. Then, we assess the performance of the proposed method via some simulation experiments.  相似文献   

6.
The author considers time‐to‐event data from case‐cohort designs. As existing methods are either inefficient or based on restrictive assumptions concerning the censoring mechanism, he proposes a semi‐parametrically efficient estimator under the usual assumptions for Cox regression models. The estimator in question is obtained by a one‐step Newton‐Raphson approximation that solves the efficient score equations with initial value obtained from an existing method. The author proves that the estimator is consistent, asymptotically efficient and normally distributed in the limit. He also resorts to simulations to show that the proposed estimator performs well in finite samples and that it considerably improves the efficiency of existing pseudo‐likelihood estimators when a correlate of the missing covariate is available. Although he focuses on the situation where covariates are discrete, the author also explores how the method can be applied to models with continuous covariates.  相似文献   

7.
ABSTRACT

In logistic regression with nonignorable missing responses, Ibrahim and Lipsitz proposed a method for estimating regression parameters. It is known that the regression estimates obtained by using this method are biased when the sample size is small. Also, another complexity arises when the iterative estimation process encounters separation in estimating regression coefficients. In this article, we propose a method to improve the estimation of regression coefficients. In our likelihood-based method, we penalize the likelihood by multiplying it by a noninformative Jeffreys prior as a penalty term. The proposed method reduces bias and is able to handle the issue of separation. Simulation results show substantial bias reduction for the proposed method as compared to the existing method. Analyses using real world data also support the simulation findings. An R package called brlrmr is developed implementing the proposed method and the Ibrahim and Lipsitz method.  相似文献   

8.
We propose an 1-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in presence of missing data. Our method is based on the assumption that the data are missing at random (MAR) which entails also the completely missing at random case. The implementation of the method is non-trivial as the observed negative log-likelihood generally is a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.  相似文献   

9.
In this article, an improved method of computing tolerance factors for constructing tolerance regions in a multivariate linear regression model is proposed. The method is based on a chi-square approximation to the distribution of a linear function of noncentral chi-square variables and simulation. The merits of the proposed approach and the usual simulation method considered in Lee and Mathew (2004 Lee , Y. , Mathew , T. ( 2004 ). Tolerance regions in multivariate linear regression . Journal of Statistical Planning Inference 126 : 253271 . [Google Scholar]) are evaluated using Monte Carlo simulation. The study indicates that the proposed approach is stable and accurate even for small samples, and better than available methods. For constructing two-sided tolerance intervals in multiple linear regression, coverage level adjusted one-sided tolerance factors are shown to be better than available approximate tolerance factors. The results based on the coverage level adjusted one-sided tolerance factors are as good as the ones based on the exact two-sided tolerance factors in many cases.  相似文献   

10.
In fuzzy regression discontinuity (FRD) designs, the treatment effect is identified through a discontinuity in the conditional probability of treatment assignment. We show that when identification is weak (i.e., when the discontinuity is of a small magnitude), the usual t-test based on the FRD estimator and its standard error suffers from asymptotic size distortions as in a standard instrumental variables setting. This problem can be especially severe in the FRD setting since only observations close to the discontinuity are useful for estimating the treatment effect. To eliminate those size distortions, we propose a modified t-statistic that uses a null-restricted version of the standard error of the FRD estimator. Simple and asymptotically valid confidence sets for the treatment effect can be also constructed using this null-restricted standard error. An extension to testing for constancy of the regression discontinuity effect across covariates is also discussed. Supplementary materials for this article are available online.  相似文献   

11.
Linear regression with compositional explanatory variables   总被引:1,自引:0,他引:1  
Compositional explanatory variables should not be directly used in a linear regression model because any inference statistic can become misleading. While various approaches for this problem were proposed, here an approach based on the isometric logratio (ilr) transformation is used. It turns out that the resulting model is easy to handle, and that parameter estimation can be done in like in usual linear regression. Moreover, it is possible to use the ilr variables for inference statistics in order to obtain an appropriate interpretation of the model.  相似文献   

12.
研究缺失偏态数据下线性回归模型的参数估计问题,针对缺失偏态数据,为克服样本分布扭曲缺点和提高模型的回归系数、尺度参数和偏度参数的估计效果,提出了一种适合偏态数据下线性回归模型中缺失数据的修正回归插补方法.通过随机模拟和实例研究,并与均值插补、回归插补、随机回归插补方法比较,结果表明所提出的修正回归插补方法是有效可行的.  相似文献   

13.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

14.
We consider estimating the mode of a response given an error‐prone covariate. It is shown that ignoring measurement error typically leads to inconsistent inference for the conditional mode of the response given the true covariate, as well as misleading inference for regression coefficients in the conditional mode model. To account for measurement error, we first employ the Monte Carlo corrected score method (Novick & Stefanski, 2002) to obtain an unbiased score function based on which the regression coefficients can be estimated consistently. To relax the normality assumption on measurement error this method requires, we propose another method where deconvoluting kernels are used to construct an objective function that is maximized to obtain consistent estimators of the regression coefficients. Besides rigorous investigation on asymptotic properties of the new estimators, we study their finite sample performance via extensive simulation experiments, and find that the proposed methods substantially outperform a naive inference method that ignores measurement error. The Canadian Journal of Statistics 47: 262–280; 2019 © 2019 Statistical Society of Canada  相似文献   

15.
The class of symmetric linear regression models has the normal linear regression model as a special case and includes several models that assume that the errors follow a symmetric distribution with longer-than-normal tails. An important member of this class is the t linear regression model, which is commonly used as an alternative to the usual normal regression model when the data contain extreme or outlying observations. In this article, we develop second-order asymptotic theory for score tests in this class of models. We obtain Bartlett-corrected score statistics for testing hypotheses on the regression and the dispersion parameters. The corrected statistics have chi-squared distributions with errors of order O(n ?3/2), n being the sample size. The corrections represent an improvement over the corresponding original Rao's score statistics, which are chi-squared distributed up to errors of order O(n ?1). Simulation results show that the corrected score tests perform much better than their uncorrected counterparts in samples of small or moderate size.  相似文献   

16.
We derive the optimal regression function (i.e., the best approximation in the L2 sense) when the vector of covariates has a random dimension. Furthermore, we consider applications of these results to problems in statistical regression and classification with missing covariates. It will be seen, perhaps surprisingly, that the correct regression function for the case with missing covariates can sometimes perform better than the usual regression function corresponding to the case with no missing covariates. This is because even if some of the covariates are missing, an indicator random variable δδ, which is always observable, and is equal to 1 if there are no missing values (and 0 otherwise), may have far more information and predictive power about the response variable Y than the missing covariates do. We also propose kernel-based procedures for estimating the correct regression function nonparametrically. As an alternative estimation procedure, we also consider the least-squares method.  相似文献   

17.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have been recently used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in coronary artery disease data set and milk composition data set as well as a simulation study from cancer-neapolitan network are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods called Peter-Clark algorithm and greedy search-and-score have been applied. Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322] in which, each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310] which uses the information in the Markov blanket set of the variable to be imputed and finally, our new proposed method which applies the whole available knowledge of all variables of interest, consisting the Markov blanket and so the parent set, to impute a missing item. Results indicate the high quality of our new proposed method especially in the presence of high missingness percentages and more connected networks. Also the new method have shown to be more efficient than the MICE method for small sample sizes with high missing rates.  相似文献   

18.
By employing all the observed information and the optimal augmentation term, we propose an augmented inverse probability weighted fractional imputation method (AFI) to handle covariates missing at random in quantile regression. Compared with the existing completely case analysis, inverse probability weighting, multiple imputation and fractional imputation based on quantile regression model with missing covarites, we carry out simulation study to investigate its performance in estimation accuracy and efficiency, computational efficiency and estimation robustness. We also talk about the influence of imputation replicates in our AFI. Finally, we apply our methodology to part of the National Health and Nutrition Examination Survey data.  相似文献   

19.
Conditional power calculations are frequently used to guide the decision whether or not to stop a trial for futility or to modify planned sample size. These ignore the information in short‐term endpoints and baseline covariates, and thereby do not make fully efficient use of the information in the data. We therefore propose an interim decision procedure based on the conditional power approach which exploits the information contained in baseline covariates and short‐term endpoints. We will realize this by considering the estimation of the treatment effect at the interim analysis as a missing data problem. This problem is addressed by employing specific prediction models for the long‐term endpoint which enable the incorporation of baseline covariates and multiple short‐term endpoints. We show that the proposed procedure leads to an efficiency gain and a reduced sample size, without compromising the Type I error rate of the procedure, even when the adopted prediction models are misspecified. In particular, implementing our proposal in the conditional power approach enables earlier decisions relative to standard approaches, whilst controlling the probability of an incorrect decision. This time gain results in a lower expected number of recruited patients in case of stopping for futility, such that fewer patients receive the futile regimen. We explain how these methods can be used in adaptive designs with unblinded sample size re‐assessment based on the inverse normal P‐value combination method to control Type I error. We support the proposal by Monte Carlo simulations based on data from a real clinical trial.  相似文献   

20.
Consider the problem of pointwise estimation of f in a multivariate isotonic regression model Z=f(X1,…,Xd)+ϵ, where Z is the response variable, f is an unknown nonparametric regression function, which is isotonic with respect to each component, and ϵ is the error term. In this article, we investigate the behavior of the least squares estimator of f. We generalize the greatest convex minorant characterization of isotonic regression estimator for the multivariate case and use it to establish the asymptotic distribution of properly normalized version of the estimator. Moreover, we test whether the multivariate isotonic regression function at a fixed point is larger (or smaller) than a specified value or not based on this estimator, and the consistency of the test is established. The practicability of the estimator and the test are shown on simulated and real data as well.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号