Similar literature
20 similar articles found (search time: 843 ms)
1.
This article addresses issues in creating public-use data files in the presence of missing ordinal responses and in the subsequent statistical analyses of the data set by users. The authors propose a fully efficient fractional imputation (FI) procedure for ordinal responses with missing observations. The proposed imputation strategy retrieves the missing values through the full conditional distribution of the response given the covariates and results in a single imputed data file that can be analyzed by different data users with different scientific objectives. The two most critical aspects of statistical analyses based on the imputed data set, validity and efficiency, are examined through regression analysis involving the ordinal response and a selected set of covariates. It is shown through both theoretical development and simulation studies that, when the ordinal responses are missing at random, the proposed FI procedure leads to valid and highly efficient inferences compared to existing methods. Variance estimation using the fractionally imputed data set is also discussed. The Canadian Journal of Statistics 48: 138–151; 2020 © 2019 Statistical Society of Canada
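
The fractional imputation idea in this abstract can be illustrated with a minimal sketch. It assumes that a conditional model for the ordinal response has already been fitted and is available as a function `cond_probs(x)`; that function, the row layout, and the helper names are hypothetical and are not taken from the article. Each missing response is replaced by one row per category, weighted by its conditional probability, so that a single weighted data file results.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One row of the imputed file: (covariate, ordinal category, fractional weight).
Row = Tuple[float, int, float]


def fractionally_impute(
    xs: Sequence[float],
    ys: Sequence[Optional[int]],
    cond_probs: Callable[[float], Sequence[float]],
) -> List[Row]:
    """Expand missing ordinal responses into weighted rows.

    cond_probs(x) is a hypothetical, already-fitted model that returns
    P(Y = k | x) for the categories k = 0, ..., K-1.
    """
    rows: List[Row] = []
    for x, y in zip(xs, ys):
        if y is not None:
            # Observed response: keep it with full weight.
            rows.append((x, y, 1.0))
        else:
            # Missing response: one row per category, weighted by P(Y = k | x).
            for k, p in enumerate(cond_probs(x)):
                rows.append((x, k, p))
    return rows


def weighted_proportions(rows: Sequence[Row], n_categories: int) -> List[float]:
    """Estimate category proportions from the fractionally imputed rows."""
    total = sum(w for _, _, w in rows)
    props = [0.0] * n_categories
    for _, k, w in rows:
        props[k] += w / total
    return props
```

A user of the imputed file would then run an ordinary weighted analysis on the rows, as `weighted_proportions` does for the marginal category proportions.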

2.
The present article deals with the problem of estimating the parameters of a linear regression model when some data on the response variable are missing and the responses are equi-correlated. The ordinary least squares and optimal homogeneous predictors are employed to find the imputed values of missing observations. Their efficiency properties are analyzed using the small disturbances asymptotic theory. The estimation of regression coefficients using these imputed values is also considered and a comparison of estimators is presented.

3.
Efficient statistical inference on nonignorable missing data is a challenging problem. This paper proposes a new estimation procedure based on composite quantile regression (CQR) for linear regression models with nonignorable missing data, which is applicable even with high-dimensional covariates. A parametric model is assumed for the response probability, which is estimated by the empirical likelihood approach. Local identifiability of the proposed strategy is guaranteed on the basis of an instrumental variable approach. A set of data-based adaptive weights constructed via an empirical likelihood method is used to weight CQR functions. The proposed method is resistant to heavy-tailed errors or outliers in the response. An adaptive penalisation method for variable selection is proposed to achieve sparsity with high-dimensional covariates. Limiting distributions of the proposed estimators are derived. Simulation studies are conducted to investigate the finite sample performance of the proposed methodologies. An application to the ACTG 175 data is presented.

4.
In this article, we study variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method for normal random errors and is empirically shown to be much more efficient and robust with respect to heavy-tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with the missing covariates, we consider inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator using the estimated selection probability has a smaller asymptotic variance than the one using the true selection probability, and is thus more efficient. Therefore, the important Horvitz-Thompson property is verified for the penalized rank estimator with missing covariates in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.
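
The inverse probability weighting used in this abstract can be illustrated in its simplest form. The abstract applies the weights to a penalized rank estimator; the sketch below shows only the weighting idea for a population mean with known selection probabilities, and the function names are illustrative assumptions rather than anything from the article.

```python
from typing import Optional, Sequence


def horvitz_thompson_mean(
    values: Sequence[Optional[float]],
    observed: Sequence[bool],
    probs: Sequence[float],
) -> float:
    """Horvitz-Thompson form: weight each observed value by 1 / probability,
    then divide by the full sample size."""
    n = len(values)
    return sum(v / p for v, o, p in zip(values, observed, probs) if o) / n


def hajek_mean(
    values: Sequence[Optional[float]],
    observed: Sequence[bool],
    probs: Sequence[float],
) -> float:
    """Normalized (Hajek) form: divide by the sum of the weights instead of n."""
    num = sum(v / p for v, o, p in zip(values, observed, probs) if o)
    den = sum(1.0 / p for o, p in zip(observed, probs) if o)
    return num / den
```

Units with a low selection probability receive a large weight, which compensates for similar units that were not observed. In practice the probabilities are usually estimated, which is the case the abstract finds to be more efficient.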

5.
In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing-at-random situation, so that the missingness of the covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach that takes the missing covariates into consideration is much more efficient in terms of mean square error than the complete-case analysis. Effects of increasing the cured fraction and the number of censored observations are also reported. We demonstrate the proposed methodology with two real data sets. One involves the length of time to obtain a BS degree in Statistics, and the other concerns the time to breast cancer recurrence.

6.
Quantile regression (QR) is a popular approach to estimate functional relations between variables for all portions of a probability distribution. Parameter estimation in QR with missing data is one of the most challenging issues in statistics. Regression quantiles can be substantially biased when observations are subject to missingness. We study several inverse probability weighting (IPW) estimators for parameters in QR when covariates or responses are subject to missing not at random. Maximum likelihood and semiparametric likelihood methods are employed to estimate the respondent probability function. To achieve good efficiency properties, we develop an empirical likelihood (EL) approach to QR with the auxiliary information from the calibration constraints. The proposed methods are less sensitive to misspecified missing mechanisms. Asymptotic properties of the proposed IPW estimators are shown under general settings. The efficiency gain of the EL-based IPW estimator is quantified theoretically. Simulation studies and a data set on the work limitation of injured workers from Canada are used to illustrate our proposed methodologies.

7.
Distribution function estimation plays a foundational role in statistics, since the population distribution is always involved in statistical inference and is usually unknown. In this paper, we consider the estimation of the distribution function of a response variable Y with missing responses in regression problems. It is proved that the augmented inverse probability weighted estimator converges weakly to a zero mean Gaussian process. An augmented inverse probability weighted empirical log-likelihood function is also defined. It is shown that the empirical log-likelihood converges weakly to the square of a Gaussian process with mean zero and variance one. We apply these results to the construction of Gaussian process approximation based confidence bands and empirical likelihood based confidence bands of the distribution function of Y. A simulation is conducted to evaluate the confidence bands.
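
As an illustration of the estimator named in this abstract, the following is a minimal sketch of the standard augmented inverse probability weighted (AIPW) form for a distribution function. It assumes that the response probabilities `probs` and a working model `cond_cdf(x, y)` for P(Y <= y | X = x) are supplied by the caller; both names are hypothetical, and the paper's exact construction may differ.

```python
from typing import Callable, Optional, Sequence


def aipw_cdf(
    y: float,
    responses: Sequence[Optional[float]],
    observed: Sequence[bool],
    probs: Sequence[float],
    covariates: Sequence[float],
    cond_cdf: Callable[[float, float], float],
) -> float:
    """AIPW estimate of F(y) = P(Y <= y) with missing responses.

    Each unit contributes
        delta * 1{Y <= y} / pi  -  (delta - pi) / pi * m(x, y),
    where delta is the response indicator, pi the response probability,
    and m(x, y) = cond_cdf(x, y) is the assumed working model.
    """
    n = len(responses)
    total = 0.0
    for yi, d, p, x in zip(responses, observed, probs, covariates):
        delta = 1.0 if d else 0.0
        # The indicator is only evaluated for observed responses.
        indicator = 1.0 if (d and yi <= y) else 0.0
        total += indicator / p - (delta - p) / p * cond_cdf(x, y)
    return total / n
```

When every response is observed with probability one, the augmentation term vanishes and the estimate reduces to the ordinary empirical distribution function.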

8.
By employing all the observed information and the optimal augmentation term, we propose an augmented inverse probability weighted fractional imputation method (AFI) to handle covariates missing at random in quantile regression. We carry out a simulation study to compare it with the existing complete-case analysis, inverse probability weighting, multiple imputation and fractional imputation methods for quantile regression models with missing covariates, in terms of estimation accuracy and efficiency, computational efficiency and estimation robustness. We also discuss the influence of the number of imputation replicates in our AFI. Finally, we apply our methodology to part of the National Health and Nutrition Examination Survey data.

9.
A common approach taken in high‐dimensional regression analysis is sliced inverse regression, which separates the range of the response variable into non‐overlapping regions, called ‘slices’. Asymptotic results are usually shown assuming that the slices are fixed, while in practice, estimators are computed with random slices containing the same number of observations. Based on empirical process theory, we present a unified theoretical framework to study these techniques, and revisit popular inverse regression estimators. Furthermore, we introduce a bootstrap methodology that reproduces the laws of Cramér–von Mises test statistics of interest to model dimension, effects of specified covariates and whether or not a sliced inverse regression estimator is appropriate. Finally, we investigate the accuracy of different bootstrap procedures by means of simulations.

10.
In the presence of missing covariates, standard model validation procedures may result in misleading conclusions. By building generalized score statistics on augmented inverse probability weighted complete‐case estimating equations, we develop a new model validation procedure to assess the adequacy of a prescribed analysis model when covariate data are missing at random. The asymptotic distribution and local alternative efficiency for the test are investigated. Under certain conditions, our approach provides not only valid but also asymptotically optimal results. A simulation study for both linear and logistic regression illustrates the applicability and finite sample performance of the methodology. Our method is also employed to analyse a coronary artery disease diagnostic dataset.

11.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with a penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function. Because classical model selection procedures such as Akaike's information criterion (AIC) become invalid for the considered models with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms: missing at random (MAR) and missing not at random (MNAR). The most attractive feature of the method is that it is rather general and can be extended to various situations with missing observations based on the EM algorithm; in particular, when no missing data are involved, the new model selection criteria reduce to the classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also compare missing-data models with complete-data models. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

12.
The authors propose two tests, one parametric and the other semiparametric, for testing bias of estimating equations in weighted regression with partially missing covariates when the primary regression model is correctly specified. More generally, the proposed tests may be thought of as a diagnostic tool for the combined package of the primary regression model and the missingness assumptions. The asymptotic null distributions of the two test statistics are derived under the assumption of missingness at random for the partially missing covariates. A small scale simulation study completes the work.

13.
For an estimation with missing data, a crucial step is to determine if the data are missing completely at random (MCAR), in which case a complete‐case analysis would suffice. Most existing tests for MCAR do not provide a method for a subsequent estimation once the MCAR is rejected. In the setting of estimating means, we propose a unified approach for testing MCAR and the subsequent estimation. Upon rejecting MCAR, the same set of weights used for testing can then be used for estimation. The resulting estimators are consistent if the missingness of each response variable depends only on a set of fully observed auxiliary variables and the true outcome regression model is among the user‐specified functions for deriving the weights. The proposed method is based on the calibration idea from survey sampling literature and the empirical likelihood theory.

14.
The problem of missing values is common in all branches of statistics, and especially in regression analysis. Here we consider estimation of the regression parameters in the presence of missingness in the response. The usual method is to replace the missing value by its predicted value based on the available observations, without any correction for the disturbance term. Instead, we suggest a method which corrects the usual predictor with a guess of the disturbance term based on the available residuals. Comparison between the two methods shows that the latter leads to better results.
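
The two imputation strategies contrasted in this abstract can be sketched for simple linear regression. The abstract does not say how the disturbance term is guessed, so the sketch assumes one plausible choice: adding a residual drawn at random from the complete cases. That choice, and all function names, are assumptions for illustration only.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_responses(
    xs: Sequence[float],
    ys: Sequence[Optional[float]],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Fill in missing responses (None) from a fit on the complete cases.

    With rng=None the plain predicted value is used (the "usual method").
    With an rng, a randomly drawn observed residual is added as a guess
    of the disturbance term (an assumed form of the correction).
    """
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in complete], [y for _, y in complete])
    residuals = [y - (intercept + slope * x) for x, y in complete]
    filled: List[float] = []
    for x, y in zip(xs, ys):
        if y is not None:
            filled.append(y)
            continue
        prediction = intercept + slope * x
        correction = rng.choice(residuals) if rng is not None else 0.0
        filled.append(prediction + correction)
    return filled
```

Plain predicted values lie exactly on the fitted line and therefore understate the variability of the response; adding a residual-based correction restores some of that spread in the imputed values.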

15.
Hea-Jung Kim, Taeyoung Roh. Statistics, 2013, 47(5): 1082-1111
In regression analysis, a sample selection scheme often applies to the response variable, which results in missing not at random observations on the variable. In this case, a regression analysis using only the selected cases would lead to biased results. This paper proposes a Bayesian methodology to correct this bias based on a semiparametric Bernstein polynomial regression model that incorporates the sample selection scheme into a stochastic monotone trend constraint, variable selection, and robustness against departures from the normality assumption. We present the basic theoretical properties of the proposed model that include its stochastic representation, sample selection bias quantification, and hierarchical model specification to deal with the stochastic monotone trend constraint in the nonparametric component, simple bias corrected estimation, and variable selection for the linear components. We then develop computationally feasible Markov chain Monte Carlo methods for semiparametric Bernstein polynomial functions with stochastically constrained parameter estimation and variable selection procedures. We demonstrate the finite-sample performance of the proposed model compared to existing methods using simulation studies and illustrate its use based on two real data applications.

16.
A method is proposed for the estimation of missing data in analysis of covariance models. This is based on obtaining an estimate of the missing observation that minimizes the error sum of squares. Specific derivation of this estimate is carried out for the one-factor analysis of covariance, and numerical examples are given to show the nature of the estimates produced. Parameter estimates of the imputed data are then compared with those of the incomplete data.

17.
This paper introduces a nonparametric approach for testing the equality of two or more survival distributions based on right censored failure times with missing population marks for the censored observations. The standard log-rank test is not applicable here because the population membership information is not available for the right censored individuals. We propose to use the imputed population marks for the censored observations leading to fractional at-risk sets that can be used in a two sample censored data log-rank test. We demonstrate with a simple example that there could be a gain in power by imputing population marks (the proposed method) for the right censored individuals compared to simply removing them (which also would maintain the right size). Performance of the imputed log-rank tests obtained this way is studied through simulation. We also obtain an asymptotic linear representation of our test statistic. Our testing methodology is illustrated using a real data set.

18.
Consistency of propensity score matching estimators hinges on the propensity score's ability to balance the distributions of covariates in the pools of treated and non-treated units. Conventional balance tests merely check for differences in covariates’ means, but cannot account for differences in higher moments. For this reason, this paper proposes balance tests which test for differences in the entire distributions of continuous covariates based on quantile regression (to derive Kolmogorov–Smirnov and Cramer–von-Mises–Smirnov-type test statistics) and resampling methods (for inference). Simulations suggest that these methods are very powerful and capture imbalances related to higher moments when conventional balance tests fail to do so.

19.
Tianqing Liu. Statistics, 2016, 50(1): 89-113
This paper proposes an empirical likelihood-based weighted (ELW) quantile regression approach for estimating the conditional quantiles when some covariates are missing at random. The proposed ELW estimator is computationally simple and achieves semiparametric efficiency if the probability of missingness is correctly specified. The limiting covariance matrix of the ELW estimator can be estimated by a resampling technique, which does not involve nonparametric density estimation or numerical derivatives. Simulation results show that the ELW method works remarkably well in finite samples. A real data example is used to illustrate the proposed ELW method.

20.
In this paper, we introduce a new methodology for imputing missing values by making use of sensible constraints on both a study variable and auxiliary variables that are correlated with the variable of interest. The resultant estimator based on these imputed values is shown to lead to the regression-type method of imputation in survey sampling. Furthermore, when the data are a hybrid of values missing at random and values missing completely at random, the resultant estimator is shown to be consistent, with asymptotic mean squared error equal to that of the linear regression method of imputation. A generalization to any type of imputation method is possible and is included at the end.
