1.

Parametric fractional imputation for nonignorable missing data

Ji Young Kim Jae Kwang Kim 《Journal of the Korean Statistical Society》2012,41(3):291-303

Parameter estimation with missing data is a frequently encountered problem in statistics. Imputation is often used to facilitate the parameter estimation by simply applying the complete-sample estimators to the imputed dataset. In this article, we consider the problem of parameter estimation with nonignorable missing data using the approach of parametric fractional imputation proposed by Kim (2011). Using the fractional weights, the E-step of the EM algorithm can be approximated by the weighted mean of the imputed data likelihood, where the fractional weights are computed from the current value of the parameter estimates. Calibration fractional imputation is also considered as a way to improve the Monte Carlo approximation in the fractional imputation. Variance estimation is also discussed. Results from two simulation studies are presented to compare the proposed method with the existing methods. A real data example from the Korea Labor and Income Panel Survey (KLIPS) is also presented.
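
The fractional-weight idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the weights are normalized ratios of a model density at the current parameter value to a proposal density, and it leaves out the response-model factor that a nonignorable setting would also need. The names `normal_pdf` and `fractional_weights`, the normal model, and the proposal are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fractional_weights(imputed, target, proposal):
    """Normalize target/proposal density ratios so that they sum to one."""
    ratios = [target(y) / proposal(y) for y in imputed]
    total = sum(ratios)
    return [r / total for r in ratios]


rng = random.Random(0)
M = 5  # number of imputed values for one missing unit (assumed)

# Hypothetical proposal N(0, 2^2) and hypothetical current model N(1, 1).
imputed = [rng.gauss(0.0, 2.0) for _ in range(M)]
weights = fractional_weights(
    imputed,
    target=lambda y: normal_pdf(y, 1.0, 1.0),
    proposal=lambda y: normal_pdf(y, 0.0, 2.0),
)

# A weighted mean over the imputed values, the kind of quantity that
# stands in for an expectation in the approximate E-step.
e_step_mean = sum(w * y for w, y in zip(weights, imputed))
```

In an EM iteration, only the weights would be recomputed when the parameter estimate changes; the imputed values themselves stay fixed.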

2.

Missing data and small area estimation in the UK Labour Force Survey

Nicholas T. Longford 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2004,167(2):341-373

Summary. We apply multivariate shrinkage to estimate local area rates of unemployment and economic inactivity by using UK Labour Force Survey data. The method exploits the similarity of the rates of claiming unemployment benefit and the unemployment rates as defined by the International Labour Organisation. This is done without any distributional assumptions, merely relying on the high correlation of the two rates. The estimation is integrated with a multiple-imputation procedure for missing employment status of subjects in the database (item non-response). The hot deck method that is used in the imputations is adapted to reflect the uncertainty in the model for non-response. The method is motivated as a development (improvement) of the current operational procedure in which the imputed value is a non-stochastic function of the data. An extension of the procedure to subjects who are absent from the database (unit non-response) is proposed.

3.

Variable selection for high-dimensional generalized linear model with block-missing data

Yifan He Yang Feng Xinyuan Song 《Scandinavian Journal of Statistics》2023,50(3):1279-1297

In modern scientific research, multiblock missing data emerge when synthesizing information across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or rely heavily on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that, compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.
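
The second step of this abstract, imputing with a mean conditional on what is observed, can be shown in a deliberately small setting. The sketch below is a toy with one covariate and one partially missing variable, using a least-squares slope fitted on the complete cases; it does not attempt the sparse precision matrix estimation described in the paper. The function name `conditional_mean_impute` and the data are hypothetical.

```python
from statistics import mean


def conditional_mean_impute(x, y):
    """Fill None entries of y with mean_y + (s_xy / s_xx) * (x - mean_x).

    Means and the slope are estimated from the complete cases only.
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in obs)
    my = mean(yi for _, yi in obs)
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in obs)
    slope = sxy / sxx
    return [yi if yi is not None else my + slope * (xi - mx)
            for xi, yi in zip(x, y)]


# Hypothetical data: y is missing for the third and fifth units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, None, 8.0, None]
filled = conditional_mean_impute(x, y)
```

Because every imputed value lies exactly on the fitted line, this kind of deterministic imputation understates variability, which is one reason stochastic and multiple imputation variants exist.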

4.

Augmented inverse probability weighted fractional imputation in quantile regression

Hao Cheng 《Pharmaceutical statistics》2021,20(1):25-38

By employing all the observed information and the optimal augmentation term, we propose an augmented inverse probability weighted fractional imputation method (AFI) to handle covariates missing at random in quantile regression. We carry out a simulation study comparing AFI with the existing complete case analysis, inverse probability weighting, multiple imputation and fractional imputation based on the quantile regression model with missing covariates, investigating its estimation accuracy and efficiency, computational efficiency and estimation robustness. We also discuss the influence of the number of imputation replicates in AFI. Finally, we apply our methodology to part of the National Health and Nutrition Examination Survey data.

5.

COMBINING HOUSEHOLD SURVEYS USING MASS IMPUTATION TO ESTIMATE POPULATION TOTALS

James Chipperfield Julia Chessman Russell Lim 《Australian & New Zealand Journal of Statistics》2012,54(2):223-238

Pressure is often placed on statistical analysts to improve the accuracy of their population estimates. In response to this pressure, analysts have long exploited the potential to combine surveys in various ways. This paper develops a framework for combining surveys when data items from one of the surveys are mass imputed. The estimates from the surveys are combined using a composite estimator (CE). The CE accounts for the variability due to the imputation model and the surveys’ sampling schemes. Diagnostics for the validity of the imputation model are also discussed. We describe an application of combining the Australian Labour Force Survey and the National Aboriginal and Torres Strait Islander Health Survey to estimate employment characteristics about the Indigenous population. The findings suggest that combining these surveys is beneficial.

6.

Imputation of Household Survey Data Using Linear Mixed Models

Luise Patricia Lago Robert Graham Clark 《Australian & New Zealand Journal of Statistics》2015,57(2):169-187

Mixed models are regularly used in the analysis of clustered data, but are only recently being used for imputation of missing data. In household surveys where multiple people are selected from each household, imputation of missing values should preserve the structure pertaining to people within households and should not artificially change the apparent intracluster correlation (ICC). This paper focuses on the use of multilevel models for imputation of missing data in household surveys. In particular, the performance of a best linear unbiased predictor for both stochastic and deterministic imputation using a linear mixed model is compared to imputation based on a single level linear model, both with and without information about household respondents. In this paper an evaluation is carried out in the context of imputing hourly wage rate in the Household, Income and Labour Dynamics of Australia Survey. Nonresponse is generated under various assumptions about the missingness mechanism for persons and households, and with low, moderate and high intra‐household correlation to assess the benefits of the multilevel imputation model under different conditions. The mixed model and single level model with information about the household respondent lead to clear improvements when the ICC is moderate or high, and when there is informative missingness.

7.

Modeling Nonignorable Nonresponse in Categorical Panel Data With an Example in Estimating Gross Labor-Force Flows

Elizabeth A. Stasny 《Journal of Business & Economic Statistics》2013,31(2):207-219

Many large-scale sample surveys use panel designs under which sampled individuals are interviewed several times before being dropped from the sample. The longitudinal data bases available from such surveys could be used to provide estimates of gross change over time. One problem in using these data to estimate gross change is how to handle the period-to-period nonresponse. This nonresponse is typically nonrandom and, furthermore, may be nonignorable in that it cannot be accounted for by other observed quantities in the data. Under the models proposed in this article, which are appropriate for the analysis of categorical data, the probability of nonresponse may be taken to be a function of the missing variable of interest. The proposed models are fit using maximum likelihood estimation. As an example, the method is applied to the problem of estimating gross flows in labor-force participation using data from the Current Population Survey and the Canadian Labour Force Survey.

8.

A multiple imputation method for incomplete correlated ordinal data using multivariate probit models

Xiao Zhang Quanlin Li Karen Cropsey Xiaowei Yang Kui Zhang Thomas Belin 《Communications in Statistics - Simulation and Computation》2017,46(3):2360-2375

The multiple imputation technique has proven to be a useful tool in missing data analysis. We propose a Markov chain Monte Carlo method to conduct multiple imputation for incomplete correlated ordinal data using the multivariate probit model. We conduct a thorough simulation study to compare the performance of our proposed method with two available imputation methods – multivariate normal-based and chained equation methods – for various missing data scenarios. For illustration, we present an application using the data from the smoking cessation treatment study for low-income community corrections smokers.

9.

A comparison study of nonparametric imputation methods

Jianhui Ning Philip E. Cheng 《Statistics and Computing》2012,22(1):273-285

Consider estimation of a population mean of a response variable when the observations are missing at random with respect to the covariate. Two common approaches to imputing the missing values are the nonparametric regression weighting method and the Horvitz-Thompson (HT) inverse weighting approach. The regression approach includes the kernel regression imputation and the nearest neighbor imputation. The HT approach, employing inverse kernel-estimated weights, includes the basic estimator, the ratio estimator and the estimator using inverse kernel-weighted residuals. Asymptotic normality of the nearest neighbor imputation estimators is derived and compared to kernel regression imputation estimator under standard regularity conditions of the regression function and the missing pattern function. A comprehensive simulation study shows that the basic HT estimator is most sensitive to discontinuity in the missing data patterns, and the nearest neighbors estimators can be insensitive to missing data patterns unbalanced with respect to the distribution of the covariate. Empirical studies show that the nearest neighbor imputation method is most effective among these imputation methods for estimating a finite population mean and for classifying the species of the iris flower data.
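
The two regression-type imputations named in this abstract can be sketched for a single covariate. This is a minimal illustration under assumptions of my own: one nearest neighbor, a Gaussian kernel, and a fixed bandwidth. The functions `nn_impute` and `kernel_impute` and the data are hypothetical, and the HT inverse weighting estimators are not shown.

```python
import math


def nn_impute(x, y):
    """Replace each None in y with the response of the nearest observed x."""
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    out = []
    for xi, yi in zip(x, y):
        if yi is None:
            # On ties, min() keeps the first donor in data order.
            _, yi = min(donors, key=lambda d: abs(d[0] - xi))
        out.append(yi)
    return out


def kernel_impute(x, y, bandwidth):
    """Replace each None in y with a Gaussian-kernel weighted mean of observed y."""
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    out = []
    for xi, yi in zip(x, y):
        if yi is None:
            w = [math.exp(-0.5 * ((xi - dx) / bandwidth) ** 2) for dx, _ in donors]
            yi = sum(wi * dy for wi, (_, dy) in zip(w, donors)) / sum(w)
        out.append(yi)
    return out


# Hypothetical data: the response is missing for the third and fifth units.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, None, 3.0, None]

# Estimate the population mean from each completed dataset.
mean_nn = sum(nn_impute(x, y)) / len(x)
mean_kernel = sum(kernel_impute(x, y, bandwidth=1.0)) / len(x)
```

The nearest neighbor version always imputes an actually observed response, while the kernel version imputes a smoothed average whose behaviour depends on the bandwidth.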

10.

Multiple imputation of censored survival data in the presence of missing covariates using restricted mean survival time

Gurprit Grover 《Journal of applied statistics》2015,42(4):817-827

Missing covariate data with censored outcomes pose a challenge in the analysis of clinical data, especially in small sample settings. Multiple imputation (MI) techniques are popularly used to impute missing covariates, and the data are then analyzed through methods that can handle censoring. Techniques based on MI are also available to impute censored data, but they are not widely used in practice. In the present study, we applied a method based on multiple imputation by chained equations to impute missing values of covariates and also to impute censored outcomes using restricted mean survival time in small sample settings. The complete data were then analyzed using linear regression models. Simulation studies and a real example of CHD data show that the present method produced better estimates and lower standard errors when applied to data with missing covariate values and censored outcomes than either the analysis of data with censored outcomes that excludes cases with missing covariates, or the analysis that excludes cases with missing covariate values and censored outcomes (complete case analysis).

11.

Application of an imputation method for variance estimation under pseudo-likelihood when missing data are NMAR

Amy M. Kwon 《Communications in Statistics - Theory and Methods》2017,46(14):6959-6966

When nonresponse is outcome-dependent, pseudo-likelihood (PL) yields consistent regression coefficients without specifying the missing data mechanism. However, it is onerous to derive parameter estimators, including their standard errors, from the regression coefficients under PL. The present study applies an imputation method to compute the asymptotic standard errors of parameter estimators. The proposed method is simpler than the delta method, and it yielded standard errors of similar size to those from bootstrapping in simulation and application studies.

12.

Latent class based multiple imputation approach for missing categorical data

Mulugeta Gebregziabher Stacia M. DeSantis 《Journal of statistical planning and inference》2010

In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered.

13.

Multiple imputation methods for recurrent event data with missing event category

Douglas E. Schaubel Jianwen Cai 《Revue canadienne de statistique》2006,34(4):677-692

Frequently in clinical and epidemiologic studies, the event of interest is recurrent (i.e., can occur more than once per subject). When the events are not of the same type, an analysis which accounts for the fact that events fall into different categories will often be more informative. Often, however, although event times may always be known, information through which events are categorized may potentially be missing. Complete‐case methods (whose application may require, for example, that events be censored when their category cannot be determined) are valid only when event categories are missing completely at random. This assumption is rather restrictive. The authors propose two multiple imputation methods for analyzing multiple‐category recurrent event data under the proportional means/rates model. The use of a proper or improper imputation technique distinguishes the two approaches. Both methods lead to consistent estimation of regression parameters even when the missingness of event categories depends on covariates. The authors derive the asymptotic properties of the estimators and examine their behaviour in finite samples through simulation. They illustrate their approach using data from an international study on dialysis.

14.

Sensitivity to imputation models and assumptions in receiver operating characteristic analysis with incomplete data

《Journal of Statistical Computation and Simulation》2012,82(17):3498-3511

Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also become increasingly popular in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. In particular, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms.

15.

Multiple imputation for ordinal longitudinal data with monotone missing data patterns

A.Y. Kombo H. Mwambi G. Molenberghs 《Journal of applied statistics》2017,44(2):270-287

Missing data often complicate the analysis of scientific data. Multiple imputation is a general purpose technique for analysis of datasets with missing values. The approach is applicable to a variety of missing data patterns but is often complicated by restrictions such as the type of variables to be imputed and the mechanism underlying the missing data. In this paper, the authors compare the performance of two multiple imputation methods, namely fully conditional specification and multivariate normal imputation, in the presence of ordinal outcomes with monotone missing data patterns. Through a simulation study and an empirical example, the authors show that the two methods are indeed comparable, meaning that either may be used in scenarios at least similar to the ones presented here.

16.

Nonparametric curve estimation with missing data: A general empirical process approach

Majid Mojirsheibani 《Journal of statistical planning and inference》2007

A general nonparametric imputation procedure, based on kernel regression, is proposed to estimate points as well as set- and function-indexed parameters when the data are missing at random (MAR). The proposed method works by imputing a specific function of a missing value (and not the missing value itself), where the form of this specific function is dictated by the parameter of interest. Both single and multiple imputations are considered. The associated empirical processes provide the right tool to study the uniform convergence properties of the resulting estimators. Our estimators include, as special cases, the imputation estimator of the mean, the estimator of the distribution function proposed by Cheng and Chu [1996. Kernel estimation of distribution functions and quantiles with missing data. Statist. Sinica 6, 63–78], imputation estimators of a marginal density, and imputation estimators of regression functions.

17.

Inference methods for saturated models in longitudinal clinical trials with incomplete binary data

Song JX 《Pharmaceutical statistics》2006,5(4):295-304

In longitudinal studies with binary response, it is often of interest to estimate the percentage of positive responses at each time point and the percentage of having at least one positive response by each time point. When missing data exist, the conventional method based on observed percentages could result in erroneous estimates. This study demonstrates two methods of using expectation-maximization (EM) and data augmentation (DA) algorithms in the estimation of the marginal and cumulative probabilities for incomplete longitudinal binary response data. Both methods provide unbiased estimates when the missingness mechanism satisfies the missing at random (MAR) assumption. Sensitivity analyses have been performed for cases when the MAR assumption is in question.

18.

Comparison of alternative imputation methods for ordinal data

Federica Cugnata 《Communications in Statistics - Simulation and Computation》2017,46(1):315-330

In this article, we compare alternative missing-data imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study, designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other methods of missing-data imputation. We then use real datasets to compare our approach with some general methods of missing-data imputation under various missing data mechanisms.

19.

Conceptual, computational and inferential benefits of the missing data perspective in applied and theoretical statistical problems

Donald B. Rubin 《AStA Advances in Statistical Analysis》2006,90(4):501-513

This article advocates the following perspective: When confronting a scientific problem, the field of statistics enters by viewing the problem as one where the scientific answer could be calculated if some missing data, hypothetical or real, were available. Thus, statistical effort should be devoted to three steps:

formulate the missing data that would allow this calculation,
stochastically fill in these missing data, and
do the calculations as if the filled-in data were available.

This presentation discusses: conceptual benefits, such as for causal inference using potential outcomes; computational benefits, such as afforded by using the EM algorithm and related data augmentation methods based on MCMC; and inferential benefits, such as valid interval estimation and assessment of assumptions based on multiple imputation.
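
The three steps listed in this abstract (formulate the missing data, stochastically fill them in, calculate as if the data were complete) can be sketched as a small multiple imputation loop for a mean, pooled with Rubin's combining rules. The sketch makes simplifying assumptions that are not from the article: missing values are filled by drawing at random from the observed values, which ignores parameter uncertainty and so is not a proper imputation, and the function names and data are hypothetical.

```python
import random
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, total


def multiply_impute_mean(y, m, rng):
    """Fill None entries m times, estimate the mean each time, then pool."""
    observed = [v for v in y if v is not None]
    estimates, within = [], []
    for _ in range(m):
        # Step 2: stochastically fill in the missing data.
        completed = [v if v is not None else rng.choice(observed) for v in y]
        # Step 3: calculate as if the filled-in data were available.
        estimates.append(mean(completed))
        within.append(variance(completed) / len(completed))
    return rubin_combine(estimates, within)


# Step 1: the missing data are the None entries of this hypothetical sample.
y = [2.1, None, 3.4, 2.9, None, 4.0, 3.1, None]
q_bar, total_var = multiply_impute_mean(y, m=20, rng=random.Random(1))
```

The between-imputation term is what lets the pooled variance reflect uncertainty due to the missing values, which a single filled-in dataset cannot express.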

20.

Nonparametric conditional mean imputation

《Journal of statistical planning and inference》2001,99(2):129-150

Imputation is a much used method for handling missing data. It is appealing as it separates the missing data part of the analysis, which is handled by imputation, and the estimation part, which is handled by complete data methods. Most imputation methods, however, either rely on strict parametric assumptions or are rather ad hoc, in which case they often only work approximately under even stricter assumptions. In this paper a non-parametric imputation method is proposed. Since it is non-parametric, it works under quite general assumptions. In particular, a model for the complete data is not required in the imputation step, and the complete data method used after the imputation may be a general estimating equation for estimating a finite-dimensional parameter. Large sample results for the resulting estimator are given.