Similar literature
1.
Item non‐response in surveys occurs when some, but not all, variables are missing. Unadjusted estimators tend to exhibit some bias, called the non‐response bias, if the respondents differ from the non‐respondents with respect to the study variables. In this paper, we focus on item non‐response, which is usually treated by some form of single imputation. We examine the properties of doubly robust imputation procedures, which are those that lead to an estimator that remains consistent if either the outcome variable or the non‐response mechanism is adequately modelled. We establish the double robustness property of the imputed estimator of the finite population distribution function under random hot‐deck imputation within classes. We also discuss the links between our approach and that of Chambers and Dunstan. The results of a simulation study support our findings.
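
As a rough illustration of the random hot‐deck imputation within classes named in the abstract above, the following minimal sketch replaces each missing value with a value drawn at random from a respondent in the same imputation class, then evaluates an empirical distribution function on the completed data. The function names (`hot_deck_impute`, `imputed_cdf`) are hypothetical, and the sketch ignores survey weights and the doubly robust machinery; it is not the paper's estimator.

```python
import random


def hot_deck_impute(values, classes, rng=None):
    """Fill None entries with a random respondent value from the same class.

    Assumption: unweighted donor selection with replacement; the paper's
    design-weighted procedure is not reproduced here.
    """
    rng = rng or random.Random()
    # Collect observed (donor) values per imputation class.
    donors = {}
    for value, cls in zip(values, classes):
        if value is not None:
            donors.setdefault(cls, []).append(value)

    completed = []
    for value, cls in zip(values, classes):
        if value is None:
            if cls not in donors:
                raise ValueError(f"no respondents in class {cls!r}")
            completed.append(rng.choice(donors[cls]))
        else:
            completed.append(value)
    return completed


def imputed_cdf(completed, t):
    """Unweighted empirical distribution function of the completed data at t."""
    return sum(v <= t for v in completed) / len(completed)
```

Because donors are drawn at random, repeated runs give different completed data sets unless a seeded `random.Random` instance is passed as `rng`.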

2.
This article addresses issues in creating public-use data files in the presence of missing ordinal responses and subsequent statistical analyses of the dataset by users. The authors propose a fully efficient fractional imputation (FI) procedure for ordinal responses with missing observations. The proposed imputation strategy retrieves the missing values through the full conditional distribution of the response given the covariates and results in a single imputed data file that can be analyzed by different data users with different scientific objectives. The two most critical aspects of statistical analyses based on the imputed data set, validity and efficiency, are examined through regression analysis involving the ordinal response and a selected set of covariates. It is shown through both theoretical development and simulation studies that, when the ordinal responses are missing at random, the proposed FI procedure leads to valid and highly efficient inferences as compared to existing methods. Variance estimation using the fractionally imputed data set is also discussed. The Canadian Journal of Statistics 48: 138–151; 2020 © 2019 Statistical Society of Canada

3.
Recent research has made clear that missing values in datasets are inevitable. Imputation is one of several methods that have been introduced to address this issue. Imputation techniques handle missing data by permanently replacing missing values with reasonable estimates. These procedures have more benefits than drawbacks. However, how they operate has not been made fully clear, which can cast doubt on analytical results. One approach to evaluating the outcome of the imputation process is to estimate the uncertainty in the imputed data. Nonparametric methods are appropriate for estimating this uncertainty when the data do not follow any particular distribution. This paper presents a nonparametric method, based on the Wilcoxon test statistic, for estimating and testing the significance of imputation uncertainty; it can be used to estimate the precision of the values produced by imputation methods. The proposed procedure can be used to judge whether imputation is feasible for a dataset and to evaluate the influence of proper imputation methods when they are applied to the same dataset. The proposed approach is compared with other nonparametric resampling methods, including the bootstrap and the jackknife, for estimating uncertainty in the imputed data under the Bayesian bootstrap imputation method. The ideas behind the proposed method are explained in detail, and a simulation study illustrates how the approach can be used in practical situations.

4.
This article examines methods to efficiently estimate the mean response in a linear model with an unknown error distribution under the assumption that the responses are missing at random. We show how the asymptotic variance is affected by the estimator of the regression parameter and by the imputation method. To estimate the regression parameter, ordinary least squares is efficient only if the error distribution happens to be normal. If the errors are not normal, then we propose a one-step improvement estimator or a maximum empirical likelihood estimator to efficiently estimate the parameter. To investigate the impact of imputation on the estimation of the mean response, we compare the listwise deletion method and the propensity score method (which do not use imputation at all) with two imputation methods. We demonstrate that listwise deletion and the propensity score method are inefficient. Partial imputation, where only the missing responses are imputed, is compared to full imputation, where both missing and non-missing responses are imputed. Our results reveal that, in general, full imputation is better than partial imputation. However, when the regression parameter is estimated very poorly, partial imputation will outperform full imputation. The efficient estimator for the mean response is the full imputation estimator that utilizes an efficient estimator of the parameter.

5.
The present investigation addresses the problem of estimating a finite population mean in two-phase cluster sampling in the presence of random non-response. Utilizing information on an auxiliary variable, regression-type estimators are proposed. Effective imputation techniques are suggested to deal with random non-response. The properties of the proposed estimation strategies are studied for different cases of random non-response in practical surveys. The superiority of the suggested methodology over the natural sample mean estimator of the population mean is established through empirical studies carried out on data sets from a natural population and an artificially generated population.

6.
Imputation is often used in surveys to treat item nonresponse. It is well known that treating the imputed values as observed values may lead to substantial underestimation of the variance of the point estimators. To overcome the problem, a number of variance estimation methods have been proposed in the literature, including resampling methods such as the jackknife and the bootstrap. In this paper, we consider the problem of doubly robust inference in the presence of imputed survey data. In the doubly robust literature, point estimation has been the main focus. In this paper, using the reverse framework for variance estimation, we derive doubly robust linearization variance estimators in the case of deterministic and random regression imputation within imputation classes. Also, we study the properties of several jackknife variance estimators under both negligible and nonnegligible sampling fractions. A limited simulation study investigates the performance of various variance estimators in terms of relative bias and relative stability. Finally, the asymptotic normality of imputed estimators is established for stratified multistage designs under both deterministic and random regression imputation. The Canadian Journal of Statistics 40: 259–281; 2012 © 2012 Statistical Society of Canada

7.
Non‐likelihood‐based methods for repeated measures analysis of binary data in clinical trials can result in biased estimates of treatment effects and associated standard errors when the dropout process is not completely at random. We tested the utility of a multiple imputation approach in reducing these biases. Simulations were used to compare performance of multiple imputation with generalized estimating equations and restricted pseudo‐likelihood in five representative clinical trial profiles for estimating (a) overall treatment effects and (b) treatment differences at the last scheduled visit. In clinical trials with moderate to high (40–60%) dropout rates with dropouts missing at random, multiple imputation led to less biased and more precise estimates of treatment differences for binary outcomes based on underlying continuous scores. Copyright © 2005 John Wiley & Sons, Ltd.

8.
A version of the nonparametric bootstrap that resamples entire subjects from the original data, called the case bootstrap, has been increasingly used for estimating the uncertainty of parameters in mixed‐effects models. It is usually applied to obtain more robust estimates of the parameters and more realistic confidence intervals (CIs). Alternative bootstrap methods, such as the residual bootstrap and the parametric bootstrap, which resample both random effects and residuals, have been proposed to better take into account the hierarchical structure of multi‐level and longitudinal data. However, few studies have been performed to compare these different approaches. In this study, we used simulation to evaluate bootstrap methods proposed for linear mixed‐effect models. We also compared the results obtained by maximum likelihood (ML) and restricted maximum likelihood (REML). Our simulation studies showed the good performance of the case bootstrap as well as the bootstraps of both random effects and residuals. On the other hand, the bootstrap methods that resample only the residuals and the bootstraps combining case and residuals performed poorly. REML and ML provided similar bootstrap estimates of uncertainty, but there was slightly more bias and a poorer coverage rate for variance parameters with ML in the sparse design. We applied the proposed methods to a real dataset from a study investigating the natural evolution of Parkinson's disease and were able to confirm that the methods provide plausible estimates of uncertainty. Given that most real‐life datasets tend to exhibit heterogeneity in sampling schedules, the residual bootstraps would be expected to perform better than the case bootstrap. Copyright © 2013 John Wiley & Sons, Ltd.
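
The case bootstrap described in the abstract above can be sketched in a few lines: subjects are drawn with replacement and each drawn subject keeps its full record of repeated measurements. This is only an illustrative sketch under simplifying assumptions. The statistic here is a pooled mean rather than a fitted mixed‐effects model, and the names `case_bootstrap`, `grand_mean`, and `percentile_interval` are hypothetical.

```python
import random
from statistics import mean


def case_bootstrap(data, statistic, n_boot=1000, seed=0):
    """Resample whole subjects with replacement and recompute a statistic.

    data: dict mapping subject id -> list of that subject's observations.
    statistic: function taking a list of per-subject records.
    """
    rng = random.Random(seed)
    ids = list(data)
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(ids, k=len(ids))
        # Each drawn subject contributes its complete record, so the
        # within-subject structure is left intact.
        estimates.append(statistic([data[i] for i in drawn]))
    return estimates


def grand_mean(records):
    """Pooled mean over all observations of all resampled subjects."""
    return mean(x for record in records for x in record)


def percentile_interval(estimates, level=0.95):
    """Simple percentile confidence interval from bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = 1.0 - level
    lower = ordered[round((alpha / 2) * (len(ordered) - 1))]
    upper = ordered[round((1 - alpha / 2) * (len(ordered) - 1))]
    return lower, upper
```

In a real mixed‐effects analysis, `statistic` would refit the model on each resampled data set; that step is omitted here.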

9.
Resampling methods are commonly used to estimate the variance of a statistic of interest when data contain nonresponse and imputation is used as compensation. Applying resampling methods usually means that subsamples are drawn from the original sample and that variance estimates are computed based on point estimators of several subsamples. However, newer resampling methods such as the rescaling bootstrap of Chipperfield and Preston [Efficient bootstrap for business surveys. Surv Methodol. 2007;33:167–172] include all elements of the original sample in the computation of its point estimator. Thus, procedures to account for imputation in resampling methods cannot be applied in the ordinary way. For such methods, modifications are necessary. This paper presents an approach for applying newer resampling methods to imputed data. The Monte Carlo simulation study conducted in the paper shows that the proposed approach leads to reliable variance estimates, in contrast to other modifications.

10.
In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi‐parametric approaches are only valid when data are missing completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI‐GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI‐GEE in the analysis of incomplete longitudinal dichotomized outcomes is evaluated. For comparison, likelihood‐based linear mixed effects models (LMM) are used to analyze the incomplete longitudinal outcomes on their original continuous scale. Focusing on dichotomized outcome analysis, MI‐GEE with imputation of the missing data on the original continuous scale provides well controlled test sizes and more stable power estimates compared with the other GEE‐based approaches. It is also shown that dichotomizing a longitudinal continuous outcome results in a substantial loss of power compared with LMM. Copyright © 2009 John Wiley & Sons, Ltd.

11.
The analysis of clinical trials aiming to show symptomatic benefits is often complicated by the ethical requirement for rescue medication when the disease state of patients worsens. In type 2 diabetes trials, patients receive glucose‐lowering rescue medications continuously for the remaining trial duration, if one of several markers of glycemic control exceeds pre‐specified thresholds. This may mask differences in glycemic values between treatment groups, because it will occur more frequently in less effective treatment groups. Traditionally, the last pre‐rescue medication value was carried forward and analyzed as the end‐of‐trial value. The deficits of such simplistic single imputation approaches are increasingly recognized by regulatory authorities and trialists. We discuss alternative approaches and evaluate them through a simulation study. When the estimand of interest is the effect attributable to the treatments initially assigned at randomization, then our recommendation for estimation and hypothesis testing is to treat data after meeting rescue criteria as deterministically ‘missing’ at random, because initiation of rescue medication is determined by observed in‐trial values. An appropriate imputation of values after meeting rescue criteria is then possible either directly through multiple imputation or implicitly with a repeated measures model. Crucially, one needs to jointly impute or model all markers of glycemic control that can lead to the initiation of rescue medication. An alternative, for hypothesis testing only, is to use rank tests with outcomes from patients ‘requiring rescue medication’ ranked worst, and non‐rescued patients ranked according to final visit values. However, an appropriate ranking of unobserved values may be controversial. Copyright © 2015 John Wiley & Sons, Ltd.

12.
Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. In summary, the paper addresses imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model.

13.
Marginal imputation, which consists of imputing items separately, generally leads to biased estimators of bivariate parameters such as finite population coefficients of correlation. To overcome this problem, two main approaches have been considered in the literature. The first consists of using customary imputation methods such as random hot‐deck imputation and adjusting for the bias at the estimation stage. This approach was studied in Skinner & Rao (2002). In this paper, we extend the results of Skinner & Rao (2002) to the case of arbitrary sampling designs and three variants of random hot‐deck imputation. The second approach consists of using an imputation method that preserves the relationship between variables. Shao & Wang (2002) proposed a joint random regression imputation procedure that succeeds in preserving the relationships between two study variables. One drawback of the Shao–Wang procedure is that it suffers from an additional variability (called the imputation variance) due to the random selection of residuals, resulting in potentially inefficient estimators. Following Chauvet, Deville, & Haziza (2011), we propose a fully efficient version of the Shao–Wang procedure that preserves the relationship between two study variables, while virtually eliminating the imputation variance. Results of a simulation study support our findings. An application using data from the Workplace and Employees Survey is also presented. The Canadian Journal of Statistics 40: 124–149; 2012 © 2011 Statistical Society of Canada

14.
By employing all the observed information and the optimal augmentation term, we propose an augmented inverse probability weighted fractional imputation method (AFI) to handle covariates missing at random in quantile regression. We carry out a simulation study to compare AFI with the existing complete case analysis, inverse probability weighting, multiple imputation and fractional imputation methods for quantile regression models with missing covariates, in terms of estimation accuracy and efficiency, computational efficiency and estimation robustness. We also discuss the influence of the imputation replicates in AFI. Finally, we apply our methodology to part of the National Health and Nutrition Examination Survey data.

15.
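Missing data is an important factor affecting the quality of survey questionnaire data, and imputing the missing values in questionnaires can significantly improve the quality of survey data. Questionnaire data are mostly categorical. Classification algorithms from data mining are a common way to handle attribute classification problems, and the random forest model is one of the more accurate of the many classification algorithms. This paper introduces the random forest model into the study of imputing missing questionnaire data, proposes a random-forest-based method for imputing missing values in categorical data, and discusses the corresponding imputation steps for different missingness patterns. An empirical simulation comparison with other methods shows that the imputed values obtained by random forest imputation are more accurate and more reliable.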

16.
A kernel regression imputation method for missing response data is developed. A class of bias-corrected empirical log-likelihood ratios for the response mean is defined. It is shown that any member of our class of ratios is asymptotically chi-squared, and the corresponding empirical likelihood confidence interval for the response mean is constructed. Our ratios share some of the desired features of the existing methods: they are self-scale invariant and no plug-in estimators for the adjustment factor and asymptotic variance are needed; when estimating the non-parametric function in the model, undersmoothing to ensure root-n consistency of the estimator for the parameter is avoided. Since the range of bandwidths contains the optimal bandwidth for estimating the regression function, the existing data-driven algorithm is valid for selecting an optimal bandwidth. We also study the normal approximation-based method. A simulation study is undertaken to compare the empirical likelihood with the normal approximation method in terms of coverage accuracies and average lengths of confidence intervals.
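
To make the idea of kernel regression imputation in the abstract above concrete, the sketch below fills each missing response with a Nadaraya–Watson estimate computed from the observed pairs and then averages the completed responses. It is an assumption-laden illustration only: the Gaussian kernel, the fixed bandwidth `h`, and the function names are choices made here, and the bias-corrected empirical likelihood ratios and confidence intervals of the paper are not implemented.

```python
import math


def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * y for w, y in zip(weights, ys)) / total


def kernel_impute(xs, ys, h):
    """Replace None responses with kernel regression predictions."""
    # Fit only on the pairs whose response was observed.
    obs_x = [x for x, y in zip(xs, ys) if y is not None]
    obs_y = [y for y in ys if y is not None]
    return [
        y if y is not None else nw_estimate(x, obs_x, obs_y, h)
        for x, y in zip(xs, ys)
    ]


def imputed_mean(xs, ys, h):
    """Mean response after kernel regression imputation."""
    completed = kernel_impute(xs, ys, h)
    return sum(completed) / len(completed)
```

For example, `imputed_mean([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, None, 3.0], h=0.5)` imputes the third response from its neighbours before averaging.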

17.
18.
The authors study the estimation of domain totals and means under survey‐weighted regression imputation for missing items. They use two different approaches to inference: (i) design‐based with uniform response within classes; (ii) model‐assisted with ignorable response and an imputation model. They show that the imputed domain estimators are biased under (i) but approximately unbiased under (ii). They obtain a bias‐adjusted estimator that is approximately unbiased under (i) or (ii). They also derive linearization variance estimators. They report the results of a simulation study on the bias ratio and efficiency of alternative estimators, including a complete case estimator that requires the knowledge of response indicators.

19.
This paper compares the performance of weighted generalized estimating equations (WGEEs), multiple imputation based on generalized estimating equations (MI-GEEs) and generalized linear mixed models (GLMMs) for analyzing incomplete longitudinal binary data when the underlying study is subject to dropout. The paper aims to explore the performance of these methods in handling dropouts that are missing at random (MAR). The methods are compared on simulated data. The longitudinal binary data are generated from a logistic regression model, under different sample sizes. The incomplete data are created for three different dropout rates. The methods are evaluated in terms of bias, precision and mean square error in the case where data are subject to MAR dropout. In conclusion, across the simulations performed, the MI-GEE method performed better in both small and large sample sizes. Evidently, this should not be seen as formal and definitive proof, but it adds to the body of knowledge about the methods’ relative performance. In addition, the methods are compared using data from a randomized clinical trial.

20.
Under stratified random sampling, we develop a kth-order bootstrap bias-corrected estimator of the number of classes θ which exist in a study region. This research extends Smith and van Belle’s (1984) first-order bootstrap bias-corrected estimator under simple random sampling. Our estimator has applicability for many settings including: estimating the number of animals when there are stratified capture periods, estimating the number of species based on stratified random sampling of subunits (say, quadrats) from the region, and estimating the number of errors/defects in a product based on observations from two or more types of inspectors. When the differences between the strata are large, utilizing stratified random sampling and our estimator often results in superior performance versus the use of simple random sampling and its bootstrap or jackknife [Burnham and Overton (1978)] estimator. The superior performance is often associated with more observed classes, and we provide insights into optimal designation of the strata and optimal allocation of sample sectors to strata.
