Similar literature
20 similar articles found.
1.
Fractional regression hot deck imputation (FRHDI) imputes multiple values for each instance of a missing dependent variable. The imputed values equal the predicted value plus multiple random residuals. Fractional weights enable variance estimation and preserve correlations. Under some circumstances, and for some starting weight values, existing procedures for computing FRHDI weights can produce negative values. We discuss procedures for constructing non-negative adjusted fractional weights for FRHDI and study the performance of the algorithm using simulation. The algorithm can be used effectively with FRHDI procedures for handling missing data in the context of a complex sample survey.
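As a rough illustration of the mechanics described above (not the authors' adjusted-weight algorithm), the following Python sketch imputes each missing response with several regression predictions plus donor residuals and attaches equal fractional weights; all names and the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def frhdi_impute(y, X, m=5):
    """Minimal fractional regression hot deck sketch:
    each missing y gets m imputations = OLS prediction + a donor residual,
    each carrying an equal fractional weight 1/m."""
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])           # add intercept
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
    resid = y[obs] - Xd[obs] @ beta                       # donor residuals
    rows, values, fweights = [], [], []
    for i in np.where(~obs)[0]:
        donors = rng.choice(resid, size=m, replace=True)
        for r in donors:
            rows.append(i)
            values.append(Xd[i] @ beta + r)
            fweights.append(1.0 / m)                      # equal fractional weights
    return np.array(rows), np.array(values), np.array(fweights)

# toy data: y depends linearly on x, roughly 30% of y missing
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=200)
y[rng.random(200) < 0.3] = np.nan
rows, vals, fw = frhdi_impute(y, x, m=5)
print("fractionally imputed mean:",
      (np.nansum(y) + np.sum(fw * vals)) / len(y))
```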

2.

This paper analyses the behaviour of goodness-of-fit tests for regression models. To this end, it uses statistics based on an estimate of the integrated regression function with missing observations either in the response variable or in some of the covariates. It proposes several versions of an empirical process, constructed from a previous estimate, that either use only the complete observations or replace the missing observations with imputed values. In the case of missing covariates, a link model is used to fill in the missing observations from other, complete covariates. In all situations, bootstrap methodology is used to calibrate the distribution of the test statistics. A broad simulation study compares the different procedures based on the empirical-regression methodology with smoothed tests previously studied in the literature. The comparison reflects the effect of the correlation between the covariates on the tests based on the imputed sample for missing covariates. In addition, the paper proposes a computational binning strategy for evaluating the tests based on an empirical process for large data sets. Finally, two applications to real data illustrate the performance of the tests.

3.
This article addresses issues in creating public-use data files in the presence of missing ordinal responses, and the subsequent statistical analyses of the data set by users. The authors propose a fully efficient fractional imputation (FI) procedure for ordinal responses with missing observations. The proposed imputation strategy retrieves the missing values through the full conditional distribution of the response given the covariates and results in a single imputed data file that can be analyzed by different data users with different scientific objectives. Two of the most critical aspects of statistical analyses based on the imputed data set, validity and efficiency, are examined through regression analysis involving the ordinal response and a selected set of covariates. It is shown through both theoretical development and simulation studies that, when the ordinal responses are missing at random, the proposed FI procedure leads to valid and highly efficient inferences compared to existing methods. Variance estimation using the fractionally imputed data set is also discussed. The Canadian Journal of Statistics 48: 138–151; 2020 © 2019 Statistical Society of Canada
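A minimal sketch of the fractional-imputation idea for an ordinal response, assuming the conditional category probabilities are estimated by simple within-stratum proportions rather than the full conditional model of the paper; the data and names are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# toy data: ordinal response y in {1,2,3}, binary covariate x, y missing at random
n = 500
x = rng.integers(0, 2, n)
p = np.where(x[:, None] == 1, [0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
y = np.array([rng.choice([1, 2, 3], p=pi) for pi in p]).astype(float)
y[rng.random(n) < 0.3 * (1 + x) / 2] = np.nan            # MAR given x

df = pd.DataFrame({"x": x, "y": y})

# Fractional imputation sketch: each missing y is replaced by every category,
# weighted by the estimated conditional probability P(y = k | x).
cond = (df.dropna().groupby("x")["y"]
          .value_counts(normalize=True).rename("fw").reset_index())
obs = df[df.y.notna()].assign(fw=1.0)
mis = (df[df.y.isna()].drop(columns="y")
         .merge(cond, on="x", how="left"))               # one row per category
fi = pd.concat([obs, mis], ignore_index=True)

# any weighted analysis can now use fi with weights fi["fw"], e.g. the mean:
print("FI estimate of E[y]:", np.average(fi["y"], weights=fi["fw"]))
```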

4.
We propose to use calibrated imputation to compensate for missing values. This technique consists of finding final imputed values that are as close as possible to preliminary imputed values and are calibrated to satisfy constraints. Preliminary imputed values, potentially justified by an imputation model, are obtained through deterministic single imputation. Using appropriate constraints, the resulting imputed estimator is asymptotically unbiased for estimation of linear population parameters such as domain totals. A quasi-model-assisted approach is considered in the sense that inferences do not depend on the validity of an imputation model and are made with respect to the sampling design and a non-response model. An imputation model may still be used to generate imputed values and thus to improve the efficiency of the imputed estimator. This approach naturally handles the situation where more than one imputation method is used owing to missing values in the variables that are used to obtain imputed values. We use the Taylor linearization technique to obtain a variance estimator under a general non-response model. For the logistic non-response model, we show that ignoring the effect of estimating the non-response model parameters leads to overestimating the variance of the imputed estimator. In practice, the overestimation is expected to be moderate or even negligible, as shown in a simulation study.
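The core step, finding final imputed values closest to the preliminary ones subject to a constraint, has a closed form when the distance is a weighted sum of squares and there is a single total constraint. The sketch below illustrates only that special case, with made-up weights and a made-up benchmark total; the paper's framework is far more general.

```python
import numpy as np

def calibrate_imputations(z0, w, total):
    """Minimal calibrated-imputation sketch: find final imputed values z that
    minimise the weighted squared distance sum(w * (z - z0)**2) to the
    preliminary imputations z0, subject to sum(w * z) == total."""
    shift = (total - np.sum(w * z0)) / np.sum(w)
    return z0 + shift

# preliminary (deterministic) imputations for 4 missing units, design weights w
z0 = np.array([10.0, 12.0, 9.0, 11.0])
w = np.array([2.0, 1.5, 1.0, 2.5])
target = 80.0                        # benchmark total the imputed values must hit
z = calibrate_imputations(z0, w, target)
print(z, np.sum(w * z))              # weighted total now equals 80
```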

5.
In this article, we consider the estimation of a population mean when some observations on the study characteristic are missing in the bivariate sample data. In all, five estimators are presented and their efficiency properties are discussed. One estimator arises from the amputation of incomplete observations, while the remaining four estimators are formulated using imputed values obtained by the ratio method of estimation. This work was carried out before Professor V.K. Srivastava passed away in 2001.
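A minimal sketch of the ratio method of imputation referred to above, on hypothetical bivariate data: missing y-values are imputed as the estimated ratio of observed means times the corresponding x-value, and the population mean is estimated from the completed sample.

```python
import numpy as np

rng = np.random.default_rng(2)

def ratio_impute_mean(y, x):
    """Ratio-method imputation sketch: missing y_i are imputed as
    (mean of observed y / mean of corresponding x) * x_i,
    then the population mean is estimated from the completed sample."""
    obs = ~np.isnan(y)
    r_hat = y[obs].mean() / x[obs].mean()
    y_completed = np.where(obs, y, r_hat * x)
    return y_completed.mean()

# toy bivariate sample: y roughly proportional to x, 25% of y missing
x = rng.uniform(5, 15, size=300)
y = 3.0 * x + rng.normal(scale=2.0, size=300)
y[rng.random(300) < 0.25] = np.nan
print("ratio-imputed mean estimate:", ratio_impute_mean(y, x))
```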

6.
We used a proper multiple imputation (MI) approach, implemented through Gibbs sampling, to impute missing values of a gamma-distributed outcome variable that were missing at random, using a generalized linear model (GLM) with identity link function. The missing values of the outcome variable were multiply imputed using the GLM, and the complete data sets obtained after MI were then analysed through the GLM again for estimation. We examined the performance of the proposed technique through a simulation study with data sets having four moderate-to-large proportions of missing values: 10%, 20%, 30% and 50%. We also applied the technique to a real-life data set and compared the results with those obtained by applying the GLM to the observed cases only. The results showed that the proposed technique gave better results for moderate proportions of missing values.
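A simplified sketch of the idea, assuming the identity-link mean can be approximated by least squares and skipping the proper Gibbs-sampling step of drawing parameters from their posterior; the gamma draws, toy data and pooling rule are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)

def mi_gamma(y, x, m=5):
    """Simplified MI sketch for a gamma outcome (not the full Gibbs sampler):
    the conditional mean is fitted by least squares as a stand-in for a GLM
    with identity link, and each missing y is drawn from a gamma distribution
    with that mean and a crude moment-based shape.  Returns m completed copies."""
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    mu = X @ beta
    resid_var = np.var(y[obs] - mu[obs], ddof=2)
    shape = np.maximum(mu[obs].mean() ** 2 / resid_var, 0.1)   # crude MoM shape
    completed = []
    for _ in range(m):
        draw = rng.gamma(shape, np.maximum(mu, 1e-6) / shape)
        completed.append(np.where(obs, y, draw))
    return completed

# toy data: gamma outcome with mean linear in x, 30% missing at random
x = rng.uniform(1, 5, 400)
mu_true = 2.0 + 1.0 * x
y = rng.gamma(4.0, mu_true / 4.0)
y[rng.random(400) < 0.3] = np.nan

# analyse each completed data set (slope of y on x) and pool by averaging
slopes = [np.polyfit(x, yc, 1)[0] for yc in mi_gamma(y, x, m=10)]
print("pooled slope estimate:", np.mean(slopes))
```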

7.
In this paper, we introduce a fresh methodology for imputing missing values by making use of sensible constraints on both a study variable and auxiliary variables that are correlated with the variable of interest. The resulting estimator based on these imputed values is shown to lead to the regression-type method of imputation in survey sampling. Furthermore, when the data are a hybrid of observations missing at random and missing completely at random, the resulting estimator is shown to be consistent, with asymptotic mean squared error equal to that of the linear regression method of imputation. A generalization to any method of imputation is possible and is included at the end of the paper.

8.
Recent research has made it clear that missing values in data sets are inevitable. Imputation is one of several methods introduced to overcome this issue: imputation techniques address the problem by permanently replacing missing values with reasonable estimates. These procedures have many benefits, but their behaviour is not always transparent, which can create mistrust in the resulting analyses. One way to evaluate the outcome of an imputation process is to estimate the uncertainty in the imputed data. Nonparametric methods are appropriate for estimating this uncertainty when the data do not follow any particular distribution. This paper presents a nonparametric method, based on the Wilcoxon test statistic, for estimating and testing the significance of imputation uncertainty; it can be used to assess the precision of the imputed values produced by an imputation method. The proposed procedure can be used to judge the feasibility of imputation for a data set and to evaluate the effect of competing imputation methods applied to the same data set. The approach is compared with other nonparametric resampling methods, including the bootstrap and the jackknife, for estimating uncertainty in data imputed under the Bayesian bootstrap imputation method. The ideas underlying the proposed method are explained in detail, and a simulation study illustrates how the approach can be applied in practice.
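One simulation-style reading of the proposal, offered only as an illustration: mask known values, impute them, and apply the Wilcoxon signed-rank statistic to the imputation errors as a crude gauge of imputation uncertainty. The exact test construction in the paper may differ; the data and names below are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(4)

# Simulation-style check: mask some known values, impute them, and use the
# Wilcoxon signed-rank statistic on (imputed - true) pairs as a rough gauge
# of whether the imputations are systematically off.  This illustrates the
# idea only, not the paper's exact test construction.
x = rng.normal(size=500)
y_true = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=500)
mask = rng.random(500) < 0.2                     # artificially delete 20%

y = y_true.copy()
y[mask] = np.nan
beta = np.polyfit(x[~mask], y[~mask], 1)         # simple regression imputation
y_imp = np.polyval(beta, x[mask])

stat, pval = wilcoxon(y_imp - y_true[mask])
print(f"Wilcoxon statistic = {stat:.1f}, p-value = {pval:.3f}")
```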

9.
This article examines methods to efficiently estimate the mean response in a linear model with an unknown error distribution under the assumption that the responses are missing at random. We show how the asymptotic variance is affected by the estimator of the regression parameter and by the imputation method. To estimate the regression parameter, ordinary least squares is efficient only if the error distribution happens to be normal. If the errors are not normal, we propose a one-step improvement estimator or a maximum empirical likelihood estimator to estimate the parameter efficiently. To investigate the impact of imputation on the estimation of the mean response, we compare the listwise deletion method and the propensity score method (which do not use imputation at all) with two imputation methods. We demonstrate that listwise deletion and the propensity score method are inefficient. Partial imputation, where only the missing responses are imputed, is compared to full imputation, where both missing and non-missing responses are imputed. Our results reveal that, in general, full imputation is better than partial imputation; however, when the regression parameter is estimated very poorly, partial imputation will outperform full imputation. The efficient estimator for the mean response is the full-imputation estimator that utilizes an efficient estimator of the parameter.
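A stripped-down sketch of the partial versus full imputation distinction for the mean response (not the paper's efficient estimator): the regression function is fitted on complete cases, then either only the missing responses or all responses are replaced by fitted values.

```python
import numpy as np

rng = np.random.default_rng(5)

# Sketch of partial vs. full imputation for the mean response under MAR.
# Partial: keep observed responses, impute only the missing ones.
# Full: replace every response (observed or not) by its fitted value.
# This illustrates the distinction only, not the paper's efficient estimator.
n = 2000
x = rng.uniform(0, 2, n)
y = 1.0 + 3.0 * x + rng.standard_t(df=5, size=n)          # non-normal errors
observed = rng.random(n) < 1 / (1 + np.exp(-(1.5 - x)))   # MAR given x

beta = np.polyfit(x[observed], y[observed], 1)
m_hat = np.polyval(beta, x)

partial = np.where(observed, y, m_hat).mean()
full = m_hat.mean()
print(f"partial imputation: {partial:.3f}, full imputation: {full:.3f}, "
      f"plug-in true mean: {1.0 + 3.0 * np.mean(x):.3f}")
```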

10.
There has been increasing use of quality-of-life (QoL) instruments in drug development, and missing item values often occur in QoL data. A common approach to this problem is to impute the missing values before scoring. Several imputation procedures have been proposed, such as imputing with the most correlated item and imputing with a row/column model or an item response model. We examine these procedures using data from two clinical trials in which the original asthma quality-of-life questionnaire (AQLQ) and the miniAQLQ were used. We propose two modifications to existing procedures: truncating the imputed values to eliminate outliers, and using the proportional odds model as the item response model for imputation. We also propose a novel imputation method based on a semi-parametric beta regression, so that the imputed value is always in the correct range, and illustrate how this approach can easily be implemented in commonly used statistical software. To compare these approaches, we deleted 5% of item values in the data according to three different missingness mechanisms, imputed them using each approach and compared the imputed values with the true values. Our comparison showed that the row/column-model-based imputation with truncation generally performed better, whereas our new approach had better performance under a number of scenarios.
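Two of the simpler ideas mentioned above, imputing from the most correlated item and truncating imputed values to the valid scale range, can be sketched as follows; the 1-7 scale, the fallback rule and the toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

def impute_most_correlated(items, lo=1, hi=7):
    """Sketch: impute a missing item from the most correlated other item by
    simple regression, then truncate the imputed value to the item's valid
    range (here a 1-7 QoL scale).  Falls back to the item mean if needed."""
    items = items.astype(float)
    n, p = items.shape
    filled = items.copy()
    corr = np.corrcoef(items[~np.isnan(items).any(axis=1)].T)   # complete cases
    for j in range(p):
        miss = np.isnan(items[:, j])
        if not miss.any():
            continue
        k = np.argsort(-np.abs(corr[j]))[1]                     # best other item
        ok = ~np.isnan(items[:, j]) & ~np.isnan(items[:, k])
        b = np.polyfit(items[ok, k], items[ok, j], 1)
        filled[miss, j] = np.clip(np.polyval(b, items[miss, k]), lo, hi)
    # fall back to the item mean if the donor item is missing too
    filled = np.where(np.isnan(filled), np.nanmean(items, axis=0), filled)
    return filled

# toy questionnaire: 4 correlated items on a 1-7 scale, 5% of entries missing
latent = rng.normal(4, 1.2, size=(300, 1))
items = np.clip(np.round(latent + rng.normal(0, 0.8, size=(300, 4))), 1, 7)
items[rng.random(items.shape) < 0.05] = np.nan
print("remaining NaN:", np.isnan(impute_most_correlated(items)).sum())
```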

11.
We consider three sorts of diagnostics for random imputations: displays of the completed data, which are intended to reveal unusual patterns that might suggest problems with the imputations, comparisons of the distributions of observed and imputed data values and checks of the fit of observed data to the model that is used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation, which is an iterative procedure in which the missing values of each variable are randomly imputed conditionally on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 environmental sustainability index, which is a linear aggregation of 64 environmental variables on 142 countries.
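A simple numerical stand-in for the second diagnostic, comparing the distributions of observed and imputed values, might look like the following; it is not the paper's procedure, and the toy data are invented.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

# One of the three diagnostics described above compares the distributions of
# observed and imputed values.  A simple numerical stand-in for that graphical
# check: summary quartiles plus a two-sample Kolmogorov-Smirnov statistic.
x = rng.normal(size=1000)
y = 0.5 + x + rng.normal(scale=0.8, size=1000)
miss = rng.random(1000) < 0.3
beta = np.polyfit(x[~miss], y[~miss], 1)
imputed = np.polyval(beta, x[miss]) + rng.normal(scale=0.8, size=miss.sum())

for name, v in [("observed", y[~miss]), ("imputed", imputed)]:
    q = np.percentile(v, [25, 50, 75])
    print(f"{name:9s} quartiles: {q.round(2)}")
print("KS test observed vs imputed:", ks_2samp(y[~miss], imputed))
```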

12.
Imputation is often used in surveys to treat item nonresponse. It is well known that treating the imputed values as observed values may lead to substantial underestimation of the variance of the point estimators. To overcome the problem, a number of variance estimation methods have been proposed in the literature, including resampling methods such as the jackknife and the bootstrap. In this paper, we consider the problem of doubly robust inference in the presence of imputed survey data. In the doubly robust literature, point estimation has been the main focus. Using the reverse framework for variance estimation, we derive doubly robust linearization variance estimators in the case of deterministic and random regression imputation within imputation classes. Also, we study the properties of several jackknife variance estimators under both negligible and nonnegligible sampling fractions. A limited simulation study investigates the performance of various variance estimators in terms of relative bias and relative stability. Finally, the asymptotic normality of imputed estimators is established for stratified multistage designs under both deterministic and random regression imputation. The Canadian Journal of Statistics 40: 259–281; 2012 © 2012 Statistical Society of Canada
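To convey why imputation must be repeated inside each replicate, here is a bare-bones jackknife sketch under deterministic regression imputation for a simple random sample; it ignores imputation classes, complex designs and nonnegligible sampling fractions, all of which the paper treats.

```python
import numpy as np

rng = np.random.default_rng(8)

def regression_impute_mean(y, x, keep):
    """Deterministic regression imputation using the units in `keep`,
    then the mean of the completed sample over those units."""
    obs = keep & ~np.isnan(y)
    beta = np.polyfit(x[obs], y[obs], 1)
    y_full = np.where(np.isnan(y), np.polyval(beta, x), y)
    return y_full[keep].mean()

# Jackknife sketch: delete one unit at a time and *re-impute* within each
# replicate, so the variance estimate reflects the imputation step.
n = 400
x = rng.uniform(0, 4, n)
y = 2.0 + 1.0 * x + rng.normal(size=n)
y[rng.random(n) < 0.25] = np.nan

all_units = np.ones(n, dtype=bool)
theta_hat = regression_impute_mean(y, x, all_units)
reps = []
for i in range(n):
    keep = all_units.copy()
    keep[i] = False
    reps.append(regression_impute_mean(y, x, keep))
reps = np.array(reps)
v_jack = (n - 1) / n * np.sum((reps - reps.mean()) ** 2)
print(f"estimate = {theta_hat:.3f}, jackknife variance = {v_jack:.5f}")
```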

13.
The estimation of mixtures of regression models is usually based on the assumption of normally distributed components, and maximum likelihood estimation of the normal components is sensitive to noise, outliers, and high-leverage points. Missing values are inevitable in many situations, and parameter estimates can be biased if the missing values are not handled properly. In this article, we propose mixtures of regression models for contaminated, incomplete heterogeneous data. The proposed models provide robust estimates of regression coefficients varying across latent subgroups even in the presence of missing values. The methodology is illustrated through simulation studies and a real data analysis.

14.
Questions about monetary variables (such as income, wealth or savings) are key components of questionnaires on household finances. However, missing information on such sensitive topics is a well-known phenomenon that can seriously bias any inference based only on complete-case analysis. Many imputation techniques have been developed and implemented in several surveys. For the German SAVE data, a new estimation technique was necessary to overcome the upward bias of monetary variables caused by the initially implemented imputation procedure. The upward bias is the result of adding random draws to the implausible negative values predicted by OLS regressions until all values are positive. To overcome this problem, the logarithm of the dependent variable is modelled and the predicted values are retransformed to the original scale by Duan's smearing estimate. This paper evaluates the two imputation techniques for monetary variables by means of a simulation study, in which a random pattern of missingness is imposed on the observed values of the variables of interest. A Monte-Carlo simulation based on the observed data shows that the newly implemented smearing estimate is superior for reconstructing the missing data. All waves are consistently imputed using the new method.
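The retransformation step can be sketched as follows: fit OLS to the log of the monetary variable, then multiply the back-transformed predictions by Duan's smearing factor, the mean of the exponentiated residuals. The data and function names are illustrative, not the SAVE implementation.

```python
import numpy as np

rng = np.random.default_rng(9)

def smearing_impute(y, x):
    """Sketch of the log-OLS + Duan smearing retransformation step:
    fit OLS to log(y) on the observed cases, then impute missing y as
    exp(x*beta) multiplied by the smearing factor mean(exp(residuals)).
    This keeps imputations positive and corrects the retransformation bias."""
    obs = ~np.isnan(y)
    beta = np.polyfit(x[obs], np.log(y[obs]), 1)
    resid = np.log(y[obs]) - np.polyval(beta, x[obs])
    smear = np.mean(np.exp(resid))                   # Duan's smearing factor
    y_hat = np.exp(np.polyval(beta, x)) * smear
    return np.where(obs, y, y_hat)

# toy "income" data: positive, right-skewed, 30% missing at random
x = rng.normal(size=1000)
y = np.exp(7.0 + 0.5 * x + rng.normal(scale=0.6, size=1000))
y[rng.random(1000) < 0.3] = np.nan
y_completed = smearing_impute(y, x)
print("all imputations positive:", np.all(y_completed > 0))
print("completed-sample mean:", round(y_completed.mean()))
```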

15.
Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.
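A minimal Hit-and-Run sketch on a toy polytope shows why every draw automatically satisfies the linear constraints; here the target is uniform for simplicity, whereas the paper samples the missing items from their posterior.

```python
import numpy as np

rng = np.random.default_rng(10)

def hit_and_run(A, b, x0, n_draws=2000):
    """Minimal Hit-and-Run sketch: draw points from the polytope {x: A x <= b}.
    The target here is uniform for illustration; the feasibility logic is what
    matters: every accepted point satisfies the linear constraints."""
    x = x0.copy()
    draws = []
    for _ in range(n_draws):
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)                  # random direction
        # feasible step sizes t with A(x + t d) <= b
        ad, slack = A @ d, b - A @ x
        t_hi = np.min(slack[ad > 0] / ad[ad > 0]) if np.any(ad > 0) else np.inf
        t_lo = np.max(slack[ad < 0] / ad[ad < 0]) if np.any(ad < 0) else -np.inf
        x = x + rng.uniform(t_lo, t_hi) * d     # uniform step along the chord
        draws.append(x.copy())
    return np.array(draws)

# constraints: 0 <= x1, 0 <= x2, x1 + x2 <= 1  (a triangle)
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
samples = hit_and_run(A, b, x0=np.array([0.25, 0.25]))
print("all draws satisfy the constraints:",
      np.all(samples @ A.T <= b + 1e-9))
```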

16.
The problem of missing observations in regression models is often solved by using imputed values to complete the sample. As an alternative for static models, it has been suggested to limit the analysis to the periods or units for which all relevant variables are observed. The choice of an imputation procedure affects the asymptotic efficiency of the method used to subsequently estimate the parameters of the model. In this note, we show that the relative asymptotic efficiency of three estimators designed to handle incomplete samples depends on parameters that have a straightforward statistical interpretation. In terms of a gain in asymptotic efficiency, using these estimators is equivalent to observing a percentage of the values that are actually missing. This percentage depends on only three R²-type measures, which can be computed straightforwardly in applied work. Therefore it should be easy in practice to check whether it is worthwhile to use a more elaborate estimator.

17.
Parameter estimation with missing data is a frequently encountered problem in statistics. Imputation is often used to facilitate parameter estimation by simply applying complete-sample estimators to the imputed data set. In this article, we consider the problem of parameter estimation with nonignorable missing data using the approach of parametric fractional imputation proposed by Kim (2011). Using the fractional weights, the E-step of the EM algorithm can be approximated by the weighted mean of the imputed-data likelihood, where the fractional weights are computed from the current value of the parameter estimates. Calibration fractional imputation is also considered as a way of improving the Monte Carlo approximation in fractional imputation. Variance estimation is also discussed. Results from two simulation studies are presented to compare the proposed method with existing methods. A real data example from the Korea Labor and Income Panel Survey (KLIPS) is also presented.
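A toy sketch of the fractional-weight E-step for a simple normal regression with responses missing at random (the paper's setting is nonignorable nonresponse and the general EM machinery); imputed values are drawn once from a fixed proposal and only their fractional weights are updated at each iteration. All names and data are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)

# Parametric fractional imputation sketch (E-step mechanics only): each missing
# y gets M imputed values drawn once from a proposal; at every EM iteration the
# fractional weights are recomputed as current-model likelihood over proposal
# density, and the M-step is a weighted fit on the pooled data.
n, M = 500, 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
miss = rng.random(n) < 0.4
y_obs = np.where(miss, np.nan, y)

# initial fit and proposal draws (kept fixed across iterations)
b = np.polyfit(x[~miss], y_obs[~miss], 1)
sigma = np.std(y_obs[~miss] - np.polyval(b, x[~miss]))
prop_mu = np.polyval(b, x[miss])
prop_sd = 2.0 * sigma                                   # over-dispersed proposal
y_star = rng.normal(prop_mu[:, None], prop_sd, size=(miss.sum(), M))
h = norm.pdf(y_star, prop_mu[:, None], prop_sd)         # proposal density

for _ in range(20):                                     # EM iterations
    f = norm.pdf(y_star, np.polyval(b, x[miss])[:, None], sigma)
    w = f / h
    w /= w.sum(axis=1, keepdims=True)                   # fractional weights
    # weighted M-step: pool observed units (weight 1) and imputed values
    xx = np.concatenate([x[~miss], np.repeat(x[miss], M)])
    yy = np.concatenate([y_obs[~miss], y_star.ravel()])
    ww = np.concatenate([np.ones((~miss).sum()), w.ravel()])
    b = np.polyfit(xx, yy, 1, w=np.sqrt(ww))            # sqrt: polyfit squares w
    resid = yy - np.polyval(b, xx)
    sigma = np.sqrt(np.sum(ww * resid ** 2) / np.sum(ww))
print("PFI estimates:", b.round(3), round(float(sigma), 3))
```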

18.
Nonresponse is a very common phenomenon in survey sampling. Nonignorable nonresponse – that is, a response mechanism that depends on the values of the variable subject to nonresponse – is the most difficult type of nonresponse to handle. This article develops a robust estimation approach to estimating equations (EEs) when some responses are subject to nonignorable missingness, by combining a model for the nonignorably missing data, the generalized method of moments (GMM), and imputation of the EEs via the observed data rather than the imputed missing values. Based on a particular semiparametric logistic model for nonignorable missing responses, the paper proposes modified EEs to calculate the conditional expectation under nonignorable missingness, and GMM is then applied to estimate the parameters. An advantage of the method is that it replaces nonparametric kernel smoothing with a parametric sampling importance resampling (SIR) procedure, thereby avoiding the problems kernel smoothing faces with high-dimensional covariates. Simulations show the proposed method to be more robust than several current approaches.
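The SIR device itself is easy to illustrate: draw from a proposal, weight by the target-to-proposal ratio, and resample in proportion to the weights. The sketch below uses a toy normal target and proposal, not the paper's estimating-equation setting.

```python
import numpy as np

rng = np.random.default_rng(12)

def sir(draw_proposal, log_weight, n_prop=5000, n_keep=500):
    """Sampling importance resampling sketch: draw from a proposal, weight by
    the (unnormalised) target/proposal ratio, then resample proportionally to
    the weights.  The paper uses this device to approximate conditional
    expectations instead of kernel smoothing."""
    z = draw_proposal(n_prop)
    lw = log_weight(z)
    w = np.exp(lw - lw.max())
    w /= w.sum()
    idx = rng.choice(n_prop, size=n_keep, replace=True, p=w)
    return z[idx]

# toy target: N(2, 0.5^2); proposal: N(0, 2^2)
proposal = lambda m: rng.normal(0.0, 2.0, m)
logw = lambda z: -0.5 * ((z - 2.0) / 0.5) ** 2 + 0.5 * (z / 2.0) ** 2
res = sir(proposal, logw, n_prop=20000, n_keep=2000)
print("resampled mean/sd:", res.mean().round(3), res.std().round(3))
```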

19.
This paper addresses the problem of probability density estimation in the presence of covariates when data are missing at random (MAR). The inverse probability weighted method is used to define nonparametric and semiparametric weighted probability density estimators. A regression calibration technique is also used to define an imputed estimator. It is shown that all the estimators are asymptotically normal, with the same asymptotic variance as that of the inverse probability weighted estimator with known selection probability function and weights. We also establish mean squared error (MSE) bounds and obtain the MSE convergence rates. A simulation is carried out to assess the proposed estimators in terms of bias and standard error.
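A crude sketch of the inverse-probability-weighted density estimator, assuming the selection probabilities are estimated by binned response proportions rather than the estimators studied in the paper; the grid, bandwidth and toy data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(13)

def ipw_kde(y, x, observed, grid, h=0.3, bins=10):
    """Inverse-probability-weighted kernel density sketch: estimate the
    selection probability pi(x) by a crude binned proportion, then weight each
    observed response by 1/pi_hat in a Gaussian kernel density estimate of y."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    pi_hat = np.array([observed[idx == b].mean() for b in range(bins)])[idx]
    w = observed / np.clip(pi_hat, 0.05, 1.0)        # IPW weights (0 if missing)
    kern = np.exp(-0.5 * ((grid[:, None] - y[None, :]) / h) ** 2)
    kern /= h * np.sqrt(2 * np.pi)
    return (kern * w[None, :]).sum(axis=1) / len(y)

# toy data: y depends on x, response probability also depends on x (MAR)
n = 3000
x = rng.normal(size=n)
y = x + rng.normal(scale=0.7, size=n)
observed = rng.random(n) < 1 / (1 + np.exp(-(0.5 + x)))
grid = np.linspace(-4, 4, 9)
print(ipw_kde(np.where(observed, y, 0.0), x, observed, grid).round(3))
```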

20.
Consider estimation of the population mean of a response variable when the observations are missing at random with respect to the covariate. Two common approaches to imputing the missing values are the nonparametric regression weighting method and the Horvitz-Thompson (HT) inverse weighting approach. The regression approach includes kernel regression imputation and nearest neighbor imputation. The HT approach, employing inverse kernel-estimated weights, includes the basic estimator, the ratio estimator and the estimator using inverse kernel-weighted residuals. Asymptotic normality of the nearest neighbor imputation estimators is derived and compared with that of the kernel regression imputation estimator under standard regularity conditions on the regression function and the missing-pattern function. A comprehensive simulation study shows that the basic HT estimator is most sensitive to discontinuity in the missing data patterns, and that the nearest neighbor estimators can be insensitive to missing data patterns that are unbalanced with respect to the distribution of the covariate. Empirical studies show that the nearest neighbor imputation method is the most effective of these imputation methods for estimating a finite population mean and for classifying the species in the iris flower data.
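A small sketch contrasting the two approaches on simulated data: 1-nearest-neighbour imputation on the covariate versus the basic HT estimator with a crude binned propensity estimate standing in for kernel-estimated weights; everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(14)

# Sketch of the two approaches compared above for estimating a mean with y MAR
# given x: (i) nearest-neighbour imputation on the covariate, and (ii) the basic
# Horvitz-Thompson estimator with a crude binned response-probability estimate.
n = 2000
x = rng.uniform(-2, 2, n)
y = np.sin(x) + 2.0 + rng.normal(scale=0.4, size=n)
resp = rng.random(n) < 1 / (1 + np.exp(-x))            # response indicator

# (i) 1-nearest-neighbour imputation
order = np.argsort(x[resp])
xo, yo = x[resp][order], y[resp][order]
pos = np.clip(np.searchsorted(xo, x[~resp]), 1, len(xo) - 1)
nearest = np.where(np.abs(xo[pos] - x[~resp]) < np.abs(xo[pos - 1] - x[~resp]),
                   yo[pos], yo[pos - 1])
nn_mean = (y[resp].sum() + nearest.sum()) / n

# (ii) basic HT estimator with binned propensity estimate
edges = np.quantile(x, np.linspace(0, 1, 11))
idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, 9)
pi_hat = np.array([resp[idx == b].mean() for b in range(10)])[idx]
ht_mean = np.sum(resp * y / np.clip(pi_hat, 0.05, 1)) / n

truth = np.mean(np.sin(x)) + 2.0
print(f"NN imputation: {nn_mean:.3f}, HT: {ht_mean:.3f}, approx. truth: {truth:.3f}")
```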
