Similar Documents
20 similar documents found.
1.
Recent research has made it clear that missing values in datasets are inevitable. Imputation is one of several methods introduced to address this issue: imputation techniques handle missing data by permanently replacing the missing values with reasonable estimates. These procedures have benefits that outweigh their drawbacks, but their behaviour is often not well understood, which breeds mistrust in the resulting analyses. One approach to evaluating the outcome of an imputation process is to estimate the uncertainty in the imputed data, and nonparametric methods are appropriate for estimating this uncertainty when the data do not follow any particular distribution. This paper presents a nonparametric method, based on the Wilcoxon test statistic, for estimating and testing the significance of imputation uncertainty; it can be used to assess the precision of the values created by imputation methods. The proposed procedure can be used to judge the feasibility of imputation for a dataset and to evaluate the influence of different imputation methods applied to the same dataset. The approach is compared with other nonparametric resampling methods, including the bootstrap and the jackknife, for estimating uncertainty in data imputed under the Bayesian bootstrap imputation method. The ideas supporting the proposed method are explained in detail, and a simulation study illustrates how the approach can be employed in practice.
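The abstract does not spell out the exact test construction, but one plausible reading is that the Wilcoxon rank-sum statistic compares the distribution of the imputed values with that of the observed values of the same variable, with the bootstrap as a resampling baseline. A minimal Python sketch of that idea only; the sample sizes and distributions below are illustrative, not the paper's setup:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
observed = rng.normal(10.0, 2.0, size=200)   # values actually recorded
imputed = rng.normal(10.3, 2.1, size=60)     # values filled in by some imputation method

# Wilcoxon rank-sum test: do the imputed values plausibly come from the
# same distribution as the observed ones?  A small p-value flags
# distributional distortion introduced by the imputation.
stat, pval = stats.ranksums(observed, imputed)
print(f"Wilcoxon rank-sum statistic = {stat:.3f}, p-value = {pval:.3f}")

# Bootstrap comparison (one of the resampling baselines mentioned above):
# variability of the completed-data mean across resamples.
completed = np.concatenate([observed, imputed])
boot_means = [rng.choice(completed, size=completed.size, replace=True).mean()
              for _ in range(2000)]
print(f"bootstrap SE of completed-data mean = {np.std(boot_means, ddof=1):.4f}")
```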

2.
The method of likelihood imputation is devised under the framework of latent structure models, in which the observation is a statistic of the complete data that can only be specified on a latent basis. The imputed data set is chosen to differ least from the observed one in information content, a concept with general implications for the analysis of incomplete data. In contrast to standard conditional-mean single imputation, our procedure depends on an entire likelihood region rather than on any single point in it, yet still yields consistent parameter estimators. We explain its implementation and illustrate it with data from panel surveys and from linear regression with censoring. We also discuss its potential in sensitivity analysis.

3.
We consider three sorts of diagnostics for random imputations: displays of the completed data, which are intended to reveal unusual patterns that might suggest problems with the imputations; comparisons of the distributions of observed and imputed data values; and checks of the fit of observed data to the model that is used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation, which is an iterative procedure in which the missing values of each variable are randomly imputed conditionally on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 environmental sustainability index, which is a linear aggregation of 64 environmental variables on 142 countries.
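Sequential regression multivariate imputation, as described here, cycles through the variables, regressing each on all the others in the current completed matrix and randomly redrawing its missing entries. A minimal linear-Gaussian sketch, assuming initial fills (e.g., column means) are already in place; a full implementation would also draw the regression coefficients from their posterior and adapt the model to each variable's type:

```python
import numpy as np

def srmi(X, miss, n_iter=10, rng=None):
    """Sequential regression imputation.  X: (n, p) array with initial
    fills already in place; miss: boolean mask of originally missing cells."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    X = X.copy()
    for _ in range(n_iter):
        for j in range(p):
            m = miss[:, j]
            if not m.any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(n), others])     # intercept + other variables
            beta, *_ = np.linalg.lstsq(A[~m], X[~m, j], rcond=None)
            resid = X[~m, j] - A[~m] @ beta
            sigma = resid.std(ddof=A.shape[1])
            # random (not conditional-mean) imputation: add Gaussian noise
            X[m, j] = A[m] @ beta + rng.normal(0, sigma, size=m.sum())
    return X
```

The diagnostics in the abstract then compare the redrawn cells `X[miss[:, j], j]` against the observed cells of the same variable, e.g. with side-by-side histograms.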

4.
A set of three goodness-of-fit procedures is proposed to investigate the adequacy of fit of Fisher's distribution on the sphere as a model for a given sample of spherical data. The procedures are all based on standard tests using the empirical distribution function.

5.
There has been increasing use of quality-of-life (QoL) instruments in drug development, and missing item values often occur in QoL data. A common approach is to impute the missing values before scoring. Several imputation procedures have been proposed, such as imputing with the most correlated item and imputing with a row/column model or an item response model. We examine these procedures using data from two clinical trials, in which the original asthma quality-of-life questionnaire (AQLQ) and the miniAQLQ were used. We propose two modifications to existing procedures: truncating the imputed values to eliminate outliers, and using the proportional odds model as the item response model for imputation. We also propose a novel imputation method based on a semi-parametric beta regression, so that the imputed value is always in the correct range, and we illustrate how this approach can easily be implemented in commonly used statistical software. To compare these approaches, we deleted 5% of item values in the data according to three different missingness mechanisms, imputed them using each approach and compared the imputed values with the true values. Our comparison showed that the row/column-model-based imputation with truncation generally performed better, whereas our new approach had better performance under a number of scenarios.
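The truncation modification is simple to state: whatever model produces an imputed item score, clip it to the item's valid range so that outliers cannot be created. A sketch combining it with the most-correlated-item regression imputation mentioned above (the 1-to-7 range assumes the 7-point AQLQ item scale; the function and variable names are illustrative):

```python
import numpy as np

def impute_truncated(target, donor, lo=1.0, hi=7.0):
    """Impute NaNs in `target` by regressing on the most correlated,
    fully observed item `donor`, then truncate to the valid score range."""
    m = np.isnan(target)
    b1, b0 = np.polyfit(donor[~m], target[~m], deg=1)   # slope, intercept
    out = target.copy()
    out[m] = np.clip(b0 + b1 * donor[m], lo, hi)        # truncation step
    return out
```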

6.
In this article, we develop a formal goodness-of-fit testing procedure for one-shot device testing data, in which each observation in the sample is either left censored or right censored; such data are also called current status data. We provide an algorithm for calculating the nonparametric maximum likelihood estimate (NPMLE) of the unknown lifetime distribution based on such data. We then consider four test statistics for assessing the goodness-of-fit of the accelerated failure time (AFT) model by the use of samples of residuals: a chi-square-type statistic based on the difference between the empirical and expected numbers of failures at each inspection time; two statistics based on the difference between the NPMLE of the lifetime distribution obtained from the one-shot device testing data and the distribution specified under the null hypothesis; and a final statistic based on White's idea of comparing two estimators of the Fisher information (FI). We compare these tests in terms of power and draw some conclusions. Finally, we present an example to illustrate the proposed tests.
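For current status data, the NPMLE of the lifetime distribution at the ordered inspection times is the weighted isotonic regression of the observed failure fractions, computable by pool-adjacent-violators; the article's own algorithm may differ in detail. A compact sketch with made-up inspection counts:

```python
import numpy as np

def npmle_current_status(times, n_tested, n_failed):
    """NPMLE of F at sorted inspection times from one-shot device data:
    isotonic regression of failure proportions, weighted by group sizes."""
    order = np.argsort(times)
    p = (np.asarray(n_failed, float) / np.asarray(n_tested))[order]
    w = np.asarray(n_tested, float)[order]
    # pool-adjacent-violators: merge blocks until the estimates are monotone
    vals, wts, cnt = [], [], []
    for pi, wi in zip(p, w):
        vals.append(pi); wts.append(wi); cnt.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            wtot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wtot
            wts[-2] = wtot; cnt[-2] += cnt[-1]
            vals.pop(); wts.pop(); cnt.pop()
    return np.sort(times), np.repeat(vals, cnt)

t, F = npmle_current_status([2, 5, 8, 12], [30, 30, 30, 30], [4, 9, 7, 21])
print(dict(zip(t, np.round(F, 3))))   # the violating pair (0.30, 0.233) is pooled
```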

7.
This article addresses issues in creating public-use data files in the presence of missing ordinal responses and subsequent statistical analyses of the dataset by users. The authors propose a fully efficient fractional imputation (FI) procedure for ordinal responses with missing observations. The proposed imputation strategy retrieves the missing values through the full conditional distribution of the response given the covariates, and results in a single imputed data file that can be analyzed by different data users with different scientific objectives. The two most critical aspects of statistical analyses based on the imputed data set, validity and efficiency, are examined through regression analysis involving the ordinal response and a selected set of covariates. It is shown through both theoretical development and simulation studies that, when the ordinal responses are missing at random, the proposed FI procedure leads to valid and highly efficient inferences as compared to existing methods. Variance estimation using the fractionally imputed data set is also discussed. The Canadian Journal of Statistics 48: 138–151; 2020 © 2019 Statistical Society of Canada
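Fractional imputation replaces each missing ordinal response with all K candidate categories, weighted by the estimated conditional probabilities of the response given the covariates, so that a single released file supports many analyses. A sketch using scikit-learn's multinomial logistic regression as a stand-in for the paper's conditional model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fractional_impute(X, y):
    """Return (X_full, y_full, w_full): complete rows keep weight 1; each
    missing row is expanded into K fractional rows, one per category,
    weighted by the estimated P(Y = k | X)."""
    obs = ~np.isnan(y)
    model = LogisticRegression(max_iter=1000).fit(X[obs], y[obs].astype(int))
    rows_X, rows_y, rows_w = [X[obs]], [y[obs]], [np.ones(obs.sum())]
    probs = model.predict_proba(X[~obs])                 # (n_missing, K)
    for x_i, p_i in zip(X[~obs], probs):
        for k, pk in zip(model.classes_, p_i):
            rows_X.append(x_i[None, :])
            rows_y.append(np.array([k], dtype=float))
            rows_w.append(np.array([pk]))
    return np.vstack(rows_X), np.concatenate(rows_y), np.concatenate(rows_w)
```

Weighted analyses of the expanded file with the returned weights then mimic analyses of complete data; this is the sense in which one imputed file can serve many users.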

8.
Caren Hasler & Yves Tillé, Statistics, 2016, 50(6): 1310–1331
Random imputation is an interesting class of imputation methods to handle item nonresponse because it tends to preserve the distribution of the imputed variable. However, such methods amplify the total variance of the estimators because values are imputed at random. This increase in variance is called imputation variance. In this paper, we propose a new random hot-deck imputation method that is based on the k-nearest neighbour methodology. It replaces the missing value of a unit with the observed value of a similar unit. Calibration and balanced sampling are applied to minimize the imputation variance. Moreover, our proposed method provides triple protection against nonresponse bias. This means that if at least one out of three specified models holds, then the resulting total estimator is unbiased. Finally, our approach allows the user to perform consistency edits and to impute simultaneously.
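Stripped of the calibration and balanced-sampling refinements that minimize imputation variance, the core of such a method is a k-nearest-neighbour random hot deck: each nonrespondent receives the observed value of a donor drawn at random among its k most similar respondents. A minimal sketch, assuming fully observed auxiliary variables:

```python
import numpy as np

def knn_random_hot_deck(aux, y, k=5, rng=None):
    """aux: (n, q) auxiliary variables observed for everyone;
    y: response with NaN for item nonresponse.  Returns completed y."""
    rng = rng or np.random.default_rng()
    z = (aux - aux.mean(0)) / aux.std(0)          # standardize the distance metric
    resp = ~np.isnan(y)
    donors_idx = np.flatnonzero(resp)
    out = y.copy()
    for i in np.flatnonzero(~resp):
        d = np.linalg.norm(z[donors_idx] - z[i], axis=1)
        nearest = donors_idx[np.argsort(d)[:k]]   # k most similar respondents
        out[i] = y[rng.choice(nearest)]           # random donor among them
    return out
```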

9.
"This paper gives a brief introduction to multiple imputation for handling non-response in surveys. We then describe a recently completed project in which multiple imputation was used to recalibrate industry and occupation codes in 1970 U.S. census public use samples to the 1980 standard. Using analyses of data from the project, we examine the utility of analysing a large data set having imputed values compared with analysing a small data set having true values, and we provide examples of the amount by which variability is underestimated by using just one imputation rather than multiple imputations."  相似文献   

10.
Multiple imputation has emerged as a widely used model-based approach for dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, correct specification of the imputation model can be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with access to software packages capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes, and subsequently round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets.
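The conversion step described above is easy to make concrete: an ordinal variable with K levels is represented by K indicator columns, and after joint multivariate-normal imputation the continuous imputed indicator vector is snapped to whichever level's true indicator vector is closest in Euclidean distance. A sketch:

```python
import numpy as np

def distance_round(z, K):
    """z: (n, K) imputed continuous values for the K level indicators.
    Returns ordinal levels 0..K-1 by nearest indicator vector."""
    E = np.eye(K)                                  # row k = indicator vector of level k
    # squared Euclidean distance from each imputed row to each indicator row
    d2 = ((z[:, None, :] - E[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

z = np.array([[0.9, 0.2, -0.1],                    # clearly level 0
              [0.4, 0.6,  0.3]])                   # closest to level 1
print(distance_round(z, K=3))                      # -> [0 1]
```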

11.
When faced with the problem of goodness-of-fit to the Lognormal distribution, testing methods typically reduce to comparing the empirical distribution function of the corresponding logarithmic data to that of the normal distribution. In this article, we consider a family of test statistics which make use of the moment structure of the Lognormal law. In particular, a continuum of moment conditions is employed in the construction of a new statistic for this distribution. The proposed test is shown to be consistent against fixed alternatives, and a simulation study shows that it is more powerful than several classical procedures, including those utilizing the empirical distribution function. We conclude by applying the proposed method to some, not so typical, data sets.

12.
This paper analyses the behaviour of goodness-of-fit tests for regression models. To this end, it uses statistics based on an estimation of the integrated regression function with missing observations either in the response variable or in some of the covariates. It proposes several versions of one empirical process, constructed from a previous estimation, that uses only the complete observations or replaces the missing observations with imputed values. In the case of missing covariates, a link model is used to fill the missing observations from other, complete covariates. In all situations, bootstrap methodology is used to calibrate the distribution of the test statistics. A broad simulation study compares the different procedures based on empirical regression methodology with smoothed tests previously studied in the literature. The comparison reflects the effect of the correlation between the covariates on the tests based on the imputed sample for missing covariates. In addition, the paper proposes a computational binning strategy to evaluate the tests based on an empirical process for large data sets. Finally, two applications to real data illustrate the performance of the tests.
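A sketch of the complete-data skeleton for a single covariate: the marked empirical process cumulates residuals from the fitted null model, a Kolmogorov-type statistic takes its supremum, and a wild bootstrap calibrates the p-value. The imputation of missing responses or covariates and the binning strategy for large data sets are omitted here:

```python
import numpy as np

def ks_process_stat(x, resid):
    """sup over sample points of |n^{-1/2} * cumulative sum of residuals
    ordered by the covariate| (integrated-regression empirical process)."""
    order = np.argsort(x)
    return np.abs(np.cumsum(resid[order])).max() / np.sqrt(len(x))

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(0, 1, n)
y = 1 + 2 * x + rng.normal(0, 0.3, n)              # data truly linear here

A = np.column_stack([np.ones(n), x])               # null model: linear regression
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
T_obs = ks_process_stat(x, resid)

# wild bootstrap: perturb residuals, refit the null model, recompute statistic
T_boot = []
for _ in range(500):
    ystar = A @ beta + resid * rng.choice([-1.0, 1.0], n)   # Rademacher weights
    bstar, *_ = np.linalg.lstsq(A, ystar, rcond=None)
    T_boot.append(ks_process_stat(x, ystar - A @ bstar))
print("bootstrap p-value:", np.mean(np.array(T_boot) >= T_obs))
```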

13.
A simple and efficient goodness-of-fit test for exponentiality is developed by exploiting the characterization of the exponential distribution through the probability integral transformation. We adopt the empirical likelihood methodology in constructing the test statistic, and the proposed test statistic has a chi-square limiting distribution. For small to moderate sample sizes, Monte Carlo simulations revealed that our proposed tests are markedly superior under increasing failure rate (IFR) and bathtub (decreasing-increasing) failure rate (BFR) alternatives. Real data examples are used to demonstrate the robustness and applicability of our proposed tests in practice.
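The characterization in question: if X follows an exponential law with mean theta, the probability integral transform U = 1 - exp(-X/theta) is uniform on (0, 1), so in particular E[U] = 1/2. The abstract does not state the paper's exact moment conditions; as an illustrative stand-in, the sketch below applies the standard empirical likelihood ratio to that single mean condition, which is chi-square(1) when theta is known (estimating theta perturbs the limit, and a full treatment must adjust for that):

```python
import numpy as np
from scipy import optimize, stats

def neg2_el_logratio(u, mu0=0.5):
    """-2 log empirical likelihood ratio for H0: E[U] = mu0."""
    d = u - mu0
    if d.min() >= 0 or d.max() <= 0:
        return np.inf                               # mu0 outside the convex hull
    # solve sum d_i / (1 + lam * d_i) = 0 with 1 + lam * d_i > 0 for all i
    lo = (-1 + 1e-9) / d.max()
    hi = (-1 + 1e-9) / d.min()
    lam = optimize.brentq(lambda l: np.sum(d / (1 + l * d)), lo, hi)
    return 2 * np.sum(np.log1p(lam * d))

x = stats.expon(scale=2.0).rvs(size=100, random_state=7)
u = 1 - np.exp(-x / x.mean())                       # PIT with estimated rate
print("statistic:", neg2_el_logratio(u),
      " chi2(1) 95% cut-off:", stats.chi2.ppf(0.95, df=1))
```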

14.
Resampling methods are a common means of estimating the variance of a statistic of interest when the data are subject to nonresponse and imputation is used as compensation. Applying resampling methods usually means that subsamples are drawn from the original sample and that variance estimates are computed from the point estimators of several subsamples. However, newer resampling methods such as the rescaling bootstrap of Chipperfield and Preston [Efficient bootstrap for business surveys. Surv Methodol. 2007;33:167–172] include all elements of the original sample in the computation of the point estimator. Thus, procedures that account for imputation in resampling methods cannot be applied in the ordinary way, and modifications are necessary. This paper presents an approach for applying these newer resampling methods to imputed data. The Monte Carlo simulation study conducted in the paper shows that the proposed approach leads to reliable variance estimates, in contrast to other modifications.

15.
Fractional regression hot deck imputation (FRHDI) imputes multiple values for each instance of a missing dependent variable. The imputed values are equal to the predicted value plus multiple random residuals. Fractional weights enable variance estimation and preserve correlations. In some circumstances, with some starting weight values, existing procedures for computing FRHDI weights can produce negative values. We discuss procedures for constructing non-negative adjusted fractional weights for FRHDI and study the performance of the algorithm using simulation. The algorithm can be used effectively with FRHDI procedures for handling missing data in the context of a complex sample survey.

16.
A general nonparametric imputation procedure, based on kernel regression, is proposed to estimate points as well as set- and function-indexed parameters when the data are missing at random (MAR). The proposed method works by imputing a specific function of a missing value (and not the missing value itself), where the form of this specific function is dictated by the parameter of interest. Both single and multiple imputations are considered. The associated empirical processes provide the right tool to study the uniform convergence properties of the resulting estimators. Our estimators include, as special cases, the imputation estimator of the mean, the estimator of the distribution function proposed by Cheng and Chu [1996. Kernel estimation of distribution functions and quantiles with missing data. Statist. Sinica 6, 63–78], imputation estimators of a marginal density, and imputation estimators of regression functions.
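The key device is to impute a function psi of the missing response, not the response itself, using a kernel regression of psi(Y) on X over the complete cases. A Nadaraya-Watson sketch for a scalar covariate, here applied to the distribution-function parameter F(t) = E[1{Y <= t}]:

```python
import numpy as np

def nw_impute_psi(x, y, psi, h=0.2):
    """Impute psi(Y_i) for missing Y_i by kernel regression of psi(Y)
    on X over the complete cases (Gaussian kernel, bandwidth h)."""
    obs = ~np.isnan(y)
    xo, po = x[obs], psi(y[obs])
    out = np.empty_like(x)
    out[obs] = po
    for i in np.flatnonzero(~obs):
        w = np.exp(-0.5 * ((x[i] - xo) / h) ** 2)
        out[i] = np.dot(w, po) / w.sum()            # Nadaraya-Watson fit at x_i
    return out

# example: single-imputation estimator of F(t) = E[1{Y <= t}] under MAR
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 400)
y = np.sin(2 * x) + rng.normal(0, 0.2, 400)
y[rng.random(400) < 0.3 * x] = np.nan               # MAR: missingness depends on x only
vals = nw_impute_psi(x, y, psi=lambda v: (v <= 0.5).astype(float))
print("imputation estimate of F(0.5):", vals.mean())
```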

17.
Test procedures are constructed for testing the goodness-of-fit of the error distribution in the regression context. The test statistic is based on an L2-type distance between the characteristic function of the (assumed) error distribution and the empirical characteristic function of the residuals. The asymptotic null distribution as well as the behavior of the test statistic under contiguous alternatives is investigated, while the issue of the choice of suitable estimators has been particularly emphasized. Theoretical results are accompanied by a simulation study.
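The statistic has a transparent numerical form: compare the empirical characteristic function of the standardized residuals with the characteristic function of the hypothesized error law through a weighted L2 distance. A sketch for a normal error null, approximating the weighted integral on a grid (the actual test evaluates the integral exactly and calibrates critical values from the asymptotic theory):

```python
import numpy as np

def cf_l2_stat(resid, a=1.0, tgrid=np.linspace(-5, 5, 401)):
    """Approximate n * integral of |ecf(t) - exp(-t^2/2)|^2 * exp(-a*t^2) dt."""
    e = (resid - resid.mean()) / resid.std()        # standardized residuals
    ecf = np.exp(1j * np.outer(tgrid, e)).mean(axis=1)   # empirical CF on the grid
    cf0 = np.exp(-tgrid ** 2 / 2)                   # CF of the standard normal
    w = np.exp(-a * tgrid ** 2)                     # integrable weight function
    return len(e) * np.trapz(np.abs(ecf - cf0) ** 2 * w, tgrid)

rng = np.random.default_rng(5)
print(cf_l2_stat(rng.normal(size=300)))             # small under the null
print(cf_l2_stat(rng.exponential(size=300)))        # larger under skewed errors
```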

18.
In this article, an iterative single-point imputation (SPI) algorithm, called the quantile-filling algorithm, is studied for the analysis of interval-censored data. This approach combines the simplicity of SPI with the iterative idea of multiple imputation. The virtual complete data are imputed by conditional quantiles on the intervals, and convergence of the algorithm is based on the convergence of the moment estimates computed from the virtual complete data. Simulation studies have been carried out, and results are shown for interval-censored data generated from the Weibull distribution, for which complete procedures of the algorithm are given in closed form. The algorithm is also applicable to parameter inference with other distributions. The simulation studies show that the algorithm is feasible and stable, and that the estimation accuracy is satisfactory.
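A sketch of the quantile-filling loop for Weibull interval-censored data: given current parameter estimates, each lifetime known only to lie in [L, R] is replaced by a conditional quantile of the fitted distribution restricted to that interval (the conditional median below), the parameters are re-estimated from the filled-in virtual complete sample, and the two steps alternate until the estimates stabilize. scipy's MLE fit is used here as a stand-in for the article's moment estimation:

```python
import numpy as np
from scipy import stats

def quantile_fill_weibull(L, R, tau=0.5, tol=1e-6, max_iter=200):
    """L, R: interval bounds for each censored lifetime."""
    c = 1.0
    scale = np.mean(np.where(np.isfinite(R), (L + R) / 2, L))   # crude start
    for _ in range(max_iter):
        FL = stats.weibull_min.cdf(L, c, scale=scale)
        FR = stats.weibull_min.cdf(R, c, scale=scale)
        # conditional tau-quantile of the fitted Weibull restricted to [L, R]
        x = stats.weibull_min.ppf(FL + tau * (FR - FL), c, scale=scale)
        c_new, _, scale_new = stats.weibull_min.fit(x, floc=0)
        if abs(c_new - c) + abs(scale_new - scale) < tol:
            break
        c, scale = c_new, scale_new
    return c, scale

rng = np.random.default_rng(11)
t = rng.weibull(2.0, 200) * 3.0                     # true shape 2, scale 3
L = np.floor(t); R = L + 1.0                        # lifetimes seen only in unit intervals
print(quantile_fill_weibull(L, R))                  # roughly (2, 3)
```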

19.
This paper proposes the singly truncated normal distribution as a model for radiance measurements from satellite-borne infrared sensors. These measurements are made in order to estimate sea surface temperatures, which can be related to radiances. Maximum likelihood estimation is used to provide estimates of the unknown parameters. In particular, a procedure is described for estimating clear radiances in the presence of clouds, and the Kolmogorov-Smirnov statistic is used to test the goodness-of-fit of the measurements to the singly truncated normal distribution. Tables of quantile values of the Kolmogorov-Smirnov statistic for several values of the truncation point are generated from Monte Carlo experiments. Finally, a numerical example using satellite data is presented to illustrate the application of the procedures.
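A sketch of the two ingredients for a known left-truncation point a: maximize the truncated normal log-likelihood over (mu, sigma), then feed the fitted CDF to the Kolmogorov-Smirnov statistic. Because the parameters are estimated, the standard KS tables do not apply, which is why the paper generates quantile tables by Monte Carlo:

```python
import numpy as np
from scipy import optimize, stats

def fit_truncnorm(x, a):
    """MLE of (mu, sigma) for a normal truncated below at known a."""
    def nll(theta):
        mu, log_s = theta
        s = np.exp(log_s)                           # keep sigma positive
        z = (x - mu) / s
        # log density: log phi(z) - log s - log(1 - Phi((a - mu)/s))
        return -(stats.norm.logpdf(z).sum() - len(x) * np.log(s)
                 - len(x) * stats.norm.logsf((a - mu) / s))
    res = optimize.minimize(nll, x0=[x.mean(), np.log(x.std())])
    return res.x[0], np.exp(res.x[1])

a = 0.0                                             # known truncation point
rng = np.random.default_rng(2)
raw = rng.normal(0.5, 1.0, 5000)
x = raw[raw > a][:500]                              # singly truncated sample

mu, s = fit_truncnorm(x, a)
cdf = lambda v: stats.truncnorm.cdf(v, (a - mu) / s, np.inf, loc=mu, scale=s)
D = stats.kstest(x, cdf).statistic                  # compare D to Monte Carlo quantiles
print(f"mu={mu:.3f}, sigma={s:.3f}, KS D={D:.4f}")
```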

20.
This article examines methods to efficiently estimate the mean response in a linear model with an unknown error distribution, under the assumption that the responses are missing at random. We show how the asymptotic variance is affected by the estimator of the regression parameter and by the imputation method. To estimate the regression parameter, ordinary least squares is efficient only if the error distribution happens to be normal; if the errors are not normal, we propose a one-step improvement estimator or a maximum empirical likelihood estimator to estimate the parameter efficiently. To investigate the imputation's impact on the estimation of the mean response, we compare the listwise deletion method and the propensity score method (which do not use imputation at all) with two imputation methods, and demonstrate that listwise deletion and the propensity score method are inefficient. Partial imputation, where only the missing responses are imputed, is compared to full imputation, where both missing and non-missing responses are imputed. Our results reveal that, in general, full imputation is better than partial imputation; however, when the regression parameter is estimated very poorly, partial imputation will outperform full imputation. The efficient estimator for the mean response is the full imputation estimator that utilizes an efficient estimator of the parameter.
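The comparison is easy to reproduce in a toy simulation: with the regression fitted on the complete cases, listwise deletion averages only the observed responses, partial imputation averages observed responses plus fitted values for the missing ones, and full imputation averages fitted values for everyone. A sketch with OLS and MAR missingness driven by the covariate (the paper's efficient estimators of the parameter are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0, 2, n)
y = 1 + 3 * x + rng.normal(0, 1, n)
obs = rng.random(n) > 0.2 + 0.3 * (x > 1)           # MAR: missingness depends on x

A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)   # OLS on complete cases
yhat = A @ beta

listwise = y[obs].mean()                            # biased under MAR
partial = np.where(obs, y, yhat).mean()             # impute missing responses only
full = yhat.mean()                                  # impute every response
print(f"true mean = 4.000; listwise = {listwise:.3f}, "
      f"partial = {partial:.3f}, full = {full:.3f}")
```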
