Similar Articles (20 results)
1.
This study investigated the bias of factor loadings obtained from incomplete questionnaire data with imputed scores. Three models were used to generate discrete ordered rating scale data typical of questionnaires, also known as Likert data: the multidimensional polytomous latent trait model, a normal ogive item response theory model, and the discretized normal model. Incomplete data due to nonresponse were simulated using either missing completely at random or not missing at random mechanisms. Subsequently, for each incomplete data matrix, four imputation methods were applied for imputing item scores. Based on a completely crossed six-factor design, it was concluded that, in general, bias was small for all data simulation methods, all imputation methods, and all nonresponse mechanisms. The two-way-plus-error imputation method had the smallest bias in the factor loadings. Bias based on the discretized normal model was greater than that based on the other two models.
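The two-way-plus-error method is only named in the abstract, not defined. The sketch below shows one standard formulation, assuming it means person mean + item mean − grand mean plus a residual drawn at random from the observed deviations, rounded back to the Likert scale; the data and all names are illustrative, not from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_way_plus_error(X):
    """Two-way imputation with error: fitted value = person mean + item mean
    - grand mean, plus a residual drawn at random from the observed
    deviations around that two-way fit."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    person_mean = np.nanmean(X, axis=1, keepdims=True)
    item_mean = np.nanmean(X, axis=0, keepdims=True)
    fitted = person_mean + item_mean - np.nanmean(X)   # broadcasts to full shape
    residuals = (X - fitted)[~miss]                    # observed residuals from the fit
    X_imp = X.copy()
    X_imp[miss] = fitted[miss] + rng.choice(residuals, size=miss.sum())
    return X_imp

# Hypothetical 5-point Likert data with 15% of scores missing completely at random.
X = rng.integers(1, 6, size=(100, 10)).astype(float)
X[rng.random(X.shape) < 0.15] = np.nan
completed = np.clip(np.rint(two_way_plus_error(X)), 1, 5)  # back to the 1-5 scale
```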

2.
We propose to use calibrated imputation to compensate for missing values. This technique consists of finding final imputed values that are as close as possible to preliminary imputed values and are calibrated to satisfy constraints. Preliminary imputed values, potentially justified by an imputation model, are obtained through deterministic single imputation. Using appropriate constraints, the resulting imputed estimator is asymptotically unbiased for estimation of linear population parameters such as domain totals. A quasi-model-assisted approach is considered in the sense that inferences do not depend on the validity of an imputation model and are made with respect to the sampling design and a non-response model. An imputation model may still be used to generate imputed values and thus to improve the efficiency of the imputed estimator. This approach has the characteristic of handling naturally the situation where more than one imputation method is used owing to missing values in the variables that are used to obtain imputed values. We use the Taylor linearization technique to obtain a variance estimator under a general non-response model. For the logistic non-response model, we show that ignoring the effect of estimating the non-response model parameters leads to overestimating the variance of the imputed estimator. In practice, the overestimation is expected to be moderate or even negligible, as shown in a simulation study.
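As a minimal illustration of the calibration step, assuming a Euclidean distance between final and preliminary imputed values and a single weighted-total constraint (the paper's distances and constraint sets are more general), one can project the preliminary values onto the constraint in closed form:

```python
import numpy as np

def calibrate_imputations(y_prelim, A, b):
    """Find final imputed values as close as possible (least squares) to the
    preliminary ones while satisfying the linear constraints A @ y = b.
    Closed-form projection: y* = y_hat + A.T @ (A A.T)^{-1} (b - A y_hat)."""
    y_prelim = np.asarray(y_prelim, dtype=float)
    A = np.atleast_2d(A).astype(float)
    correction = A.T @ np.linalg.solve(A @ A.T, b - A @ y_prelim)
    return y_prelim + correction

# Hypothetical example: three missing values whose design-weighted sum must
# equal a known domain total of 42.
y_hat = np.array([10.0, 12.0, 15.0])   # preliminary (model-based) imputations
w = np.array([1.2, 0.8, 1.0])          # design weights of the nonrespondents
y_final = calibrate_imputations(y_hat, w, np.array([42.0]))
print(y_final, w @ y_final)            # the constraint now holds exactly
```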

3.
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect.
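A hedged sketch of the individual-level evaluation the abstract argues for: mask values from complete data so the truth is known, impute (here with deliberately crude mean imputation), and score the imputed values directly rather than through downstream parameter estimates. All data and names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical complete data, masked MCAR so the truth is recoverable.
y_true = rng.normal(50, 10, size=500)
mask = rng.random(500) < 0.2               # 20% of values set missing
y_obs = np.where(mask, np.nan, y_true)

# Simple mean imputation as a baseline.
y_mean_imp = np.where(mask, np.nanmean(y_obs), y_obs)

# Individual-level quality: compare imputed values to the held-out truth.
rmse = np.sqrt(np.mean((y_mean_imp[mask] - y_true[mask]) ** 2))
bias = np.mean(y_mean_imp[mask] - y_true[mask])
print(f"RMSE = {rmse:.2f}, bias = {bias:.2f}")
```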

4.
Missing observations due to non‐response are commonly encountered in data collected from sample surveys. The focus of this article is on item non‐response which is often handled by filling in (or imputing) missing values using the observed responses (donors). Random imputation (single or fractional) is used within homogeneous imputation classes that are formed on the basis of categorical auxiliary variables observed on all the sampled units. A uniform response rate within classes is assumed, but that rate is allowed to vary across classes. We construct confidence intervals (CIs) for a population parameter that is defined as the solution to a smooth estimating equation with data collected using stratified simple random sampling. The imputation classes are assumed to be formed across strata. Fractional imputation with a fixed number of random draws is used to obtain an imputed estimating function. An empirical likelihood inference method under the fractional imputation is proposed and its asymptotic properties are derived. Two asymptotically correct bootstrap methods are developed for constructing the desired CIs. In a simulation study, the proposed bootstrap methods are shown to outperform traditional bootstrap methods and some non‐bootstrap competitors under various simulation settings. The Canadian Journal of Statistics 47: 281–301; 2019 © 2019 Statistical Society of Canada
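A minimal sketch of fractional hot-deck imputation within classes, assuming a fixed number M of random donors per missing value, each carrying weight 1/M; the paper's empirical likelihood inference and bootstrap CIs are not shown, and the classes and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(13)

n, M = 200, 3
cls = rng.integers(0, 2, size=n)                 # imputation class labels
y = np.where(cls == 0, 10.0, 20.0) + rng.normal(size=n)
miss = rng.random(n) < 0.3

frac_total = y[~miss].sum()
for c in (0, 1):
    donors = y[~miss & (cls == c)]               # respondents in the class
    n_mis = (miss & (cls == c)).sum()
    draws = rng.choice(donors, size=(n_mis, M))  # M donors, weight 1/M each
    frac_total += draws.mean(axis=1).sum()
mu_frac = frac_total / n
print(mu_frac)                                   # estimates the overall mean of y
```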

5.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have recently been used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study using the cancer-Neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two algorithms, the Peter-Clark (PC) algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket of the variable to be imputed; and finally, our new proposed method, which applies the whole available knowledge of all variables of interest, comprising the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new proposed method, especially in the presence of high missingness percentages and more connected networks. The new method has also been shown to be more efficient than the MICE method for small sample sizes with high missing rates.
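The Bayesian-network methods require a structure-learning library and are not reproduced here. As a hedged sketch of the MICE benchmark the paper compares against, scikit-learn's IterativeImputer provides a chained-equations-style imputer; the data below are illustrative.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
cov = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]]
X = rng.multivariate_normal([0, 0, 0], cov, size=300)
X[rng.random(X.shape) < 0.25] = np.nan     # 25% missing, MCAR for illustration

# Each variable is modelled conditionally on the others and imputed in turn,
# iterating until convergence; sample_posterior adds draw-based variability.
imputer = IterativeImputer(max_iter=20, sample_posterior=True, random_state=0)
X_completed = imputer.fit_transform(X)
```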

6.
Multiple imputation has emerged as a widely used model-based approach in dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings which include a mix of continuous and discrete variables, correct specification of the imputation model can be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with accessibility to software packages that are capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes and subsequently round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures using simulated data sets.
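A minimal sketch of the rounding step only, assuming each ordinal level is represented by an indicator vector and the continuous imputed block is mapped to the level whose indicator is nearest in Euclidean distance; the joint multivariate-normal imputation that precedes it is not shown, and all values are illustrative.

```python
import numpy as np

def distance_based_rounding(z, n_levels):
    """Map a continuous imputed indicator block for an ordinal variable back
    to a category: represent each level k by an indicator vector e_k and
    pick the level whose indicator is closest in Euclidean distance."""
    levels = np.eye(n_levels)                     # one indicator vector per level
    dists = np.linalg.norm(z[None, :] - levels, axis=1)
    return int(np.argmin(dists)) + 1              # 1-based category label

# Hypothetical imputed indicator block for a 4-level ordinal item:
z_imputed = np.array([0.1, 0.7, 0.3, -0.2])       # from joint MVN imputation
print(distance_based_rounding(z_imputed, 4))      # -> 2
```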

7.
Item non‐response in surveys occurs when some, but not all, variables are missing. Unadjusted estimators tend to exhibit some bias, called the non‐response bias, if the respondents differ from the non‐respondents with respect to the study variables. In this paper, we focus on item non‐response, which is usually treated by some form of single imputation. We examine the properties of doubly robust imputation procedures, which are those that lead to an estimator that remains consistent if either the outcome variable or the non‐response mechanism is adequately modelled. We establish the double robustness property of the imputed estimator of the finite population distribution function under random hot‐deck imputation within classes. We also discuss the links between our approach and that of Chambers and Dunstan. The results of a simulation study support our findings.
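The paper's double robustness result concerns random hot-deck imputation of the distribution function. As a generic illustration of the double-robustness idea itself, not the authors' estimator, here is an AIPW-type sketch for a population mean on simulated data; all models and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = 2 + 1.5 * x + rng.normal(size=n)        # study variable
p = 1 / (1 + np.exp(-(0.5 + x)))            # true response probabilities
r = rng.random(n) < p                       # response indicator

beta = np.polyfit(x[r], y[r], 1)            # outcome model fitted on respondents
y_hat = np.polyval(beta, x)
p_hat = p   # true propensities reused for brevity; normally estimated by logit

# Doubly robust (AIPW-type) estimator of E[y]: consistent if either the
# outcome model (y_hat) or the response model (p_hat) is correct.
mu_dr = np.mean(y_hat + np.where(r, (y - y_hat) / p_hat, 0.0))
print(mu_dr)                                # close to the true mean E[y] = 2
```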

8.
Consider estimation of a population mean of a response variable when the observations are missing at random with respect to the covariate. Two common approaches to imputing the missing values are the nonparametric regression weighting method and the Horvitz-Thompson (HT) inverse weighting approach. The regression approach includes kernel regression imputation and nearest neighbor imputation. The HT approach, employing inverse kernel-estimated weights, includes the basic estimator, the ratio estimator and the estimator using inverse kernel-weighted residuals. Asymptotic normality of the nearest neighbor imputation estimators is derived and compared to that of the kernel regression imputation estimator under standard regularity conditions on the regression function and the missing pattern function. A comprehensive simulation study shows that the basic HT estimator is most sensitive to discontinuity in the missing data patterns, and the nearest neighbor estimators can be insensitive to missing data patterns unbalanced with respect to the distribution of the covariate. Empirical studies show that the nearest neighbor imputation method is most effective among these imputation methods for estimating a finite population mean and for classifying the species of the iris flower data.
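A compact simulation sketch of two of the estimators compared above: the basic HT estimator with inverse kernel-estimated weights, and nearest-neighbour imputation. The smooth missing-pattern function and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
r = rng.random(n) < 0.3 + 0.6 * x            # smooth missing-pattern function

def nw_smooth(x0, x_obs, z_obs, h=0.1):
    """Nadaraya-Watson kernel regression of z on x, evaluated at x0."""
    w = np.exp(-0.5 * ((x0[:, None] - x_obs[None, :]) / h) ** 2)
    return (w * z_obs).sum(axis=1) / w.sum(axis=1)

# Basic Horvitz-Thompson estimator with inverse kernel-estimated weights.
p_hat = nw_smooth(x, x, r.astype(float))
mu_ht = np.mean(np.where(r, y, 0.0) / p_hat)

# Nearest-neighbour imputation: copy the y of the closest respondent in x.
nn = np.abs(x[~r][:, None] - x[r][None, :]).argmin(axis=1)
y_imp = y.copy()
y_imp[~r] = y[r][nn]
mu_nn = y_imp.mean()
print(mu_ht, mu_nn)                          # both estimate E[y] = 0
```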

9.
Caren Hasler & Yves Tillé, Statistics (2016), 50(6): 1310–1331
Random imputation is an interesting class of imputation methods to handle item nonresponse because it tends to preserve the distribution of the imputed variable. However, such methods amplify the total variance of the estimators because values are imputed at random. This increase in variance is called imputation variance. In this paper, we propose a new random hot-deck imputation method that is based on the k-nearest neighbour methodology. It replaces the missing value of a unit with the observed value of a similar unit. Calibration and balanced sampling are applied to minimize the imputation variance. Moreover, our proposed method provides triple protection against nonresponse bias. This means that if at least one out of three specified models holds, then the resulting total estimator is unbiased. Finally, our approach allows the user to perform consistency edits and to impute simultaneously.
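A minimal sketch of the k-nearest-neighbour random hot-deck core of the method, without the calibration and balanced-sampling layer the paper adds to minimize the imputation variance; names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def knn_random_hot_deck(x, y, k=5):
    """Random hot-deck within the k nearest neighbours: each missing y is
    replaced by the observed y of a donor drawn at random among the k
    respondents closest in the auxiliary variable x."""
    miss = np.isnan(y)
    x_obs, y_obs = x[~miss], y[~miss]
    y_out = y.copy()
    for i in np.where(miss)[0]:
        nearest = np.argsort(np.abs(x_obs - x[i]))[:k]   # k closest donors
        y_out[i] = y_obs[rng.choice(nearest)]
    return y_out

x = rng.normal(size=200)
y = 3 * x + rng.normal(size=200)
y[rng.random(200) < 0.3] = np.nan
y_completed = knn_random_hot_deck(x, y)
```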

10.
Recent research has made clear that missing values in data sets are inevitable. Imputation is one of several methods introduced to overcome this issue: imputation techniques address missing data by permanently replacing missing values with reasonable estimates. These procedures have many benefits, but their behaviour is often not well understood, which breeds mistrust in analytical results. One approach to evaluating the outcome of the imputation process is to estimate the uncertainty in the imputed data. Nonparametric methods are appropriate for estimating this uncertainty when the data do not follow any particular distribution. This paper presents a nonparametric method, based on the Wilcoxon test statistic, for estimating and testing the significance of imputation uncertainty; it can be employed to estimate the precision of the imputed values created by imputation methods. The proposed procedure can be used to judge the feasibility of imputation for a data set and to evaluate the influence of different imputation methods when they are applied to the same data set. The proposed approach is compared with other nonparametric resampling methods, including the bootstrap and the jackknife, for estimating uncertainty in data imputed under the Bayesian bootstrap imputation method. The ideas supporting the proposed method are clarified in detail, and a simulation study illustrates how the approach is employed in practical situations.
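The abstract does not fully specify the test, so the following is only a sketch of the underlying idea, assuming the Wilcoxon rank-sum statistic is used to compare imputed values against observed ones as a signal of imputation distortion; all data are illustrative.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(6)
y = rng.gamma(2.0, 2.0, size=400)            # skewed data, no parametric form assumed
miss = rng.random(400) < 0.25
observed = y[~miss]

# Mean imputation (deliberately crude) for the missing entries.
imputed = np.full(miss.sum(), observed.mean())

# Wilcoxon rank-sum: do the imputed values look like draws from the same
# distribution as the observed ones?  A small p-value flags distributional
# distortion introduced by the imputation.
stat, pvalue = ranksums(observed, imputed)
print(f"W = {stat:.2f}, p = {pvalue:.3g}")
```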

11.
Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both the item and scale levels, logistic regression at the item level, and multivariate normal imputation at the scale level. We investigated a hybrid approach that combines MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. MLE occasionally provided the best results in terms of bias, but hardly ever in terms of MSE. The imputation methods at the scale level and logistic regression at the item level hardly ever showed the best performance. The hybrid approach is similar to or better than its original MI. The PMM-hybrid approach at the item level demonstrated the best MSE for most settings and, in some cases, also the smallest bias.
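A minimal sketch of predictive mean matching, assuming a single continuous predictor and k candidate donors; the donor's observed value is copied, which is what keeps the imputations on the scale of the data. All names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def pmm_impute(x_obs, y_obs, x_mis, k=5):
    """Predictive mean matching: regress y on x among respondents, then for
    each nonrespondent pick a random donor among the k respondents whose
    predicted y is closest, and copy that donor's *observed* y."""
    Xo = np.column_stack([np.ones(len(x_obs)), x_obs])
    Xm = np.column_stack([np.ones(len(x_mis)), x_mis])
    beta, *_ = np.linalg.lstsq(Xo, y_obs, rcond=None)
    pred_obs, pred_mis = Xo @ beta, Xm @ beta
    out = np.empty(len(x_mis))
    for i, pm in enumerate(pred_mis):
        donors = np.argsort(np.abs(pred_obs - pm))[:k]   # k closest predicted means
        out[i] = y_obs[rng.choice(donors)]
    return out

x = rng.normal(size=300)
y = 1 + 2 * x + rng.normal(size=300)
miss = rng.random(300) < 0.2
y_imputed = pmm_impute(x[~miss], y[~miss], x[miss])
```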

12.
A meta-analysis of a continuous outcome measure may involve missing standard errors. Whether this is a problem depends on the assumptions made about the population standard deviation. Multiple imputation can be used to impute missing values while allowing for uncertainty in the imputation. Markov chain Monte Carlo simulation is a multiple imputation technique for generating posterior predictive distributions for missing data. We present an example of imputing missing variances using WinBUGS. The example highlights the importance of checking model assumptions, whether for missing or observed data.

13.
An imputation procedure is a procedure by which each missing value in a data set is replaced (imputed) by an observed value using a predetermined resampling procedure. The distribution of a statistic computed from a data set consisting of observed and imputed values, called a completed data set, is affected by the imputation procedure used. In a Monte Carlo experiment, three imputation procedures are compared with respect to the empirical behavior of the goodness-of-fit chi-square statistic computed from a completed data set. The results show that each imputation procedure affects the distribution of the goodness-of-fit chi-square statistic in a different manner. However, when the empirical behavior of the goodness-of-fit chi-square statistic is compared to its appropriate asymptotic distribution, there are no substantial differences between these imputation procedures.
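A small Monte Carlo in the spirit of the experiment above, assuming categorical data, a uniform null, and two illustrative imputation procedures (random hot deck and mode imputation) whose completed data feed the goodness-of-fit chi-square statistic; none of this is the paper's exact design.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(8)
y = rng.choice(np.arange(4), size=500)       # truly uniform over 4 categories
miss = rng.random(500) < 0.2

# Two completed data sets from two imputation procedures.
y_hot_deck = y.copy()
y_hot_deck[miss] = rng.choice(y[~miss], size=miss.sum())   # random hot deck
y_mode = y.copy()
y_mode[miss] = np.bincount(y[~miss]).argmax()              # mode imputation

# Goodness-of-fit chi-square against the uniform null, once per completion;
# mode imputation tends to inflate the statistic by piling mass on one cell.
for label, yc in [("hot deck", y_hot_deck), ("mode", y_mode)]:
    stat, p = chisquare(np.bincount(yc, minlength=4))
    print(f"{label}: X2 = {stat:.2f}, p = {p:.3f}")
```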

14.
In this paper, we introduce a fresh methodology for imputing missing values by making use of sensible constraints on both a study variable and auxiliary variables that are correlated with the variable of interest. The resulting estimator based on these imputed values is shown to lead to the regression-type method of imputation in survey sampling. Furthermore, when the data are a hybrid of values missing at random and missing completely at random, the resulting estimator is shown to be consistent, with asymptotic mean squared error equal to that of the linear regression method of imputation. A generalization to any type of imputation method is possible and is included at the end.

15.
Mixed models are regularly used in the analysis of clustered data, but have only recently been used for imputation of missing data. In household surveys where multiple people are selected from each household, imputation of missing values should preserve the structure pertaining to people within households and should not artificially change the apparent intracluster correlation (ICC). This paper focuses on the use of multilevel models for imputation of missing data in household surveys. In particular, the performance of a best linear unbiased predictor for both stochastic and deterministic imputation using a linear mixed model is compared to imputation based on a single-level linear model, both with and without information about household respondents. In this paper, an evaluation is carried out in the context of imputing the hourly wage rate in the Household, Income and Labour Dynamics in Australia (HILDA) Survey. Nonresponse is generated under various assumptions about the missingness mechanism for persons and households, and with low, moderate and high intra‐household correlation to assess the benefits of the multilevel imputation model under different conditions. The mixed model and the single-level model with information about the household respondent lead to clear improvements when the ICC is moderate or high, and when there is informative missingness.
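A hedged sketch of deterministic best-linear-unbiased-predictor imputation under a random-intercept model, assuming the variance components are known (in practice they would be estimated, e.g. by REML); a stochastic variant would add a residual draw to each imputed value. Data and names are illustrative, not from the HILDA Survey.

```python
import numpy as np

rng = np.random.default_rng(9)

# Random-intercept model y_hi = mu + u_h + e_hi for persons i in households h.
H, m = 100, 3
sigma_u, sigma_e = 1.0, 1.0                  # variance components, taken as known
hh = np.repeat(np.arange(H), m)
y = 5 + np.repeat(rng.normal(scale=sigma_u, size=H), m) \
      + rng.normal(scale=sigma_e, size=H * m)
miss = rng.random(H * m) < 0.2

mu_hat = y[~miss].mean()
y_imp = y.copy()
for h in range(H):
    obs = (~miss) & (hh == h)
    n_h = obs.sum()
    # BLUP shrinks the household's observed mean toward the overall mean;
    # more observed members within the household means less shrinkage.
    if n_h:
        shrink = sigma_u**2 / (sigma_u**2 + sigma_e**2 / n_h)
        u_hat = shrink * (y[obs].mean() - mu_hat)
    else:
        u_hat = 0.0
    y_imp[miss & (hh == h)] = mu_hat + u_hat   # deterministic BLUP imputation
```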

16.
Imputation is often used in surveys to treat item nonresponse. It is well known that treating the imputed values as observed values may lead to substantial underestimation of the variance of the point estimators. To overcome the problem, a number of variance estimation methods have been proposed in the literature, including resampling methods such as the jackknife and the bootstrap. In this paper, we consider the problem of doubly robust inference in the presence of imputed survey data. In the doubly robust literature, point estimation has been the main focus. In this paper, using the reverse framework for variance estimation, we derive doubly robust linearization variance estimators in the case of deterministic and random regression imputation within imputation classes. Also, we study the properties of several jackknife variance estimators under both negligible and nonnegligible sampling fractions. A limited simulation study investigates the performance of various variance estimators in terms of relative bias and relative stability. Finally, the asymptotic normality of imputed estimators is established for stratified multistage designs under both deterministic and random regression imputation. The Canadian Journal of Statistics 40: 259–281; 2012 © 2012 Statistical Society of Canada
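As a toy illustration of why resampling must account for imputation, the sketch below re-imputes within each delete-one jackknife replicate instead of treating imputed values as observed; the paper's adjusted and linearization estimators are considerably more refined, and all data here are invented.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 60
y = rng.normal(10, 2, size=n)
miss = rng.random(n) < 0.2

def mean_after_mean_imputation(y, miss, keep):
    """Point estimator: impute by the respondent mean (deterministic
    intercept-only regression imputation), then average."""
    yk, mk = y[keep], miss[keep]
    return np.where(mk, yk[~mk].mean(), yk).mean()

theta = mean_after_mean_imputation(y, miss, np.ones(n, bool))

# Re-impute within each delete-one replicate so the jackknife "sees" the
# imputation step instead of treating imputed values as real observations.
reps = []
for i in range(n):
    keep = np.ones(n, bool)
    keep[i] = False
    reps.append(mean_after_mean_imputation(y, miss, keep))
reps = np.asarray(reps)
v_jack = (n - 1) / n * np.sum((reps - reps.mean()) ** 2)
print(theta, v_jack)
```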

17.
Medical research frequently focuses on the relationship between quality of life (QoL) and survival time of subjects. QoL may be one of the most important factors that could be used to predict survival, making it worth identifying factors that jointly affect survival and QoL. We propose a semiparametric joint model that consists of item response and survival components, where these two components are linked through latent variables. Several popular ordinal models are considered and compared in the item response component, while the Cox proportional hazards model is used in the survival component. We estimate the baseline hazard function and model parameters simultaneously, through a profile likelihood approach. We illustrate the method using an example from a clinical study.

18.
We propose Bayesian parameter estimation in a multidimensional item response theory model using the Gibbs sampling algorithm. We apply this approach to dichotomous responses to a questionnaire on sleep quality. The analysis helps determine the underlying dimensions.

19.
This article examines methods to efficiently estimate the mean response in a linear model with an unknown error distribution under the assumption that the responses are missing at random. We show how the asymptotic variance is affected by the estimator of the regression parameter and by the imputation method. To estimate the regression parameter, ordinary least squares is efficient only if the error distribution happens to be normal. If the errors are not normal, we propose a one-step improvement estimator or a maximum empirical likelihood estimator to efficiently estimate the parameter. To investigate the imputation's impact on the estimation of the mean response, we compare the listwise deletion method and the propensity score method (which do not use imputation at all) with two imputation methods. We demonstrate that listwise deletion and the propensity score method are inefficient. Partial imputation, where only the missing responses are imputed, is compared to full imputation, where both missing and non-missing responses are imputed. Our results reveal that, in general, full imputation is better than partial imputation. However, when the regression parameter is estimated very poorly, partial imputation will outperform full imputation. The efficient estimator for the mean response is the full imputation estimator that utilizes an efficient estimator of the parameter.
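A simulation sketch of the partial-versus-full comparison, assuming ordinary least squares for the regression parameter (the paper's one-step improvement and empirical likelihood estimators are not shown) and t-distributed errors to make the error distribution non-normal; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
x = rng.normal(size=n)
y = 1 + 2 * x + rng.standard_t(df=5, size=n)    # non-normal errors
r = rng.random(n) < 1 / (1 + np.exp(-x))        # MAR: response depends on x only

# OLS fit on the respondents only.
beta, *_ = np.linalg.lstsq(
    np.column_stack([np.ones(r.sum()), x[r]]), y[r], rcond=None)
y_hat = beta[0] + beta[1] * x

mu_partial = np.where(r, y, y_hat).mean()   # impute the missing responses only
mu_full = y_hat.mean()                      # replace every response by its fit
print(mu_partial, mu_full)                  # both target the true mean E[y] = 1
```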

20.
Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.
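A minimal Hit-and-Run sketch for drawing points that satisfy linear inequality constraints A x <= b. Here the target is uniform on the polytope, whereas the paper samples missing items from their posterior, so this only illustrates the constraint-preserving move; all constraints are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12)

def hit_and_run(x0, A, b, n_steps=1000):
    """Hit-and-Run over the polytope {x : A @ x <= b}: pick a random
    direction, compute the feasible segment along it, and jump to a
    uniformly chosen point on that segment.  Every iterate satisfies the
    constraints by construction (assumes a bounded polytope)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)
        ad, slack = A @ d, b - A @ x           # need t * ad <= slack row-wise
        upper = (slack[ad > 0] / ad[ad > 0]).min(initial=np.inf)
        lower = (slack[ad < 0] / ad[ad < 0]).max(initial=-np.inf)
        x = x + rng.uniform(lower, upper) * d
    return x

# Illustrative constraints on two blanked items: 0 <= x1, x2 <= 10, x1 + x2 <= 12.
A = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1], [1, 1]])
b = np.array([10.0, 0, 10, 0, 12])
draw = hit_and_run(np.array([1.0, 1.0]), A, b)
assert np.all(A @ draw <= b + 1e-9)            # the draw respects every constraint
```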
