A total of 17 similar documents were retrieved.
1.
Imputation is a method of adjusting for missing data. Multiple imputation remedies the shortcomings of single imputation: it fills each missing value with a series of plausible values, thereby reflecting the uncertainty about the missing data. This paper introduces three imputation methods used in multiple-imputation procedures: regression prediction, the propensity-score method, and Markov chain Monte Carlo (MCMC). It then assesses the quality of the resulting imputations and points out remaining problems with multiple imputation.
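The regression-prediction approach mentioned above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: a regression is fit on the observed cases and each missing value is replaced by the prediction plus a random residual draw, repeated m times to yield m completed data sets. All variable names and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 200, 5
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
miss = rng.random(n) < 0.3            # MCAR missingness in y
y_obs = np.where(miss, np.nan, y)

def impute_once(x, y_obs, rng):
    obs = ~np.isnan(y_obs)
    # Fit y ~ x on the observed cases.
    X = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta, *_ = np.linalg.lstsq(X, y_obs[obs], rcond=None)
    resid = y_obs[obs] - X @ beta
    sigma = resid.std(ddof=2)
    y_imp = y_obs.copy()
    # Stochastic regression imputation: prediction + random noise,
    # so the imputations carry the residual uncertainty.
    y_imp[~obs] = beta[0] + beta[1] * x[~obs] + rng.normal(0, sigma, (~obs).sum())
    return y_imp

completed = [impute_once(x, y_obs, rng) for _ in range(m)]
print(len(completed), np.isnan(completed[0]).sum())
```

Each of the m completed data sets can then be analyzed with standard complete-data methods and the results pooled.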
2.
When standard complete-data statistical methods are applied to the "completed" data set produced by imputation, the variance of the imputation estimator is often underestimated. The bootstrap is an important nonparametric method based on repeated resampling from the original observations: it makes full use of the available data, requires no distributional assumptions about the unknown population and no additional sample information, and then uses the existing statistical model to draw inferences about the population distribution. This paper first fills in the missing data by multiple imputation and then uses the bootstrap to estimate the variance of the post-imputation statistic. The results show that bootstrap variance estimation for the imputation statistic is more rigorous and more accurate.
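The idea of bootstrapping the whole impute-and-estimate pipeline can be sketched as follows. This is a hedged toy version, not the paper's procedure: imputation here is a simple random hot-deck draw, and the statistic is the mean; resampling rows before imputing lets the bootstrap variance reflect both sampling and imputation noise.

```python
import numpy as np

rng = np.random.default_rng(1)

y = rng.normal(10, 2, size=150)
y[rng.random(150) < 0.2] = np.nan     # introduce missing values

def impute_and_mean(y, rng):
    y = y.copy()
    obs = y[~np.isnan(y)]
    k = np.isnan(y).sum()
    y[np.isnan(y)] = rng.choice(obs, size=k)   # hot-deck draw from observed
    return y.mean()

B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(y), len(y))      # resample rows with replacement
    boot[b] = impute_and_mean(y[idx], rng)     # re-impute inside each replicate

var_boot = boot.var(ddof=1)
print(round(var_boot, 4))
```

A naive variance computed on a single imputed data set would omit the imputation noise that the replicates capture here.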
3.
Multiple imputation for missing data under stratified random sampling  Total citations: 2, self-citations: 1, citations by others: 2
This paper introduces the basic idea of handling missing data by multiple imputation under stratified random sampling, analyzes common methods for constructing multiple imputations under stratified random sampling with ignorable nonresponse, and illustrates them with a worked example.
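The stratified setting can be illustrated with a minimal sketch, under assumptions not taken from the paper: missing values are imputed within each stratum by drawing from that stratum's observed values, and the stratum means are then combined with the stratum weights W_h. Stratum weights and means below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# stratum name -> (weight W_h, true mean), illustrative values only
strata = {"A": (0.5, 5.0), "B": (0.3, 8.0), "C": (0.2, 12.0)}
est = 0.0
for name, (W_h, mu_h) in strata.items():
    y = rng.normal(mu_h, 1.0, size=80)
    y[rng.random(80) < 0.25] = np.nan              # within-stratum nonresponse
    obs = y[~np.isnan(y)]
    y[np.isnan(y)] = rng.choice(obs, size=np.isnan(y).sum())  # within-stratum draw
    est += W_h * y.mean()                           # stratified estimator

print(round(est, 2))   # should be near 0.5*5 + 0.3*8 + 0.2*12 = 7.3
```

Imputing within strata preserves the between-stratum differences that a pooled hot deck would blur.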
4.
5.
Missing values in income variables are a common and difficult problem in econometric analysis, and traditional treatments often produce systematically biased results. This paper proposes handling missing income values with multiple imputation based on chained equations. The method is applied to a real data set; by analyzing the imputed data sets, the paper examines the method's properties and compares it with other multiple-imputation methods. The results show that chained-equation multiple imputation can correct, to some extent, the systematic bias of the inference and yields appropriate standard-error estimates.
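The chained-equations (MICE) idea can be sketched in a toy two-variable form, not the paper's actual model: each incomplete variable is regressed on the other, missing values are filled with prediction-plus-noise draws, and the two equations are cycled until the imputations stabilize. The "income" variable and all parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 300
x = rng.normal(size=n)
income = 3.0 + 2.0 * x + rng.normal(size=n)   # stand-in "income" variable
mx = rng.random(n) < 0.2
my = rng.random(n) < 0.2
x_i, y_i = x.copy(), income.copy()
x_i[mx], y_i[my] = np.nan, np.nan

def reg_impute(target, predictor, miss, rng):
    obs = ~np.isnan(target) & ~np.isnan(predictor)
    X = np.column_stack([np.ones(obs.sum()), predictor[obs]])
    beta, *_ = np.linalg.lstsq(X, target[obs], rcond=None)
    sigma = (target[obs] - X @ beta).std(ddof=2)
    out = target.copy()
    # stochastic draw, as in standard chained-equations implementations
    out[miss] = beta[0] + beta[1] * predictor[miss] + rng.normal(0, sigma, miss.sum())
    return out

# initialize with observed means, then cycle the two chained equations
x_f = np.where(mx, np.nanmean(x_i), x_i)
y_f = np.where(my, np.nanmean(y_i), y_i)
for _ in range(10):
    y_f = reg_impute(np.where(my, np.nan, y_f), x_f, my, rng)
    x_f = reg_impute(np.where(mx, np.nan, x_f), y_f, mx, rng)

print(np.isnan(x_f).sum(), np.isnan(y_f).sum())
```

Running the full cycle several times, with fresh random seeds, would produce the m completed data sets needed for multiple-imputation inference.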
6.
On the theoretical foundations of multiple imputation for missing data  Total citations: 4, self-citations: 0, citations by others: 4
Building on a comparison of single and multiple imputation, this paper examines the theoretical foundations of multiple imputation in depth and introduces the basic idea of using multiple imputation to handle missing data.
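At the heart of that theoretical foundation are Rubin's combining rules, which pool the m point estimates q_i and their within-imputation variances u_i into a single estimate and a total variance that includes the between-imputation component. A minimal sketch (the numeric inputs are invented for illustration):

```python
import numpy as np

def rubin_pool(q, u):
    """Pool m multiple-imputation estimates by Rubin's rules."""
    q = np.asarray(q, float)
    u = np.asarray(u, float)
    m = len(q)
    qbar = q.mean()                     # pooled point estimate
    ubar = u.mean()                     # average within-imputation variance
    b = q.var(ddof=1)                   # between-imputation variance
    t = ubar + (1 + 1 / m) * b          # total variance
    return qbar, t

qbar, t = rubin_pool([5.1, 4.9, 5.3, 5.0, 4.8], [0.04, 0.05, 0.04, 0.06, 0.05])
print(round(qbar, 3), round(t, 4))
```

The (1 + 1/m) factor is what single imputation omits: with m = 1 there is no between-imputation term at all, which is why single-imputation variances are too small.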
7.
Taking as its starting point the missing data created by item nonresponse in sample surveys, this paper analyzes the harm caused by such missing data from the perspective of matrix computation. On this basis, it discusses the basic issues in imputation, analyzes the characteristics and limitations of various single-imputation methods, and introduces inference procedures for multiply imputed data under simple random sampling and stratified random sampling. It then compares the common single- and multiple-imputation methods and presents an empirical comparison of their efficiency under simple random sampling and stratified random sampling.
8.
After a brief introduction to the EM algorithm, this paper examines in depth how MCMC algorithms, in particular the data augmentation (DA) algorithm, complete missing data. It describes the iterative simulation process of the DA algorithm and compares the DA algorithm with the EM algorithm.
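The DA iteration can be sketched for the simplest possible case, a normal mean with known variance 1 and a flat prior (assumptions made only for this illustration): the I-step draws the missing values from their conditional distribution given the current mean, and the P-step draws the mean from its posterior given the completed data.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 200
y = rng.normal(3.0, 1.0, size=n)
miss = rng.random(n) < 0.3
y_obs = np.where(miss, np.nan, y)

mu = np.nanmean(y_obs)          # starting value
draws = []
for t in range(1000):
    y_full = y_obs.copy()
    y_full[miss] = rng.normal(mu, 1.0, miss.sum())      # I-step: impute
    mu = rng.normal(y_full.mean(), 1.0 / np.sqrt(n))    # P-step: draw parameter
    if t >= 200:                                        # discard burn-in
        draws.append(mu)

print(round(float(np.mean(draws)), 2))
```

Where EM would alternate an expectation and a maximization to converge to a point estimate, DA replaces both steps with random draws and so delivers a whole posterior sample for mu.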
9.
Imputing missing categorical data with a random forest model  Total citations: 6, self-citations: 1, citations by others: 6
Missing data are an important factor affecting the quality of survey questionnaire data, and imputing the missing values can markedly improve data quality. Questionnaire data are mostly categorical; classification algorithms from data mining are the standard tools for such problems, and random forests are among the most accurate classifiers. This paper introduces the random forest model into the imputation of missing questionnaire data, proposes a random-forest imputation method for categorical data, and works out the corresponding imputation steps for different missingness patterns. Empirical simulation comparisons with other methods show that random-forest imputation produces more accurate and more reliable imputed values.
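The flavor of the approach can be conveyed with a deliberately tiny stand-in for a random forest: an ensemble of one-feature "stumps", each fit on a bootstrap sample with a randomly chosen predictor, that vote on each missing category. This is not the paper's method; a real application would use a full random forest implementation (e.g. missForest-style iteration), and all data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 400
target = rng.integers(0, 2, size=n)                      # binary category
f1 = np.where(rng.random(n) < 0.9, target, 1 - target)   # noisy predictor
f2 = np.where(rng.random(n) < 0.8, target, 1 - target)   # weaker predictor
X = np.column_stack([f1, f2])
miss = rng.random(n) < 0.25
y = np.where(miss, -1, target)                           # -1 marks missing

def stump_fit(Xo, yo, j):
    # map each level of feature j to the modal target category
    return {v: np.bincount(yo[Xo[:, j] == v]).argmax() for v in np.unique(Xo[:, j])}

obs = ~miss
votes = np.zeros((n, 2))
for _ in range(50):                                # 50 bagged stumps
    idx = rng.integers(0, obs.sum(), obs.sum())    # bootstrap the complete cases
    Xo, yo = X[obs][idx], y[obs][idx]
    j = rng.integers(0, X.shape[1])                # random feature, forest-style
    rule = stump_fit(Xo, yo, j)
    for i in np.flatnonzero(miss):
        votes[i, rule.get(X[i, j], 0)] += 1        # each stump votes

y_imp = y.copy()
y_imp[miss] = votes[miss].argmax(axis=1)           # majority vote imputes
acc = (y_imp[miss] == target[miss]).mean()
print(round(float(acc), 2))
```

Bootstrap sampling plus random feature selection is exactly the variance-reduction mechanism a real random forest scales up with deep trees.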
10.
Missing data are pervasive in sample surveys, the social sciences, epidemiology, and other fields, and the problem is even more pronounced in high dimensions; the massive, complex, heterogeneous, and incomplete nature of high-dimensional data poses great challenges for both the theory and the applications of high-dimensional missing data. How to build a robust and efficient imputation method for high-dimensional missing data has become a focus of current research. To address this, the paper innovatively combines augmented inverse probability weighting (IPW) with additive models, estimates the missingness probability with the covariate balancing propensity score (CBPS), and proposes an additive covariate-balancing-propensity-score imputation method (CBPS-AM) suited to high-dimensional missing data, aiming at a more effective solution to the high-dimensional missingness problem. CBPS-AM is not only multiply robust, avoiding the serious risks of model misspecification, but also sidesteps the failure of traditional imputation methods when high-dimensional missing data have heavy-tailed distributions; it achieves a double dimension reduction and allows flexible, widely applicable modeling. Drawing on the generalized method of moments and the backfitting algorithm, the paper gives a concise and effective CBPS estimation algorithm that improves data-use efficiency and imputation accuracy, studies the theoretical properties of the estimators, and compares the proposed method with traditional methods in numerical simulations. Finally, CBPS-AM is applied to HIV clinical-trial data with missing values and to Chinese COVID-19 epidemic data, building a scientific comprehensive evaluation and …
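The IPW building block that CBPS-AM augments can be sketched in its basic form, under illustrative assumptions (one covariate, a logistic response model fit here by plain gradient ascent rather than the paper's CBPS/GMM machinery): observed outcomes are weighted by the inverse of their estimated response probability to undo the selection bias.

```python
import numpy as np

rng = np.random.default_rng(6)

n = 2000
x = rng.normal(size=n)
y = 5.0 + 2.0 * x + rng.normal(size=n)       # true mean of y is 5.0
p_resp = 1 / (1 + np.exp(-(0.5 + x)))        # response depends on x (MAR)
r = rng.random(n) < p_resp                   # r=1: y is observed

# Fit logistic regression of r on x by gradient ascent on the log-likelihood.
beta = np.zeros(2)
X = np.column_stack([np.ones(n), x])
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.5 * X.T @ (r - p) / n

p_hat = 1 / (1 + np.exp(-X @ beta))
naive = y[r].mean()                          # biased: responders have larger x
ipw = np.sum(r * y / p_hat) / np.sum(r / p_hat)   # Hajek-type IPW estimator
print(round(naive, 2), round(ipw, 2))
```

CBPS replaces the likelihood-based fit above with moment conditions that force the weighted covariates to balance, which is what gives the method its robustness to propensity-model misspecification.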
11.
Hang J. Kim, Jerome P. Reiter, Quanli Wang, Lawrence H. Cox, Alan F. Karr, Journal of Business & Economic Statistics, 2014, 32(3): 375-386
Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.
12.
We present three multiple imputation estimates for the Cox model with missing covariates. Two of the suggested estimates are asymptotically equivalent to estimates in the literature when the number of multiple imputations approaches infinity. The third estimate can be implemented using standard software that could handle time-varying covariates. This revised version was published online in July 2006 with corrections to the Cover Date.
13.
Although the normality assumption has been regarded as a mathematical convenience for inferential purposes due to its nice distributional properties, there has been a growing interest regarding generalized classes of distributions that span a much broader spectrum in terms of symmetry and peakedness behavior. In this respect, Fleishman's power polynomial method seems to have been gaining popularity in statistical theory and practice because of its flexibility and ease of execution. In this article, we conduct multiple imputation for univariate continuous data under Fleishman polynomials to explore the extent to which this procedure works properly. We also make comparisons with normal imputation models via widely accepted accuracy and precision measures using simulated data that exhibit different distributional features as characterized by competing specifications of the third and fourth moments. Finally, we discuss generalizations to the multivariate case. Multiple imputation under power polynomials that cover most of the feasible area in the skewness-elongation plane appears to have substantial potential of capturing real missing-data trends.
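Fleishman's power method itself is compact enough to sketch: a standard normal Z is pushed through the cubic Y = a + b*Z + c*Z^2 + d*Z^3. In practice (a, b, c, d) are solved from the target third and fourth moments; the coefficients below are purely illustrative, chosen only to show that a nonzero quadratic term induces skewness, and are not solved Fleishman constants.

```python
import numpy as np

rng = np.random.default_rng(7)

z = rng.normal(size=100_000)
a, b, c, d = -0.1, 0.95, 0.1, 0.01    # illustrative, not solved coefficients
y = a + b * z + c * z**2 + d * z**3   # Fleishman power polynomial

def skewness(v):
    v = v - v.mean()
    return (v**2).mean() ** -1.5 * (v**3).mean()

print(round(skewness(z), 2), round(skewness(y), 2))
```

An imputation model built on such a transform can draw nonnormal residuals where a normal imputation model would force symmetry.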
14.
Communications in Statistics - Theory and Methods, 2013, 42(10): 2443-2467
We consider multiple linear regression models under nonnormality. We derive modified maximum likelihood estimators (MMLEs) of the parameters and show that they are efficient and robust. We show that the least squares estimators are considerably less efficient. We compare the efficiencies of the MMLEs and the M estimators for symmetric distributions and show that, for plausible alternatives to an assumed distribution, the former are more efficient. We provide real-life examples.
15.
A popular nonparametric treatment of missing value imputation uses methods based on k-nearest neighbors, where the number k of nearest neighbors is fixed without any consideration of the local features of missing values. This article proposes an alternative imputation method based on adaptive nearest neighbors, which takes into account the local features of the data. The proposed method adapts the number of neighbors in imputing the missing values according to the location of the missing values. Efficiency evaluation is then gauged through simulation studies using both simulated and real data. It is shown that the proposed method has distinct advantages over the imputation method based on k-nearest neighbors.
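The fixed-k versus adaptive contrast can be sketched as follows. The adaptive rule here (grow the neighborhood with the local donor spacing, but keep at least three donors) is a simple stand-in invented for this illustration, not the article's actual adaptive choice of k; data are simulated.

```python
import numpy as np

rng = np.random.default_rng(8)

n = 300
x = rng.uniform(0, 10, size=n)
y = np.sin(x) + rng.normal(0, 0.1, size=n)
miss = rng.random(n) < 0.2
y_obs = np.where(miss, np.nan, y)

donors_x, donors_y = x[~miss], y_obs[~miss]   # complete cases act as donors

def knn_impute(x0, k):
    d = np.abs(donors_x - x0)
    return donors_y[np.argsort(d)[:k]].mean()  # fixed-k neighbor mean

def adaptive_impute(x0):
    d = np.abs(donors_x - x0)
    radius = 2.0 * d.min() + 1e-9              # scales with local spacing
    near = d <= max(radius, np.sort(d)[2])     # but use at least 3 donors
    return donors_y[near].mean()

y_fix = y_obs.copy()
y_ad = y_obs.copy()
for i in np.flatnonzero(miss):
    y_fix[i] = knn_impute(x[i], k=10)
    y_ad[i] = adaptive_impute(x[i])

def rmse(a):
    return float(np.sqrt(np.mean((a[miss] - y[miss]) ** 2)))

print(round(rmse(y_fix), 3), round(rmse(y_ad), 3))
```

In sparse regions a fixed k is forced to reach far-away donors, while the adaptive rule keeps the neighborhood local, which is the intuition behind the article's method.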
16.
Large-scale sample surveys mostly use complex sampling designs, yielding survey data sets with a hierarchically nested structure in which missing data are unavoidable; imputation strategies for hierarchical data sets with missing values have so far received little attention. This paper applies the Gibbs algorithm to the multiple imputation of hierarchical data sets with missing values, studying both a fixed-effects imputation model and a random-effects imputation model. Through theoretical derivation and numerical simulation, under different intraclass correlation coefficients, group sizes, and missingness proportions, the imputation performance of the two methods is compared in terms of the unbiasedness and efficiency of the resulting parameter estimates, and recommendations for choosing the imputation model are given. The results show that the random-effects imputation model yields more accurate parameter estimates, while the fixed-effects imputation model is simpler to operate; when the missingness proportion is small, the intraclass correlation is large, and the groups are large, the fixed-effects imputation model can be used, otherwise the random-effects imputation model is recommended.
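The fixed- versus random-effects distinction can be sketched in a stripped-down form, not the paper's Gibbs procedure: the fixed-effects fill-in uses the raw group mean, while the random-effects fill-in shrinks each group mean toward the grand mean by the classic factor tau^2 / (tau^2 + sigma^2 / n_j). The variance components are treated as known here purely for illustration; in practice they are estimated (e.g. within the Gibbs sampler).

```python
import numpy as np

rng = np.random.default_rng(9)

J, n_j = 30, 8                       # 30 groups of 8 units
tau, sigma = 1.0, 2.0                # between- and within-group sd (assumed known)
alpha = rng.normal(0, tau, J)        # group random effects
y = 10.0 + alpha[:, None] + rng.normal(0, sigma, (J, n_j))
miss = rng.random((J, n_j)) < 0.2
y_obs = np.where(miss, np.nan, y)

grand = float(np.nanmean(y_obs))
y_fix, y_re = y_obs.copy(), y_obs.copy()
for j in range(J):
    obs = ~np.isnan(y_obs[j])
    m = int(obs.sum())
    if m == 0:                       # no donors in this group: fall back
        y_fix[j], y_re[j] = grand, grand
        continue
    gmean = y_obs[j, obs].mean()
    shrink = tau**2 / (tau**2 + sigma**2 / m)          # random-effects weight
    y_fix[j, ~obs] = gmean                             # fixed-effects group mean
    y_re[j, ~obs] = grand + shrink * (gmean - grand)   # shrunken mean

print(round(float(y_fix.mean()), 1), round(float(y_re.mean()), 1))
```

The shrinkage factor explains the paper's recommendation: with large groups and high intraclass correlation it approaches 1, so the simpler fixed-effects model behaves almost identically to the random-effects model.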
17.
We extend Wei and Tanner's (1991) multiple imputation approach in semi-parametric linear regression for univariate censored data to clustered censored data. The main idea is to iterate the following two steps: 1) use data augmentation to impute the censored failure times; 2) fit a linear model to the imputed complete data that takes into account the clustering among failure times. In particular, we propose using generalized estimating equations (GEE) or a linear mixed-effects model to implement the second step. In simulation studies our proposal compares favorably with the independence approach (Lee et al., 1993), which ignores the within-cluster correlation in estimating the regression coefficient. Our proposal is easy to implement with existing software.