Similar literature
 18 similar articles found; search took 46 ms
1.
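Imputation is a method of adjusting for missing data. Multiple imputation remedies the shortcomings of single imputation by filling in each missing value with a set of plausible values, thereby reflecting the uncertainty about the missing data. This paper introduces three imputation methods used in multiple imputation procedures: the regression prediction method, the propensity score method, and the Markov chain Monte Carlo (MCMC) method. It also discusses inference on the results of multiple imputation and points out the problems that multiple imputation still has.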
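
Inference after multiple imputation is commonly carried out by pooling the results from the m completed data sets with Rubin's combining rules. The following is a minimal sketch of those rules, not code from the paper above; the function name `pool_rubin` and its arguments are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their variances (Rubin's rules).

    estimates: point estimates, one per imputed data set (hypothetical input)
    variances: the estimated variance of each point estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The between-imputation term is what a single imputation cannot supply: it carries the extra uncertainty due to the missing values into the final variance.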

2.
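This article focuses on missing data caused by item nonresponse in sample surveys. It analyzes the harm done by such missing data from the perspective of matrix operations and, on that basis, discusses the basic issues of imputation methods for missing data, including the characteristics and limitations of the various single imputation methods. It then introduces sampling inference methods for multiple imputation of missing data under simple random sampling and stratified random sampling, compares the commonly used single and multiple imputation methods, and presents an empirical comparison of the efficiency of single and multiple imputation under simple random sampling and stratified random sampling.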

3.
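Using multiple imputation, the article imputes multivariate samples with different missing rates and missing patterns, and studies how the multiple imputation error depends on the missing rate and the missing pattern. The results show that when the missing rate is between 0% and 15%, the multiple imputation error is linear in the missing rate; when the missing rate exceeds 15%, the relationship departs from linearity. The multiple imputation error is positively correlated with the variance-to-mean ratio of the missing pattern: the larger the ratio, the larger the error.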

4.
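An Exploration of the Theoretical Basis of Multiple Imputation for Handling Missing Data   Total citations: 4, self-citations: 0, citations by others: 4
Building on a comparison of single imputation and multiple imputation, this paper examines in depth the theoretical basis of multiple imputation and introduces the basic idea behind using it to handle missing data.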

5.
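Multiple Imputation Methods for Missing Data under Stratified Random Sampling   Total citations: 2, self-citations: 1, citations by others: 2
The paper introduces the basic idea of handling missing data by multiple imputation under stratified random sampling, analyzes the common methods for constructing multiple imputations in stratified random sampling with ignorable nonresponse, and illustrates them with an example.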

6.
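After a brief introduction to the EM algorithm, the article examines in depth how MCMC algorithms, in particular the data augmentation (DA) algorithm, can be used to complete missing data. It describes the iterative simulation process of the DA algorithm and compares the DA algorithm with the EM algorithm.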

7.
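Missing values are a common problem in surveys, and imputing them is one of the better ways of dealing with them. If the variables are correlated, a normal linear model can be used to impute the variables that have missing values from the variables that have none. Compared with single imputation, multiple imputation estimates the population variance more effectively and is therefore more widely used. Using the bootstrap, the article draws the model parameters and residuals from least squares estimates fitted to bootstrap samples of the completely observed cases, which allows the population variance to be estimated more accurately. Extensive simulation experiments show that, compared with single imputation and ordinary multiple imputation, bootstrap multiple imputation produces wider confidence intervals and therefore more accurate coverage of the population parameter, and that this advantage is more pronounced when the proportion of missing data is large.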

8.
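Missing data are widespread in sample surveys, the social sciences, epidemiology, and other fields, and the problem is more pronounced in high-dimensional settings. The massive, complex, heterogeneous, and incomplete information that accompanies high-dimensional data poses great challenges for both the theory and the application of high-dimensional missing data methods. How to build a robust and efficient imputation method for high-dimensional missing data has become a focus of current research. To address this problem, the paper combines augmented inverse probability weighting (IPW) with the additive model, estimates the missingness probability with the covariate balancing propensity score (CBPS), and proposes an additive covariate balancing propensity score imputation method (CBPS-AM) suited to high-dimensional missing data, with the aim of providing a more effective solution to the high-dimensional missing data problem. The CBPS-AM method is multiply robust, which avoids the serious risk of model misspecification, and it also avoids the failure of traditional imputation methods when high-dimensional missing data are heavy-tailed. It achieves a twofold dimension reduction and offers modeling flexibility and broad applicability. Next, drawing on the generalized method of moments and the backfitting algorithm, the paper gives a CBPS estimation algorithm that is simple and effective and improves both data utilization and imputation accuracy. It also studies the theoretical properties of the estimator and compares the proposed method with traditional methods in numerical simulations. Finally, the CBPS-AM method is applied to HIV clinical trial data with missing values and to COVID-19 epidemic data from China, establishing a scientific comprehensive evaluation and targeted...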

9.
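The paper proposes a missing data imputation method based on nearest neighbor imputation and association rules. Variables without missing data serve as auxiliary variables, and a distance function is defined to find the sample units closest to the unit that has missing data. The reciprocal of the product of the support and lift of the mined association rules is then used as a weight on the distances between sample units, giving a weighted distance, and the attribute value of the sample unit with the smallest weighted distance is used to impute the missing value. This method resolves the problem of different nearest sample units yielding different imputed values. The paper closes with the implementation steps of the method and an application example.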

10.
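A Comparison of Methods for Handling Missing Data   Total citations: 1, self-citations: 1, citations by others: 1
The article briefly introduces the commonly used methods for handling missing data, discusses the criteria for evaluating them, and compares the characteristics and applicability of the various methods.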

11.
When analyzing data with missing values, a commonly used method is the inverse probability weighting (IPW) method, which reweights estimating equations with propensity scores. The popularity of the IPW method is due to its simplicity. However, it is often criticized for being inefficient because most of the information from the incomplete observations is not used. Alternatively, the regression method is known to be efficient but is nonrobust to the misspecification of the regression function. In this article, we propose a novel way of optimally combining the propensity score function and the regression model. The resulting estimating equation enjoys the properties of robustness against misspecification of either the propensity score or the regression function, as well as being locally semiparametric efficient. We demonstrate analytically situations where our method leads to a more efficient estimator than some of its competitors. In a simulation study, we show the new method compares favorably with its competitors in finite samples. Supplementary materials for this article are available online.
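
As a point of reference for combining a propensity score with a regression model, the sketch below computes the standard augmented IPW (AIPW) estimate of a mean with missing responses. It is not the optimal combination proposed in the article above; the function `aipw_mean` and its inputs are hypothetical, and the propensity scores and regression predictions are assumed to have been fitted elsewhere.

```python
def aipw_mean(y, observed, propensity, prediction):
    """Standard AIPW estimate of E[Y] when some responses are missing.

    y:          responses; entries where observed is False are ignored
    observed:   True where the response was observed
    propensity: estimated P(observed | covariates), each in (0, 1]
    prediction: regression predictions of Y from the covariates
    """
    n = len(y)
    if not (n == len(observed) == len(propensity) == len(prediction)) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, propensity, prediction):
        if not 0.0 < pi <= 1.0:
            raise ValueError("propensity scores must lie in (0, 1]")
        # Regression prediction plus an inverse-probability-weighted residual
        # correction that is applied only to the observed cases.
        correction = (yi - mi) / pi if ri else 0.0
        total += mi + correction
    return total / n
```

The estimate stays consistent if either the propensity model or the regression model is correctly specified, which is the robustness property the abstract refers to.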

12.
Abstract.  Theory on semi-parametric efficient estimation in missing data problems has been systematically developed by Robins and his coauthors. Except in relatively simple problems, semi-parametric efficient scores cannot be expressed in closed forms. Instead, the efficient scores are often expressed as solutions to integral equations. Neumann series was proposed in the form of successive approximation to the efficient scores in those situations. Statistical properties of the estimator based on the Neumann series approximation are difficult to obtain and as a result, have not been clearly studied. In this paper, we reformulate the successive approximation in a simple iterative form and study the statistical properties of the estimator based on the reformulation. We show that a doubly robust locally efficient estimator can be obtained following the algorithm in robustifying the likelihood score. The results can be applied to, among others, parametric regression, marginal regression and Cox regression when data are subject to missing values and the data are missing at random. A simulation study is conducted to evaluate the performance of the approach and a real data example is analysed to demonstrate the use of the approach.

13.
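Yu Lichao (于力超), Jin Yongjin (金勇进). 《统计研究》 (Statistical Research), 2016, 33(1): 95-102
In sample surveys, panel data are often obtained by following multiple respondents over time and are then used for statistical inference about population characteristics. Panel data frequently contain missing values, and most software for handling them simply deletes the respondents with missing values to obtain a complete data set. When the missingness mechanism is not at random, this leads to biased estimates of the population parameters. This paper discusses how to analyze panel data when the missingness mechanism is not at random. It mainly uses model-based likelihood inference, modeling the joint distribution of the target variable, the missingness indicator, and the random effects vector. Building on the existing selection models and pattern-mixture models, it introduces random effects, studies how to compute the expectation of the target variable, and studies parameter estimation under the random-effects hybrid model. For cases where the variable distributions are relatively simple, it gives the steps for inferring the population parameters by maximum likelihood. Finally, a simulation analysis compares the strengths and weaknesses of the methods.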

14.
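The Use of Auxiliary Information in Handling Missing Data   Total citations: 2, self-citations: 0, citations by others: 2
Jin Yongjin (金勇进). 《统计研究》 (Statistical Research), 1998, 15(1): 43-45
Missing data are often encountered in statistical analysis. They arise in different ways, chiefly from nonresponse in surveys. In addition, an interviewer may carelessly omit some survey items, or data that are illogical, obviously wrong, or deliberately falsified may be found and removed during checking and processing; all of these produce missing data. The harm done by missing data is clear: it not only reduces the number of units that actually respond and inflates the variance of survey estimators, but also biases the estimators, and it is an important factor affecting the quality of statistical data. In general, missing data call for a follow-up survey to fill in what is missing. Sometimes, however, because of various constraints, a supplementary survey is either impossible or still fails to solve the problem. Two questions then become especially important: first, how large the effect of the missing data is, that is, whether the estimator bias caused by the missing data can itself be estimated; and second, how to remedy the missing data. Both questions involve auxiliary information, and this paper analyzes them.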

15.
ABSTRACT

In logistic regression with nonignorable missing responses, Ibrahim and Lipsitz proposed a method for estimating regression parameters. It is known that the regression estimates obtained by using this method are biased when the sample size is small. Also, another complexity arises when the iterative estimation process encounters separation in estimating regression coefficients. In this article, we propose a method to improve the estimation of regression coefficients. In our likelihood-based method, we penalize the likelihood by multiplying it by a noninformative Jeffreys prior as a penalty term. The proposed method reduces bias and is able to handle the issue of separation. Simulation results show substantial bias reduction for the proposed method as compared to the existing method. Analyses using real world data also support the simulation findings. An R package called brlrmr is developed implementing the proposed method and the Ibrahim and Lipsitz method.

16.
In Rubin (1976) the missing at random (MAR) and missing completely at random (MCAR) conditions are discussed. It is concluded that the MAR condition allows one to ignore the missing data mechanism when doing likelihood or Bayesian inference but also that the stronger MCAR condition is in some sense the weakest generally sufficient condition allowing (conditional) frequentist inference while ignoring the missing data mechanism. In this paper it is shown that (a slightly strengthened version of) the MAR condition is sufficient to yield ordinary large sample results for estimators and test statistics and thus may be used for (asymptotic) frequentist inference.

17.
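As research places higher demands on data quality, problems related to missing data are receiving more and more attention. The article discusses the theoretical basis of fractional imputation, one of the methods for handling missing data, and on that basis studies fractional hot deck imputation and its variance estimation. It also uses simulated data to study how fractional hot deck imputation is carried out. Comparative experiments show that fractional hot deck imputation preserves the distribution of the original data while reducing the bias introduced by imputation, and gives more accurate imputation results.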

18.
Empirical Likelihood-based Inference in Linear Models with Missing Data   Total citations: 18, self-citations: 0, citations by others: 18
The missing response problem in linear regression is studied. An adjusted empirical likelihood approach to inference on the mean of the response variable is developed. A non-parametric version of Wilks's theorem for the adjusted empirical likelihood is proved, and the corresponding empirical likelihood confidence interval for the mean is constructed. With auxiliary information, an empirical likelihood-based estimator with asymptotic normality is defined and an adjusted empirical log-likelihood function with an asymptotic χ² distribution is derived. A simulation study is conducted to compare the adjusted empirical likelihood methods and the normal approximation methods in terms of coverage accuracies and average lengths of the confidence intervals. Based on biases and standard errors, a comparison is also made between the empirical likelihood-based estimator and related estimators by simulation. Our simulation indicates that the adjusted empirical likelihood methods perform competitively and the use of auxiliary information provides improved inferences.
