Similar articles
1.
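Nonresponse occurs frequently in big-data applications. In practice the nonresponse rate is usually low, and in that situation matching nonresponding units to responding units with a propensity score model tends to degrade the performance of propensity score matching imputation markedly. To address this, the idea of the synthetic minority over-sampling technique (SMOTE) is incorporated into propensity score matching imputation, yielding a propensity score matching imputation method based on minority over-sampling. Statistical simulation and an empirical study demonstrate the statistical properties and practical performance of the new method under different nonresponse rates, numbers of imputations, and error distributions. The simulations show that the new method is clearly more accurate than propensity score matching imputation, and that its statistical properties are little affected by the nonresponse rate, the number of imputations, or the error distribution. The empirical results show that the new method works well on real data. The method offers a new approach to handling nonresponse and extends well.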
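The abstract above does not spell out how the over-sampling step is combined with the propensity model. As a rough illustration of the over-sampling idea alone, the sketch below creates synthetic minority-class points (here, nonrespondents) by interpolating between a point and its nearest minority neighbour. The function name `smote_like`, the use of a single nearest neighbour, and the fixed seed are all assumptions made for this example, not the authors' procedure.

```python
import random

def smote_like(minority, n_new, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Simplified variant: each synthetic point lies on the segment between a
    randomly chosen minority point and its single nearest minority neighbour.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Nearest neighbour by squared Euclidean distance.
        neighbour = min(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )
        u = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

In a full workflow the synthetic nonrespondents would presumably be added before fitting the propensity score model, so that the two classes are less imbalanced; that step is not shown here.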

2.
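Nonresponse is common in sample survey data. Item imputation is one of the main ways of handling nonresponse, and auxiliary variables matter greatly for the accuracy of imputed values. This paper therefore studies a regression imputation method for item nonresponse that selects highly correlated auxiliary variables: auxiliary variables with high correlation coefficients with the target variable are screened first, and a regression imputation model is then built. The selection procedure is simple and the imputed values are accurate. A simulated example demonstrates the method's merits.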
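The two steps described above (screen auxiliary variables by correlation, then fit a regression imputation model) can be sketched as follows. This is a minimal version that keeps only the single most correlated auxiliary variable and fits a simple least-squares line; the paper may select several variables, and the function names here are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def impute_by_best_auxiliary(y, auxiliaries):
    """y: list with None for missing items.
    auxiliaries: dict mapping a name to a fully observed list.
    Returns (name of chosen auxiliary, completed copy of y)."""
    observed = [i for i, v in enumerate(y) if v is not None]
    y_obs = [y[i] for i in observed]
    # Step 1: pick the auxiliary with the largest |r| among respondents.
    best = max(
        auxiliaries,
        key=lambda k: abs(pearson([auxiliaries[k][i] for i in observed], y_obs)),
    )
    x = auxiliaries[best]
    x_obs = [x[i] for i in observed]
    # Step 2: least-squares line fitted on respondents only.
    mx, my = mean(x_obs), mean(y_obs)
    slope = sum((a - mx) * (b - my) for a, b in zip(x_obs, y_obs)) / sum(
        (a - mx) ** 2 for a in x_obs
    )
    intercept = my - slope * mx
    completed = [
        v if v is not None else intercept + slope * x[i] for i, v in enumerate(y)
    ]
    return best, completed
```

For example, with `y = [2.0, 4.0, None, 8.0]` and an auxiliary `x1 = [1.0, 2.0, 3.0, 4.0]` that is perfectly correlated with the observed values, the missing item is imputed from the line fitted on `x1`.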

3.
于力超  金勇进 《统计研究》2018,35(11):93-104
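Large-scale sample surveys mostly use complex sampling designs and produce survey data sets with a hierarchical, nested structure, in which missing data are unavoidable; imputation strategies for hierarchically structured data sets with missing values have so far received little study. This paper applies the Gibbs algorithm to multiple imputation of hierarchical data sets with missing values and examines fixed-effects model imputation and random-effects model imputation. Through theoretical derivation and numerical simulation, under different intraclass correlation coefficients, group sizes, and missing-data proportions, it compares the imputation performance of the methods in terms of the unbiasedness and efficiency of the parameter estimates, and gives recommendations on choosing the imputation model. The results show that the random-effects model gives more accurate parameter estimates when used as the imputation model, while the fixed-effects model is simpler to apply. The fixed-effects imputation model can be used when the missing-data proportion is small, the intraclass correlation coefficient is large, and groups are large; otherwise the random-effects imputation model is recommended.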

4.
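As research demands higher data quality, problems related to missing data are receiving more attention. This article discusses the theoretical basis of fractional imputation, one of the methods for handling missing data, and on that basis studies fractional hot-deck imputation and its variance estimation. It also uses simulated data to study how fractional hot-deck imputation is carried out. Comparative experiments show that fractional hot-deck imputation preserves the original data distribution while reducing the bias caused by imputation, and so gives more accurate imputation results.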
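To make the idea of fractional hot-deck imputation concrete, here is a minimal sketch under simplifying assumptions: a single imputation cell, every respondent used as a donor, and equal fractions. Each missing unit is replaced by several donor values, and its weight is split across them. The function names are hypothetical and the article's own variant and variance estimator are not reproduced.

```python
def fractional_hot_deck(values, weights):
    """Return rows of (unit index, value, weight).

    Observed units keep their value and full weight. A missing unit (None)
    is replaced by one row per donor, each carrying an equal share of the
    unit's weight.
    """
    donors = [v for v in values if v is not None]
    rows = []
    for i, (v, w) in enumerate(zip(values, weights)):
        if v is not None:
            rows.append((i, v, w))
        else:
            for d in donors:
                rows.append((i, d, w / len(donors)))
    return rows

def weighted_mean(rows):
    """Weighted mean over the fractionally imputed rows."""
    return sum(v * w for _, v, w in rows) / sum(w for _, _, w in rows)
```

Because the missing unit's weight is only split, never changed, the total weight of the completed data set equals the total weight of the original sample.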

5.
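Nonresponse is unavoidable in sample surveys, including under dual frames, so the estimator bias caused by item nonresponse in dual-frame sample surveys needs to be corrected. Auxiliary-variable information is first obtained through two-phase sampling and used to construct a combined estimator of the ratio estimator and the ratio-type exponential estimator, which imputes the item-nonresponse data in the dual-frame survey and yields mean estimates for each subpopulation; the population total is then estimated with an estimator of the Hartley form. The estimator's bias, mean squared error, and optimal weight coefficient are computed. The relative precision loss measured against the mean squared error of the same type of combined estimator under complete response in the same conditions is compared with the relative precision loss from using the ratio-type exponential estimator alone. Random simulation shows a low loss rate, so the imputation method is effective. Choosing suitable auxiliary variables to construct the combined estimator as the imputed value makes fuller use of the auxiliary variables and the observed study-variable information, and the proposed combined estimator has some practical value for sample survey work.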

6.
Multiple imputation methods for missing data under stratified random sampling   Cited by: 1 (self-citations: 0, citations by others: 1)
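Introduces the basic idea of handling missing data by multiple imputation under stratified random sampling, analyzes common methods for building multiple imputations in stratified random sampling with ignorable nonresponse, and illustrates them with an example.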
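The abstract gives no formulas. As background, the sketch below shows the standard Rubin combining rules that multiple imputation analyses generally rely on: the point estimate is the average of the m completed-data estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name is hypothetical, and how the per-imputation estimates are obtained under stratification is not shown.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Combine m completed-data estimates and their variances.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The (1 + 1/m) factor accounts for using a finite number of imputations; it shrinks towards 1 as m grows.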

7.
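Long questionnaires suffer from nonresponse and response burden, which lowers the precision of statistical surveys. To address this, the split questionnaire method is adopted and parameters are estimated by small area estimation. A simulation study shows that estimating parameters from split questionnaires by small area estimation clearly outperforms estimation by multiple imputation. The results indicate that small area estimation for split questionnaires can markedly improve survey precision.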

8.
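Weighting adjustment is a post-hoc adjustment method for handling nonresponse. The article lists two ways of computing the response rate in weight adjustment and, starting from the response rate, uses statistical simulation to identify how different ways of computing the response rate and different values of it affect estimation under the MCAR and CDM missing-data mechanisms.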
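The abstract does not say which two response-rate definitions are compared. A common pair, assumed here purely for illustration, is the unweighted rate (share of sampled units that respond) and the design-weighted rate (share of total design weight carried by respondents). The sketch computes both and applies the usual adjustment of dividing respondent weights by the chosen rate; function names are hypothetical.

```python
def response_rates(weights, responded):
    """Return (unweighted rate, design-weighted rate) for one adjustment cell."""
    unweighted = sum(1 for r in responded if r) / len(responded)
    weighted = sum(w for w, r in zip(weights, responded) if r) / sum(weights)
    return unweighted, weighted

def adjust_weights(weights, responded, rate):
    """Inflate respondent weights by 1 / rate; nonrespondents get weight 0."""
    return [w / rate if r else 0.0 for w, r in zip(weights, responded)]
```

With the design-weighted rate, the adjusted respondent weights sum to the original total weight of the cell; with the unweighted rate they generally do not when design weights are unequal.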

9.
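This paper studies parameter estimation for linear regression models with missing skewed data. To overcome distortion of the sample distribution and to improve estimation of the model's regression coefficients, scale parameter, and skewness parameter, it proposes a modified regression imputation method suited to missing data in linear regression models with skewed data. Random simulation and a real-data example, with comparisons against mean imputation, regression imputation, and stochastic regression imputation, show that the proposed modified regression imputation method is effective and feasible.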

10.
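Missing data are an important factor affecting the quality of questionnaire data, and imputing missing values in questionnaires can markedly improve survey data quality. Questionnaire data are mostly categorical; classification algorithms from data mining are a common way to handle categorical attributes, and the random forest model is among the more accurate of these algorithms. The random forest model is introduced into imputation of missing questionnaire data, a random-forest-based imputation method for missing categorical values is proposed, and the corresponding imputation steps are discussed for different missingness patterns. Empirical simulation comparisons with other methods show that random forest imputation yields imputed values that are more accurate and more reliable.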

11.
Summary.  Social surveys are usually affected by item and unit non-response. Since it is unlikely that a sample of respondents is a random sample, social scientists should take the missing data problem into account in their empirical analyses. Typically, survey methodologists try to simplify the work of data users by 'completing' the data, filling the missing variables through imputation. The aim of the paper is to give data users some guidelines on how to assess the effects of imputation on their microlevel analyses. We focus attention on the potential bias that is caused by imputation in the analysis of income variables, using the European Community Household Panel as an illustration.

12.
Marginal imputation, which consists of imputing items separately, generally leads to biased estimators of bivariate parameters such as finite population coefficients of correlation. To overcome this problem, two main approaches have been considered in the literature. The first consists of using customary imputation methods such as random hot-deck imputation and adjusting for the bias at the estimation stage. This approach was studied in Skinner & Rao (2002). In this paper, we extend the results of Skinner & Rao (2002) to the case of arbitrary sampling designs and three variants of random hot-deck imputation. The second approach consists of using an imputation method that preserves the relationship between variables. Shao & Wang (2002) proposed a joint random regression imputation procedure that succeeds in preserving the relationships between two study variables. One drawback of the Shao–Wang procedure is that it suffers from an additional variability (called the imputation variance) due to the random selection of residuals, resulting in potentially inefficient estimators. Following Chauvet, Deville, & Haziza (2011), we propose a fully efficient version of the Shao–Wang procedure that preserves the relationship between two study variables, while virtually eliminating the imputation variance. Results of a simulation study support our findings. An application using data from the Workplace and Employees Survey is also presented. The Canadian Journal of Statistics 40: 124–149; 2012 © 2011 Statistical Society of Canada

13.
Sarjinder Singh 《Statistics》2013,47(5):499-511
In this paper, an alternative estimator of the population mean in the presence of non-response is suggested, which takes the form of Walsh's estimator. The estimator of the mean obtained from the proposed technique remains better than the estimators obtained from the ratio or mean methods of imputation. For the optimum choice of parameters, the mean-squared error (MSE) of the resultant estimator is less than that of the estimator based on the ratio method of imputation. An estimator of a parameter involved in the new method of imputation is discussed. A way to form a 'warm deck' method of imputation is also suggested. The MSE expressions for the proposed estimators have been derived analytically and compared empirically. The work has been extended to the case of multi-auxiliary information to be used for imputation. Numerical illustrations are also provided.

14.
In this paper, a new power transformation estimator of the population mean in the presence of non-response is suggested. The estimator of the mean obtained from the proposed technique remains better than the estimators obtained from the ratio or mean methods of imputation. For the optimum choice of parameters, the mean squared error (MSE) of the resultant estimator is less than that of the estimator based on the ratio method of imputation. An estimator of a parameter involved in the new method of imputation is discussed. The MSE expressions for the proposed estimators have been derived analytically and compared empirically. A product method of imputation for negatively correlated variables has also been introduced. The work has been extended to the case of multi-auxiliary information to be used for imputation.

15.
The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The performance of a newly proposed imputation method based on generalized additive models for location, scale, and shape (GAMLSS) is investigated. Although imputation methods based on predictive mean matching are virtually unbiased, they suffer from mild to moderate under-coverage, even in the experiment where all variables are jointly normally distributed. The GAMLSS method features better coverage than currently available methods.

16.
Most research in the theory of survey sampling deals only with sampling errors, under the assumptions that (i) response is complete and (ii) the information recorded from individuals is correct. In practice these assumptions do not always hold. Non-sampling errors such as non-response and measurement errors (MEs) commonly creep into surveys and can affect estimators more than sampling errors do. Considering non-response and MEs jointly, we propose an optimum class of estimators for the population mean under simple random sampling using conventional and non-conventional measures. The bias and mean square error of the proposed estimators are derived up to the first degree of approximation. Moreover, a simulation study is conducted to assess the performance of the new estimators; it shows that the proposed estimators are more efficient than the traditional Hansen and Hurwitz estimator and other competing estimators.
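For readers unfamiliar with the benchmark named above, the sketch below computes the classical Hansen and Hurwitz estimator of the mean: initial respondents contribute their own mean, and the nonrespondent group is represented by the mean of a followed-up subsample. The function name and argument layout are hypothetical, and the proposed class of estimators from the abstract is not reproduced.

```python
from statistics import mean

def hansen_hurwitz_mean(respondents, n_nonrespondents, followup):
    """Hansen-Hurwitz mean estimate: (n1 * ybar1 + n2 * ybar2r) / n.

    respondents: values from the n1 initial respondents.
    n_nonrespondents: n2, the number of initial nonrespondents.
    followup: values from the subsample of nonrespondents that was re-contacted.
    """
    n1 = len(respondents)
    n = n1 + n_nonrespondents
    return (n1 * mean(respondents) + n_nonrespondents * mean(followup)) / n
```

The estimator weights the follow-up mean by the full size of the nonrespondent group, not by the size of the follow-up subsample.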

17.
Summary.  The paper studies the non-response process in a long-term study of neurotic disorder by comparing the analysis based on the responses that were collected by the established practice of interviewing the subjects, at dates arranged in advance (appointments), with the analysis of the nearly complete set of responses that were collected by an extensive effort that involved attempts to interview without seeking a prior agreement. The method of multiple imputation is applied, and its properties are explored in a setting that is not perfectly suited for its application: a relatively small sample size, ordinal score outcomes and the likelihood that the outcomes are missing not at random.

18.
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage that it is substantially less time consuming on data sets of high dimensions than the other multiple imputation methods.

19.
We present a systematic approach to the practical and comprehensive handling of missing data motivated by our experiences of analyzing longitudinal survey data. We consider the Health 2000 and 2011 Surveys (BRIF8901) where increased non-response and non-participation from 2000 to 2011 was a major issue. The model assumptions involved in the complex sampling design, repeated measurements design, non-participation mechanisms and associations are presented graphically using methodology previously defined as a causal model with design, i.e. a functional causal model extended with the study design. This tool forces the statistician to make the study design and the missing-data mechanism explicit. Using the systematic approach, the sampling probabilities and the participation probabilities can be considered separately. This is beneficial when the performance of missing-data methods is to be compared. Using data from the Health 2000 and 2011 Surveys and from national registries, it was found that multiple imputation removed almost all differences between full sample and estimated prevalences. The inverse probability weighting removed more than half and the doubly robust method 60% of the differences. These findings are encouraging since decreasing participation rates are a major problem in population surveys worldwide.
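As a small illustration of the inverse probability weighting idea mentioned above, the sketch below estimates a prevalence from participants only, weighting each participant by the reciprocal of an assumed participation probability (a normalized, Hajek-type form). How those probabilities are modelled in the study is not stated in the abstract; here they are simply given as inputs, and the function name is hypothetical.

```python
def ipw_prevalence(outcomes, participation_probs):
    """Weighted prevalence among participants.

    outcomes: 0/1 indicators observed for participants.
    participation_probs: each participant's probability of taking part
    (assumed known or previously estimated; must be > 0).
    """
    weights = [1.0 / p for p in participation_probs]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)
```

Participants who resemble likely non-participants (low probability) receive larger weights, which is how the method tries to offset selective participation.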

20.
This paper presents a modified exponential type estimation strategy for the current population mean in the presence of random non-response in two-occasion successive sampling under a two-phase set-up. The properties of the proposed estimators have been examined under the assumption that the numbers of sampling units follow a distribution due to random non-response. The performances of the proposed estimators are compared with those of the estimators designed for complete response situations. Empirical studies are carried out to show the dominance of the proposed estimators over the estimator defined for complete response situations. Appropriate recommendations have been made to survey practitioners and researchers for real-life practical applications.
