Similar Literature
1.
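Based on an analysis of methods for handling missing values, this article proposes three classification-based methods: class-based mean imputation, class-based multiple imputation and class-based K-means. Respondents are first grouped into classes by their scores on the key satisfaction fields of the questionnaire. Missing values within each class are then replaced by the class mean, by multiply imputed values, or by the cluster center, respectively. Finally, using a food company as the case study, the article carries out an empirical analysis of a customer satisfaction measurement model with missing values. The results show that the three class-based methods outperform plain mean imputation, multiple imputation and K-means, which makes them practical for handling missing values in customer satisfaction index (CSI) measurement.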
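Class-based mean imputation can be sketched as follows. This is a minimal illustration, not the authors' code. The class labels (for example, score bands on a key satisfaction field) are supplied by the caller, because the exact grouping rule is not given here. Falling back to the overall mean for a class with no observed values is an assumption.

```python
from statistics import mean

def class_mean_impute(classes, values):
    """Replace None in `values` with the mean of observed values in the same class.

    classes: class label per respondent (e.g. a satisfaction-score band).
    values:  field values, with None marking a missing answer.
    """
    observed = {}
    for c, v in zip(classes, values):
        if v is not None:
            observed.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in observed.items()}
    # Assumption: a class with no observed values falls back to the overall mean
    overall = mean(v for v in values if v is not None)
    return [v if v is not None else class_means.get(c, overall)
            for c, v in zip(classes, values)]

print(class_mean_impute(["low", "low", "high", "high", "high"], [2, None, 8, 9, None]))
```

Plain mean imputation is the special case in which every respondent belongs to a single class. The class-based K-means variant would substitute a cluster center for the class mean.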

2.
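Yu Lichao (于力超), Jin Yongjin (金勇进). Statistical Research (《统计研究》), 2018, 35(11): 93-104
Large-scale sample surveys usually rely on complex sampling designs. These designs produce datasets with a hierarchical (nested) structure, and missing data are almost unavoidable, yet imputation strategies for hierarchical datasets with missing values have received little study. This paper applies the Gibbs sampler to the multiple imputation of hierarchical datasets with missing values and studies two approaches: imputation under a fixed-effects model and imputation under a random-effects model. Through theoretical derivation and numerical simulation, the methods are compared on the unbiasedness and efficiency of the resulting parameter estimates. The comparison covers different intraclass correlation coefficients, cluster sizes and proportions of missing data, and the paper gives recommendations for choosing the imputation model. The results show that a random-effects imputation model yields more accurate parameter estimates, while a fixed-effects imputation model is simpler to implement. A fixed-effects imputation model is acceptable when the missing proportion is small, the intraclass correlation is large and clusters are large. Otherwise, a random-effects imputation model is recommended.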

3.
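This article compares customer satisfaction index (CSI) models used in China and abroad and analyzes the current situation in China. On that basis, it proposes an extended CSI measurement model in which the latent variable "perceived quality" is split into perceived product quality and perceived service quality. The article then studies the steps for CSI measurement when data are missing. For the extended model, it builds on mean imputation to propose a new method for handling missing values, class-based mean imputation. Respondents are first grouped into classes by their scores on the satisfaction fields of the questionnaire, and each missing value is then replaced by the mean of its class.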

4.
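This article studies parameter estimation for linear regression models with missing skewed data. The aim is to avoid distorting the sample distribution and to improve the estimates of the regression coefficients, the scale parameter and the skewness parameter. To that end, it proposes a modified regression imputation method for missing data in linear regression models with skewed errors. The method is compared with mean imputation, regression imputation and stochastic regression imputation in simulations and in a real-data example. The results show that the proposed modified regression imputation method is effective and feasible.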
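Two of the baselines in this comparison, regression imputation and stochastic regression imputation, can be sketched for a single predictor as follows. The paper's modified method is not reproduced here. The normal residual draw below is an assumption of the standard stochastic variant, and that symmetric noise is exactly what cannot reproduce skewness.

```python
import random
from statistics import mean

def fit_ols(xs, ys):
    """Simple least-squares fit y = a + b*x; returns (a, b, residual SD).

    Needs at least three complete pairs with distinct x values.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sigma

def regression_impute(x, y, stochastic=False, rng=None):
    """Fill None entries of y from a fit on the complete (x, y) pairs.

    stochastic=False: regression imputation (prediction only).
    stochastic=True:  stochastic regression imputation (prediction + N(0, sigma^2) noise).
    """
    rng = rng or random.Random()
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sigma = fit_ols([p[0] for p in obs], [p[1] for p in obs])
    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)
        else:
            pred = a + b * xi
            out.append(pred + rng.gauss(0.0, sigma) if stochastic else pred)
    return out
```

Deterministic imputation places every filled value on the fitted line, which shrinks the variance of the completed data. The stochastic version restores spread but only in a symmetric form.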

5.
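Missing values are a common problem in surveys, and imputation is a sound way to handle them. When variables are correlated, a normal linear model can use the fully observed variables to impute the variables that contain missing values. Multiple imputation estimates the population variance more effectively than single imputation, so it is more widely used. This article uses the bootstrap: the model's parameters and residuals are taken from the least-squares fit to a bootstrap sample of the complete observations, which estimates the population variance more accurately still. Extensive simulation experiments show that bootstrap multiple imputation produces wider confidence intervals than single imputation and ordinary multiple imputation. Its coverage of the population parameter is therefore more accurate, and the advantage grows when the proportion of missing data is large.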
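The bootstrap multiple imputation described here can be sketched for one predictor as follows. In each of `m` rounds, the complete cases are resampled and a least-squares line is fitted to them. Each missing value is then imputed as that round's prediction plus a residual drawn from the same fit. Several choices are assumptions for illustration and are not taken from the article: the target is the mean of y, residuals are drawn by resampling, and the rounds are pooled with Rubin's combining rules.

```python
import random
from statistics import mean, variance

def ols(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, residuals)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return a, b, [y - (a + b * x) for x, y in zip(xs, ys)]

def bootstrap_mi_mean(x, y, m=20, seed=0):
    """Bootstrap multiple imputation of y (None = missing) given fully observed x.

    Returns the pooled estimate of the mean of y and its total variance (m >= 2).
    """
    rng = random.Random(seed)
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(y)
    estimates, within = [], []
    for _ in range(m):
        # Resample complete cases; redraw if every x is identical (slope undefined)
        while True:
            boot = [rng.choice(complete) for _ in complete]
            if len({p[0] for p in boot}) > 1:
                break
        a, b, resid = ols([p[0] for p in boot], [p[1] for p in boot])
        # Prediction from this bootstrap fit plus a residual drawn from the same fit
        filled = [yi if yi is not None else a + b * xi + rng.choice(resid)
                  for xi, yi in zip(x, y)]
        estimates.append(mean(filled))
        within.append(variance(filled) / n)   # estimated variance of the sample mean
    qbar, ubar = mean(estimates), mean(within)
    between = variance(estimates)
    return qbar, ubar + (1 + 1 / m) * between  # Rubin's total variance
```

The between-imputation term `(1 + 1/m) * between` is what widens the confidence intervals relative to single imputation.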

6.
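To study joint location and scale models with missing skewed data, this article proposes an imputation method suited to joint modeling under missing skewed data, modified stochastic regression imputation, which is built on the characteristics of the distribution itself. Under missing data, the method markedly improves the adjustment of the model's skewness parameter. It is compared with regression imputation and stochastic regression imputation in simulations and in a real-data example. The results show that the proposed modified stochastic regression imputation method is useful and effective.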

7.
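This article introduces a new approach to market research, split questionnaire design, which is an effective way to deal with overly long questionnaires. The usual form of the design splits the original questionnaire into a core component and several secondary components. Each respondent answers only the core questions and one randomly assigned secondary component. The design reduces respondent burden and improves both respondents' willingness to cooperate and the quality of the survey results. Split questionnaire designs are generally combined with methods for imputing missing data.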
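The assignment rule described above (the core component plus one randomly assigned secondary component) can be sketched as follows. The question identifiers and block layout are hypothetical. Every question not on a respondent's list is missing by design and is later filled by imputation.

```python
import random

def assign_split_questionnaire(respondents, core, blocks, seed=None):
    """Give every respondent the core questions plus one randomly chosen secondary block.

    Returns {respondent: list of question ids}; unlisted questions are missing by design.
    """
    rng = random.Random(seed)
    return {r: list(core) + list(rng.choice(blocks)) for r in respondents}

# Hypothetical questionnaire: two core questions and three secondary blocks
plan = assign_split_questionnaire(range(6), ["Q1", "Q2"],
                                  [["A1", "A2"], ["B1", "B2"], ["C1"]], seed=1)
for respondent, questions in plan.items():
    print(respondent, questions)
```

Because the blocks are assigned at random, the resulting missingness is completely at random by construction, and this is what makes later imputation straightforward.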

8.
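This article studies parameter estimation for nonlinear mean-variance models when the response variable is missing at random. It combines two imputation methods, regression imputation and stochastic regression imputation, with the iterative Gauss-Newton algorithm to obtain maximum likelihood estimates of the model's unknown parameters. Two simulated examples and one real-data example show that the proposed model and statistical methods are feasible and practical.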
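The Gauss-Newton iteration mentioned here can be sketched for a hypothetical exponential mean function y ≈ a·exp(b·x). The article's actual mean-variance model and its likelihood are not given in the abstract. This sketch fits the mean by least squares only, and it adds step-halving (damping), which is an assumption made for robustness. In the article's setting it would run on the data after the missing responses have been imputed.

```python
import math

def sse(a, b, x, y):
    """Residual sum of squares for the mean function a * exp(b * x)."""
    return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

def gauss_newton_exp(x, y, a=1.0, b=0.1, iters=100, tol=1e-12):
    """Damped Gauss-Newton fit of y ≈ a * exp(b * x)."""
    for _ in range(iters):
        e = [math.exp(b * xi) for xi in x]
        j1 = e                                        # d(mean)/da
        j2 = [a * xi * ei for xi, ei in zip(x, e)]    # d(mean)/db
        r = [yi - a * ei for yi, ei in zip(y, e)]     # current residuals
        # Solve the 2x2 normal equations (J^T J) delta = J^T r
        s11 = sum(u * u for u in j1)
        s12 = sum(u * v for u, v in zip(j1, j2))
        s22 = sum(v * v for v in j2)
        g1 = sum(u * ri for u, ri in zip(j1, r))
        g2 = sum(v * ri for v, ri in zip(j2, r))
        det = s11 * s22 - s12 * s12
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        # Step-halving: shrink the step until the residual sum of squares does not increase
        step, current = 1.0, sse(a, b, x, y)
        while sse(a + step * da, b + step * db, x, y) > current and step > 1e-8:
            step /= 2
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b
```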

9.
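To meet cost-efficiency, accuracy and speed objectives, market research often uses long questionnaires. This article first analyzes how such length can affect data quality and, on that basis, proposes splitting the questionnaire. It concentrates on the key points of the design and on how to handle the resulting missing data with multiple imputation.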

10.
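This article studies robust estimation of semiparametric models based on quantile regression when the response variable is missing at random. First, a B-spline basis approximation turns the estimation of the model's nonparametric function into the estimation of a vector of spline coefficients. Second, the article proposes a new imputation method that multiply imputes the missing responses. Third, it builds a new quantile objective function from the imputed datasets, which yields robust estimates of both the nonparametric function and the parameter vector. Finally, it gives an efficient algorithm for computing the multiple-imputation estimator. Simulation studies confirm that the proposed method is effective and robust.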
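Quantile objective functions are built from the check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}). The sketch below shows only this building block. It does not include the article's B-spline design or its imputation step. Minimizing the total check loss over a single constant recovers an empirical τ-quantile. Searching only the observed values is enough in that constant case, because the loss is piecewise linear with kinks at the data points.

```python
def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_by_check_loss(values, tau):
    """Return the observed value minimizing the total check loss (an empirical tau-quantile)."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))

data = [1, 2, 3, 4, 100]
print(quantile_by_check_loss(data, 0.5))   # the median is unaffected by the outlier 100
```

In quantile regression the constant is replaced by a linear predictor, for example one built from spline coefficients, and the same loss is minimized over those coefficients.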

11.
In this paper, we introduce a new method for imputing missing values that uses sensible constraints on both the study variable and auxiliary variables correlated with it. The resulting estimator based on these imputed values is shown to reduce to the regression-type method of imputation in survey sampling. Furthermore, when the data are a mixture of values missing at random and values missing completely at random, the resulting estimator is shown to be consistent, with asymptotic mean squared error equal to that of the linear regression method of imputation. A generalization to any method of imputation is also possible and is given at the end.
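The regression-type imputation that this estimator reduces to fills each nonrespondent's value with ȳ_r + b̂(x_i − x̄_r), where the subscript r denotes respondents and b̂ is the least-squares slope fitted on respondents. The sketch below shows only that standard method, not the paper's constraint-based derivation. The mean of the completed sample then equals the classical regression estimator ȳ_r + b̂(x̄_n − x̄_r).

```python
from statistics import mean

def regression_type_imputation(x, y):
    """Impute missing y (None) with y_r_bar + b * (x_i - x_r_bar), fitted on respondents.

    x must be observed for every unit (the auxiliary variable).
    Returns (completed y, fitted slope b).
    """
    xr = [xi for xi, yi in zip(x, y) if yi is not None]
    yr = [yi for yi in y if yi is not None]
    mx, my = mean(xr), mean(yr)
    b = (sum((a - mx) * (c - my) for a, c in zip(xr, yr))
         / sum((a - mx) ** 2 for a in xr))
    completed = [yi if yi is not None else my + b * (xi - mx) for xi, yi in zip(x, y)]
    return completed, b
```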

12.
In this paper we propose a latent-class-based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. The missing data are imputed under a latent class imputation model, and the imputed data are analyzed with likelihood methods. Extensive simulations are used to study the statistical properties of the approach. It is compared with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing-data mechanisms, including missing completely at random, missing at random and not missing at random. The methods are compared on the bias, asymptotic standard error, type I error and 95% coverage probability of the parameter estimates. The simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when these criteria are considered jointly. The approach is illustrated with data from a matched case–control study of the association between multiple myeloma and polymorphisms of the interleukin-6 genes.
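Simulation studies like this one delete values under controlled missingness mechanisms. The sketch below shows one common way to generate the three basic types. The logistic forms and constants are illustrative assumptions and are not the paper's seven mechanisms. Under MCAR the deletion probability is constant, under MAR it depends on an observed covariate, and under NMAR it depends on the value being deleted.

```python
import math
import random

def make_missing(x, y, mechanism, rng):
    """Delete values of y (set to None) under a simple MCAR, MAR or NMAR mechanism.

    The constant 0.3 and the logistic curves are illustrative only.
    """
    out = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p = 0.3                                 # same probability for every unit
        elif mechanism == "MAR":
            p = 1 / (1 + math.exp(-(xi - 1)))       # depends on the observed x
        elif mechanism == "NMAR":
            p = 1 / (1 + math.exp(-(yi - 1)))       # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append(None if rng.random() < p else yi)
    return out
```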

13.
Mixed models are regularly used in the analysis of clustered data but have only recently been used to impute missing data. In household surveys where several people are selected from each household, imputation should preserve the structure of people within households and should not artificially change the apparent intracluster correlation (ICC). This paper focuses on using multilevel models to impute missing data in household surveys. In particular, it compares a best linear unbiased predictor (BLUP) from a linear mixed model, used for both stochastic and deterministic imputation, with imputation based on a single-level linear model, both with and without information about household respondents. The evaluation imputes the hourly wage rate in the Household, Income and Labour Dynamics in Australia Survey. Nonresponse is generated under various assumptions about the missingness mechanism for persons and households, and with low, moderate and high intra-household correlation, so that the benefits of the multilevel imputation model can be assessed under different conditions. The mixed model, and the single-level model with information about the household respondent, lead to clear improvements when the ICC is moderate or high and when missingness is informative.
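BLUP imputation under a random-intercept model y_ij = μ + u_j + e_ij can be sketched as follows. For illustration, μ and the variance components σ_u² and σ_e² are treated as known, although in practice they would be estimated. The household effect is shrunk by λ_j = σ_u² / (σ_u² + σ_e²/n_j), where n_j is the number of observed members. Deterministic imputation uses μ + û_j, and stochastic imputation adds a N(0, σ_e²) draw, which is an assumption of this sketch.

```python
import random
from statistics import mean

def blup_impute(households, mu, var_u, var_e, stochastic=False, rng=None):
    """Impute None values in each household with a random-intercept BLUP.

    households: list of lists of values (None = missing), one list per household.
    mu, var_u, var_e: overall mean and variance components, treated as known here.
    """
    rng = rng or random.Random()
    completed = []
    for members in households:
        obs = [v for v in members if v is not None]
        if obs:
            shrink = var_u / (var_u + var_e / len(obs))
            u_hat = shrink * (mean(obs) - mu)   # shrunken household effect
        else:
            u_hat = 0.0                          # no observed members: fall back to mu
        fill = []
        for v in members:
            if v is not None:
                fill.append(v)
            else:
                pred = mu + u_hat
                fill.append(pred + rng.gauss(0.0, var_e ** 0.5) if stochastic else pred)
        completed.append(fill)
    return completed
```

Setting `var_u = 0` removes the household effect and gives single-level imputation at μ, which is why that model cannot keep the ICC when the true ICC is moderate or high.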

14.
Summary. Social surveys are usually affected by item and unit non-response. Because respondents are unlikely to form a random sample, social scientists should take the missing-data problem into account in their empirical analyses. Survey methodologists typically try to simplify the work of data users by "completing" the data, filling in the missing variables through imputation. The aim of the paper is to give data users guidelines for assessing the effects of imputation on their microlevel analyses. We focus on the potential bias that imputation causes in the analysis of income variables, using the European Community Household Panel as an illustration.

15.
The recently developed rolling year GEKS procedure makes maximum use of all matches in the data to construct nonrevisable price indexes that are approximately free from chain drift. A potential weakness is that unmatched items are ignored. In this article we use imputation Törnqvist price indexes as inputs into the rolling year GEKS procedure. These indexes account for quality changes by imputing the "missing prices" associated with new and disappearing items. Three imputation methods are discussed. The first method makes explicit imputations using a hedonic regression model which is estimated for each time period. The other two methods make implicit imputations; they are based on time dummy hedonic and time-product dummy regression models and are estimated on bilateral pooled data. We present empirical evidence for New Zealand from scanner data on eight consumer electronics products and find that accounting for quality change can make a substantial difference.
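A bilateral Törnqvist index and a GEKS index over one window can be sketched as follows. The sketch assumes that every period's price dictionary already covers every item in the window, so the missing prices of new and disappearing items have already been imputed (for example from a hedonic model, which is not shown). An absent item simply has quantity 0. The rolling-year window movement and splicing are also not shown.

```python
import math

def tornqvist(p_s, q_s, p_t, q_t):
    """Törnqvist price index from period s to period t.

    Dicts keyed by item; prices must cover every item in either period
    (missing prices already imputed); items absent from a period have quantity 0.
    """
    items = set(q_s) | set(q_t)
    exp_s = sum(p_s[i] * q_s.get(i, 0) for i in items)
    exp_t = sum(p_t[i] * q_t.get(i, 0) for i in items)
    log_index = 0.0
    for i in items:
        w_s = p_s[i] * q_s.get(i, 0) / exp_s    # expenditure share in s
        w_t = p_t[i] * q_t.get(i, 0) / exp_t    # expenditure share in t
        log_index += 0.5 * (w_s + w_t) * math.log(p_t[i] / p_s[i])
    return math.exp(log_index)

def geks(prices, quantities, t):
    """GEKS index from period 0 to t within one window, with Törnqvist bilaterals."""
    n = len(prices)
    log_sum = 0.0
    for l in range(n):
        p0l = tornqvist(prices[0], quantities[0], prices[l], quantities[l])
        plt = tornqvist(prices[l], quantities[l], prices[t], quantities[t])
        log_sum += math.log(p0l * plt)
    return math.exp(log_sum / n)   # geometric mean over all link periods l
```

Because every period in the window serves as a link, the GEKS result does not depend on a single chain path, and this is the source of its approximate freedom from chain drift.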

16.
In this article, we compare alternative methods for imputing missing ordinal data within the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, including both univariate and multivariate approaches. The first step is a simulation study, designed by varying the parameters of the CUB model, that compares CUB models with other methods of missing-data imputation. Real datasets are then used to compare our approach with some general imputation methods under various missing-data mechanisms.
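A CUB model for a rating R on m ordered categories mixes a shifted Binomial with a discrete Uniform: P(R = r) = π·C(m−1, r−1)·(1−ξ)^(r−1)·ξ^(m−r) + (1−π)/m for r = 1, …, m. Here π weights the "feeling" (Binomial) component and 1 − π the uncertainty (Uniform) component. The sketch below computes these probabilities and draws simulated ratings, which is the kind of data generation that a simulation study varying π and ξ needs. The simulation helper is illustrative and not the article's code.

```python
import random
from math import comb

def cub_pmf(m, pi, xi):
    """Probabilities P(R = r), r = 1..m, of a CUB model with m categories."""
    return [pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r) + (1 - pi) / m
            for r in range(1, m + 1)]

def simulate_cub(m, pi, xi, n, rng=None):
    """Draw n ordinal ratings in 1..m from a CUB(pi, xi) model."""
    rng = rng or random.Random()
    return rng.choices(range(1, m + 1), weights=cub_pmf(m, pi, xi), k=n)

print([round(p, 3) for p in cub_pmf(7, 0.6, 0.3)])
```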

17.
Tukey proposed a class of distributions, the g-and-h family (gh family), based on a transformation of a standard normal variable, to accommodate the varying skewness and elongation of variables that arise in practice. Values are easy to draw from this distribution even though its probability density function is hard to state explicitly. Given this flexibility, the gh family may be very useful for creating multiple imputations for missing data. This article demonstrates how this family and its generalizations can be used in the multiple imputation analysis of incomplete data. The focus is on a single scalar variable with missing values. With no additional information available, the data are missing completely at random, so the correct analysis is the complete-case analysis. Applying gh multiple imputation to the scalar case therefore allows comparison with the correct analysis and with other model-based multiple imputation methods. The comparisons use simulated datasets and data from a survey of adolescents that asked about driving after drinking alcohol.
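The g-and-h transformation maps a standard normal Z to X = A + B·((e^(gZ) − 1)/g)·e^(hZ²/2), with the limit (e^(gZ) − 1)/g → Z when g = 0. Here g controls skewness and h ≥ 0 controls elongation. The sketch below covers only the easy drawing step that the abstract mentions. In gh multiple imputation, the parameters (A, B, g, h) would first have to be estimated or drawn from the observed data, and that step is not shown.

```python
import math
import random

def gh_transform(z, g, h, a=0.0, b=1.0):
    """Tukey g-and-h transform of a standard normal value z."""
    core = z if g == 0 else (math.exp(g * z) - 1) / g   # skewness part (limit z at g = 0)
    return a + b * core * math.exp(h * z * z / 2)        # elongation part

def draw_gh(n, g, h, a=0.0, b=1.0, rng=None):
    """Draw n values from the g-and-h distribution by transforming N(0, 1) draws."""
    rng = rng or random.Random()
    return [gh_transform(rng.gauss(0.0, 1.0), g, h, a, b) for _ in range(n)]
```

With g = h = 0 the transform is the identity, so the normal distribution is a member of the family, and positive g produces right skew.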
