共查询到17条相似文献,搜索用时 234 毫秒
1.
文章通过对缺失值处理方法分析,提出基于分类的三种缺失值处理方法:分类的均值插补法、分类的多重插补法和分类的K-means方法;该方法先对被调查对象问卷中的满意度关键字段按照分值进行分类,然后在同类中的缺失值用该类的平均值、多重插补值和聚类中心值替代.最后,以某食品公司为研究对象,对顾客满意度测评模型进行带缺失值的实证分析.结果表明:基于分类的三种缺失值处理方法优于均值插补法、多重插补法和K-means方法,为顾客满意度指数测评中的缺失值处理提供了实用方法. 相似文献
2.
大规模抽样调查多采用复杂抽样设计,得到具有分层嵌套结构的调查数据集,其中不可避免会遇到数据缺失问题,针对分层结构含缺失数据集的插补策略目前鲜有研究。本文将Gibbs算法应用到分层含缺失数据集的多重插补过程中,分别研究了固定效应模型插补法和随机效应模型插补法,进而通过理论推导和数值模拟,在不同组内相关系数、群组规模、数据缺失比例等情形下,从参数估计结果的无偏性和有效性两方面,比较不同方法的插补效果,给出插补模型的选择建议。研究结果表明,采用随机效应模型作为插补模型时,得到的参数估计结果更准确,而固定效应模型作为插补模型操作相对简便,在数据缺失比例较小、组内相关系数较大、群组规模较大等情形下,可以采用固定效应插补模型,否则建议采用随机效应插补模型。 相似文献
3.
文章通过对国内外顾客满意度指数模型的对比及我国的现状分析,提出顾客满意度指数测评的拓展模型,模型对感知质量潜变量细化为感知产品质量和感知服务质量.在此基础上对带缺失值的顾客满意度指数测评步骤进行研究.针对该拓展模型,基于均值插补法,提出一种新的缺失值处理方法-分类均值插补法,该方法先对被调查对象问卷中的满意度字段按照分值进行分类,然后对同类中的缺失值用该类的平均值替代. 相似文献
4.
5.
缺失值是调查中普遍存在的问题,对缺失值进行插补是处理缺失值的较好方法.如果变量之间存在相关关系,可以通过正态线形模型利用不存在缺失值的变量对有存在缺失值的变量进行插补.较之单一插补,多重插补更能有效地估计总体方差,因此更多地被使用.文章借助Bootstrap法,让模型的参数和残差来自完全观测的Bootstrap样本的最小平法估计,可进一步准确估计总体方差.通过大量模拟试验,发现Bootstrap多重插补较之单一插补和一般多重插补能构建更宽的置信区间从而有更准确的总体参数覆盖率,这点在数据缺失比重很大时优势更明显. 相似文献
6.
7.
本文介绍一种新的市场调查思路———分割问卷设计法,它是解决市场调查中问卷过长问题的一种有效方法。常用的分割问卷设计方法是将原始调查问卷分割为一个核心部分和几个次要部分,每一个被调查者只需回答核心部分的问题和一个随机分配的次要部分的问题。这种问卷设计方法,可以减小被调查者的负担,提高被调查者的合作意愿和调查结果的质量。分割问卷设计方法一般与缺失数据插补方法结合使用。 相似文献
8.
9.
市场调查中为了实现成本效益、准确性以及速度目标往往会采用较长问卷,本文首先分析了因此可能对数据质量产生的影响,在此基础上提出了分割问卷的思想。重点阐述其设计要点,以及如何利用多重插补方法对缺失数据进行处理。 相似文献
10.
11.
Choukri Mohamed Stephen A. Sedory 《Journal of Statistical Computation and Simulation》2018,88(7):1273-1294
In this paper, we introduce a fresh methodology for imputing missing values by making use of sensible constraints on both a study variable and auxiliary variables that are correlated with the variable of interest. The resultant estimator based on these imputed values is shown to lead to the regression type method of imputation in survey sampling. Furthermore, when the data are hybrid of both that missing at random and missing complexly at random, the resultant estimator is shown to be a consistent estimator that has asymptotic mean squared error equal to that of the linear regression method of imputation. A generalization to any type of method of imputation is possible and has been included at the end. 相似文献
12.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered. 相似文献
13.
Luise Patricia Lago Robert Graham Clark 《Australian & New Zealand Journal of Statistics》2015,57(2):169-187
Mixed models are regularly used in the analysis of clustered data, but are only recently being used for imputation of missing data. In household surveys where multiple people are selected from each household, imputation of missing values should preserve the structure pertaining to people within households and should not artificially change the apparent intracluster correlation (ICC). This paper focuses on the use of multilevel models for imputation of missing data in household surveys. In particular, the performance of a best linear unbiased predictor for both stochastic and deterministic imputation using a linear mixed model is compared to imputation based on a single level linear model, both with and without information about household respondents. In this paper an evaluation is carried out in the context of imputing hourly wage rate in the Household, Income and Labour Dynamics of Australia Survey. Nonresponse is generated under various assumptions about the missingness mechanism for persons and households, and with low, moderate and high intra‐household correlation to assess the benefits of the multilevel imputation model under different conditions. The mixed model and single level model with information about the household respondent lead to clear improvements when the ICC is moderate or high, and when there is informative missingness. 相似文献
14.
The effects of income imputation on microanalyses: evidence from the European Community Household Panel 总被引:1,自引:0,他引:1
Cheti Nicoletti Franco Peracchi 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(3):625-646
Summary. Social surveys are usually affected by item and unit non-response. Since it is unlikely that a sample of respondents is a random sample, social scientists should take the missing data problem into account in their empirical analyses. Typically, survey methodologists try to simplify the work of data users by 'completing' the data, filling the missing variables through imputation. The aim of the paper is to give data users some guidelines on how to assess the effects of imputation on their microlevel analyses. We focus attention on the potential bias that is caused by imputation in the analysis of income variables, using the European Community Household Panel as an illustration. 相似文献
15.
The recently developed rolling year GEKS procedure makes maximum use of all matches in the data to construct nonrevisable price indexes that are approximately free from chain drift. A potential weakness is that unmatched items are ignored. In this article we use imputation Törnqvist price indexes as inputs into the rolling year GEKS procedure. These indexes account for quality changes by imputing the “missing prices” associated with new and disappearing items. Three imputation methods are discussed. The first method makes explicit imputations using a hedonic regression model which is estimated for each time period. The other two methods make implicit imputations; they are based on time dummy hedonic and time-product dummy regression models and are estimated on bilateral pooled data. We present empirical evidence for New Zealand from scanner data on eight consumer electronics products and find that accounting for quality change can make a substantial difference. 相似文献
16.
Federica Cugnata 《统计学通讯:模拟与计算》2017,46(1):315-330
In this article, we compare alternative missing imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other methods of missing imputation. We use real datasets on which to base the comparison between our approach and some general methods of missing imputation for various missing data mechanisms. 相似文献
17.
《The American statistician》2013,67(3):251-256
Tukey proposed a class of distributions, the g-and-h family (gh family), based on a transformation of a standard normal variable to accommodate different skewness and elongation in the distribution of variables arising in practical applications. It is easy to draw values from this distribution even though it is hard to explicitly state the probability density function. Given this flexibility, the gh family may be extremely useful in creating multiple imputations for missing data. This article demonstrates how this family, as well as its generalizations, can be used in the multiple imputation analysis of incomplete data. The focus of this article is on a scalar variable with missing values. In the absence of any additional information, data are missing completely at random, and hence the correct analysis is the complete-case analysis. Thus, the application of the gh multiple imputation to the scalar cases affords comparison with the correct analysis and with other model-based multiple imputation methods. Comparisons are made using simulated datasets and the data from a survey of adolescents ascertaining driving after drinking alcohol. 相似文献