Similar Articles
20 similar articles found (search time: 171 ms)
1.
Thoughts on Establishing Quality Profiles for Statistical Data. Cited by: 2 (self-citations: 0, by others: 2)
I. The significance of establishing statistical data quality profiles. A "quality profile", also called an "error profile", developed gradually in the late 1970s and early 1980s out of the idea of applying classified control to the errors in statistical data. It is a document that describes and reflects the error-influencing factors, the control measures, and the control results in every aspect and at every stage of the statistical data production process. Its purpose is to identify the main sources of the various non-sampling errors and the actual level of process control, to quantify the error components as far as possible, and to examine the effect of each kind of error on the final statistical results. "A complete quality profile must not only list the survey process and the potential error sources, but must also be able to list all the survey errors at each stage…"

2.
The Use of Auxiliary Information in Handling Missing Data. Cited by: 2 (self-citations: 0, by others: 2)
Jin Yongjin. Statistical Research (《统计研究》), 1998, 15(1): 43-45
Missing data are frequently encountered in statistical analysis. They arise in different ways, mainly from nonresponse in surveys. In addition, interviewers may omit certain survey items through negligence, or, during the checking and processing of survey data, records that are illogical, clearly erroneous, or deliberately falsified may be discovered and removed; all of these create missing data. The harm caused by missing data is evident: it not only reduces the actual number of responding units, inflating the variance of estimators in sample surveys, but also introduces estimator bias, and is thus an important factor affecting the quality of statistical data. In general, missing data call for a follow-up survey to fill in the gaps. Sometimes, however, owing to various constraints, a follow-up survey is impossible, or would still not solve the problem. In that case we are particularly concerned with two questions: first, how large is the impact of the missing data, that is, can the estimator bias caused by the missing data be estimated; and second, how can the missing data be remedied. Both questions involve auxiliary information, and this paper analyzes them.
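The abstract does not give the estimator itself, but as a hedged illustration of the general idea (using auxiliary information known for all units to remedy nonresponse), here is a minimal ratio-imputation sketch in Python; the simulated data, the response mechanism, and the choice of ratio imputation are all assumptions for illustration, not necessarily the author's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated survey: the auxiliary variable x is known for every unit,
# while the study variable y is observed only for respondents.
n = 500
x = rng.gamma(shape=4.0, scale=10.0, size=n)
y = 2.5 * x + rng.normal(0.0, 8.0, size=n)

# Nonresponse depends on x: units with large x respond less often,
# so the simple respondent mean is biased downward.
p_respond = 1.0 / (1.0 + np.exp((x - x.mean()) / 10.0))
respond = rng.random(n) < p_respond

naive_mean = y[respond].mean()

# Ratio imputation: estimate the y/x ratio from respondents and use the
# known auxiliary values to fill in the nonrespondents.
r_hat = y[respond].sum() / x[respond].sum()
y_imputed = np.where(respond, y, r_hat * x)

print(f"true mean          : {y.mean():.2f}")
print(f"respondent mean    : {naive_mean:.2f}")
print(f"ratio-imputed mean : {y_imputed.mean():.2f}")
```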

3.
Nonresponse error refers to the failure, for various reasons, of interviewers to obtain the required information from units selected into the sample. The resulting missing data bias the estimators, and this bias has an important effect on the quality of survey data. The phenomenon is widespread in sample surveys and quite harmful to the estimators; internationally, it has therefore been discussed intensively and studied systematically, whereas domestic research on it remains insufficient, and sampling practice in China still lacks effective means of controlling nonresponse.

4.
Cheng Xi, Li Lei. Statistical Research (《统计研究》), 1990, 7(5): 41-44
The most important characteristic of the quality of statistical figures is their accuracy. The main purpose of evaluating statistical data quality is to measure accuracy and to identify the sources of error affecting it. Generally speaking, errors in demographic statistics are of two kinds: errors caused by mistakes in the attributes or characteristics of the survey subjects, such as misreported dates of birth; and errors caused by omissions or duplications during registration, or by counting people who do not belong to the target population. Using data from population censuses and the 1987 1% population sample survey as background, this paper explains, from the data user's perspective, methods for evaluating the accuracy of major demographic indicators.

5.
To improve statistical data quality, the National Bureau of Statistics has decided that statistics on below-designated-size industry will ultimately be collected by sample survey rather than by the traditional complete enumeration. How to create conditions on many fronts to safeguard the sampling work and so improve the authenticity of the data is a problem that must be solved in below-designated-size industrial sample surveys. The article makes the following points. First, attention from leaders and support from all departments are the key to a good below-designated-size industrial sample survey: the survey department must be given sufficient human and financial resources, including the establishment of survey organizations and the allocation of professional staff. Second, give respondents a relaxed environment. Only when respondents are given a relaxed environment, including psychological and organizational support, so that they can commit themselves fully and provide truthful, credible raw data, can scientific methods…

6.
Statistical data are an important basis for China's institutional reform, macro-level regulation, and micro-level economic invigoration, so the quality of statistical data is an important factor in whether such decisions are correct. Statistical data quality is generally evaluated against the criteria of accuracy, timeliness, completeness, and applicability, with accuracy at the core as the most fundamental aspect. Improving statistical data quality means improving quality at every stage: statistical design, statistical surveys, statistical analysis, and so on. The author analyzes the problems in China's statistical data quality and the countermeasures for solving them. Analysis of the problems: at present, the main problems in China's statistical data quality concern accuracy and completeness. (1) Analysis of accuracy problems. 1. The problem of administrative interference in statistics. Administrative…

7.
以"RS、GIS、GPS"为代表的3S技术目前在农业统计中已经开始探索应用,其中,利用卫星影像、无人机航拍、PDA终端、以及空间信息技术等正在对农产量调查进行先期试点工作。而畜牧业作为大农业的一部分,其调查过程中也应该逐步实现对3S技术的有效利用。主要畜禽监测作为畜牧业统计的常规调查,其数据一直备受各级党政领导的关注,生猪监测调查数据已经作为国家法定调查数据公布。为增强畜牧业统计数据的直观理解,本文探讨了利用地理信息系统与统计数据的结合方法,将主要畜禽监测数据区域显示在地图上,阐述不同要素下的数据可视化表达。  相似文献   

8.
Statistical data are an important basis for macro-level decision-making by governments at all levels. True and reliable statistics help governments make correct decisions, while untrue or inaccurate statistics will lead to wrong decisions and endanger economic and social development. Statistical data quality is therefore the lifeblood of statistical work, and ensuring that data are true and reliable is the primary problem statistical departments must solve. Primary-level data collection, data checking, data processing, aggregation and reporting, data evaluation, and information analysis constitute the general workflow of statistical work. Seen from this workflow, primary-level data quality directly affects the authenticity of statistical data and the reliability of statistical information and analysis. The factors currently affecting primary-level statistical data quality are many. By analyzing the main ones, this paper puts forward suggestions for improving primary-level statistical data quality.

9.
Small survey units (in this paper, below-designated-size and below-quota survey units are collectively called small survey units) are individually small in economic scale but large in number, and are an important component of the national economy. Their statistics are obtained by sample survey, so getting the data of each sampled unit right is very important. There are, of course, many ways to improve the data quality of small survey units; one effective method is to visit the units to learn about their situation and to carry out statistical law-enforcement inspections. From the perspective of statistical law enforcement, and in light of the characteristics of statistical work in small survey units, this paper briefly discusses how to conduct law enforcement well and improve data quality.

10.
Assessing statistical data quality is an important means of improving data quality and of giving full play to the role of statistical data. Judging from the current situation in Jinyun County, within agricultural statistics the assessment of aggregated data on the total economy has received attention from leaders and statistical departments at all levels, while assessment of township- and village-level data quality remains a weak link; correspondingly, the aggregate indicators basically reflect reality, but quality is uneven across townships and needs further improvement. I. The necessity of assessing township-level statistical data quality. Agricultural statistics quality is currently affected by many factors, but they fall mainly into two groups: first, household-based rural operation objectively makes it very difficult to get the figures right; second, when governments at all levels manage rural economic work, they often take agricultural economic indicators…

11.
This study compares empirical type I error and power of different permutation techniques that can be used for partial correlation analysis involving three data vectors and for partial Mantel tests. The partial Mantel test is a form of first-order partial correlation analysis involving three distance matrices which is widely used in such fields as population genetics, ecology, anthropology, psychometry and sociology. The methods compared are the following: (1) permute the objects in one of the vectors (or matrices); (2) permute the residuals of a null model; (3) correlate residualized vector 1 (or matrix A) to residualized vector 2 (or matrix B); permute one of the residualized vectors (or matrices); (4) permute the residuals of a full model. In the partial correlation study, the results were compared to those of the parametric t-test which provides a reference under normality. Simulations were carried out to measure the type I error and power of these permutation methods, using normal and non-normal data, without and with an outlier. There were 10 000 simulations for each situation (100 000 when n = 5); 999 permutations were produced per test where permutations were used. The recommended testing procedures are the following: (a) In partial correlation analysis, most methods can be used most of the time. The parametric t-test should not be used with highly skewed data. Permutation of the raw data should be avoided only when highly skewed data are combined with outliers in the covariable. Methods implying permutation of residuals, which are known to only have asymptotically exact significance levels, should not be used when highly skewed data are combined with small sample size. (b) In partial Mantel tests, method 2 can always be used, except when highly skewed data are combined with small sample size. (c) With small sample sizes, one should carefully examine the data before partial correlation or partial Mantel analysis. For highly skewed data, permutation of the raw data has correct type I error in the absence of outliers. When highly skewed data are combined with outliers in the covariable vector or matrix, it is still recommended to use the permutation of raw data. (d) Method 3 should never be used.
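A minimal Python sketch of one of the compared techniques: method 1, permuting the raw values of one vector in a partial correlation test (the three-data-vector case, not the distance-matrix Mantel case); the simulated data are an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def residuals(v, z):
    """Residuals of v after OLS regression on z (with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ beta

def partial_r(x, y, z):
    """Partial correlation of x and y controlling for z."""
    rx, ry = residuals(x, z), residuals(y, z)
    return np.corrcoef(rx, ry)[0, 1]

# Simulated data: x and y share a dependence on the covariable z.
n = 50
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

obs = partial_r(x, y, z)

# Method 1 of the study: permute the raw values of one vector and
# recompute the partial correlation under each permutation.
n_perm = 999
perm_stats = np.array([partial_r(rng.permutation(x), y, z)
                       for _ in range(n_perm)])
# Two-tailed p-value, counting the observed value among the permutations.
p = (np.sum(np.abs(perm_stats) >= abs(obs)) + 1) / (n_perm + 1)
print(f"partial r = {obs:.3f}, permutation p = {p:.3f}")
```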

12.
Nonparametric estimation and inferences of conditional distribution functions with longitudinal data have important applications in biomedical studies, such as epidemiological studies and longitudinal clinical trials. Estimation approaches without any structural assumptions may lead to inadequate and numerically unstable estimators in practice. We propose in this paper a nonparametric approach based on time-varying parametric models for estimating the conditional distribution functions with a longitudinal sample. Our model assumes that the conditional distribution of the outcome variable at each given time point can be approximated by a parametric model after local Box–Cox transformation. Our estimation is based on a two-step smoothing method, in which we first obtain the raw estimators of the conditional distribution functions at a set of disjoint time points, and then compute the final estimators at any time by smoothing the raw estimators. Applications of our two-step estimation method have been demonstrated through a large epidemiological study of childhood growth and blood pressure. Finite sample properties of our procedures are investigated through a simulation study. Application and simulation results show that smoothing estimation from time-varying parametric models outperforms the existing kernel smoothing estimator by producing narrower pointwise bootstrap confidence band and smaller root mean squared error.
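The paper's exact procedure has details beyond the abstract; the following Python sketch only illustrates the two-step idea under stated assumptions (lognormal toy data, a normal model after Box–Cox at each time point, and a Gaussian-kernel smoother across time):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy longitudinal-style data: outcomes observed at disjoint time points,
# with a distribution that drifts over time (an assumption for illustration).
time_points = np.linspace(1.0, 10.0, 10)
samples = {t: rng.lognormal(mean=0.1 * t, sigma=0.4, size=80)
           for t in time_points}

y0 = 2.0  # evaluate F(y0 | t)

# Step 1: raw estimates at each time point. Following the abstract's idea,
# fit a Box-Cox transformation by MLE, then use a normal model on the
# transformed scale to read off the conditional CDF.
raw = []
for t in time_points:
    y = samples[t]
    yt, lam = stats.boxcox(y)
    y0t = stats.boxcox(np.array([y0]), lmbda=lam)[0]
    raw.append(stats.norm.cdf(y0t, loc=yt.mean(), scale=yt.std(ddof=1)))
raw = np.array(raw)

# Step 2: smooth the raw estimates over time with a Nadaraya-Watson
# (Gaussian kernel) smoother to obtain F(y0 | t) at any t.
def smooth_cdf(t, bandwidth=1.5):
    w = np.exp(-0.5 * ((t - time_points) / bandwidth) ** 2)
    return np.sum(w * raw) / np.sum(w)

print([round(smooth_cdf(t), 3) for t in (2.0, 5.0, 8.0)])
```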

13.
A New Approach to Statistical Data Quality Research: Error Research. Cited by: 3 (self-citations: 1, by others: 2)
Yang Qing. Statistical Research (《统计研究》), 2000, 17(8): 29-32
The quality of statistical data has long troubled China's statistical community and is a concern of all sectors of society. The true importance of statistical data lies in how they shape people's understanding and judgment of socioeconomic phenomena. As for the current state of China's statistical data quality, the authoritative judgment is that "macro statistical data are basically reliable; viewed as a whole, they objectively reflect the trends and changes in economic operation, and no trend-level errors have occurred", while "some raw data have considerable quality problems; in some localities the quality of raw data is poor, and even shows a declining tendency."① China's statistical data quality problem has long existed, administrative leaders in the statistical system have consistently attached great importance to it, and theoretical research and discussion have remained active, but the research…
① Speech by Liu Hong, Commissioner of the National Bureau of Statistics, at the National Statistical Work Conference, January 7, 1998.

14.
This paper proposes a new test for the error cross-sectional uncorrelatedness in a two-way error components panel data model based on large panel data sets. By virtue of an existing statistic under the raw data circumstance, an analogous test statistic using the within residuals of the model is constructed. We show that the resulting statistic needs bias correction to make valid inference, and then propose a method to implement feasible correction. Simulation shows that the test based on the feasible bias-corrected statistic performs well. Additionally, we employ a real data set to illustrate the use of the new test.
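The bias-corrected statistic itself is not given in the abstract; as a hedged sketch, the snippet below shows only the two-way within transformation and the within residuals on which such a test is built, plus a naive summary of their cross-sectional correlations (the paper's actual statistic and its bias correction are omitted):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy balanced panel: y_it = beta * x_it + mu_i + lambda_t + e_it.
N, T = 30, 10
x = rng.normal(size=(N, T))
mu = rng.normal(size=(N, 1))
lam = rng.normal(size=(1, T))
y = 1.5 * x + mu + lam + rng.normal(size=(N, T))

def within(a):
    """Two-way within transformation: subtract unit means and time means,
    then add back the overall mean."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True) + a.mean())

# Within estimator of beta and the within residuals such tests rely on.
xw, yw = within(x), within(y)
beta_hat = np.sum(xw * yw) / np.sum(xw * xw)
resid = yw - beta_hat * xw
print(f"beta_hat = {beta_hat:.3f}")

# Cross-sectional correlation matrix of the residual series (rows = units);
# the paper's test summarizes such correlations after bias correction.
corr = np.corrcoef(resid)
off_diag = np.abs(corr[np.triu_indices(N, 1)])
print(f"mean |off-diagonal correlation| = {off_diag.mean():.3f}")
```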

15.
We compare the accuracy of five approaches for contour detection in speckled imagery. Some of these methods take advantage of the statistical properties of speckled data, and all of them employ active contours using B-spline curves. Images obtained with coherent illumination are affected by a noise called speckle, which is inherent to the imaging process. These data have been statistically modeled by a multiplicative model using the G0 distribution, under which regions with different degrees of roughness can be characterized by the value of a parameter. We use this information to find boundaries between regions with different textures. We propose and compare five strategies for boundary detection: three based on the data (maximum discontinuity on raw data, fractal dimension and maximum likelihood) and two based on estimates of the roughness parameter (maximum discontinuity and anisotropic smoothed roughness estimates). In order to compare these strategies, a Monte Carlo experiment was performed to assess the accuracy of fitting a curve to a region. The probability of finding the correct edge with less than a specified error is estimated and used to compare the techniques. The two best procedures are then compared in terms of their computational cost and, finally, we show that the maximum likelihood approach on the raw data using the G0 law is the best technique.
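As a hedged one-dimensional sketch of the maximum-likelihood strategy, the snippet below scans a transect for the split point that maximizes a two-segment likelihood; it uses Gamma-distributed speckle for simplicity, whereas the paper works with the G0 model (and B-spline curves in two dimensions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# A 1-D transect of speckled intensity data crossing a boundary at index 60.
# Gamma speckle is a simplifying assumption; the paper uses the G0 model,
# which also captures region roughness.
L = 4  # number of looks
z = np.concatenate([rng.gamma(L, 1.0 / L, size=60),    # mean-1 region
                    rng.gamma(L, 5.0 / L, size=40)])   # mean-5 region

def seg_loglik(seg):
    """Gamma log-likelihood of a segment with known shape L, MLE mean."""
    return np.sum(stats.gamma.logpdf(seg, a=L, scale=seg.mean() / L))

# Maximum-likelihood edge detection: try every split point and keep the one
# maximizing the combined likelihood of the two segments.
splits = range(5, len(z) - 5)
ll = [seg_loglik(z[:k]) + seg_loglik(z[k:]) for k in splits]
k_hat = list(splits)[int(np.argmax(ll))]
print(f"estimated boundary index: {k_hat} (true: 60)")
```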

16.
The ratio is a familiar statistic, but it is often misused. One frequently overlooked problem occurs when ratioing two discrete (digitized) variables. Fine structure appears in the histogram of the ratio that can be very subtle, or can sometimes even dominate the histogram. It disappears when the numerator and/or denominator become continuous. This statistical artifact is not a binning error, nor is it removed by taking more data. It is important to be aware of the artifact in order to avoid misinterpretation of ratio data. We provide examples of the statistical artifact (including one from baseball) and discuss ways to avoid or minimize the problems it can cause.
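A small simulation (not from the paper) reproduces this kind of artifact: the histogram of a ratio of rounded variables develops spurious fine structure that the continuous ratio lacks:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)

# Two continuous variables, and "digitized" (rounded) versions of them.
n = 200_000
num = rng.uniform(50, 150, size=n)
den = rng.uniform(50, 150, size=n)
ratio_cont = num / den
ratio_disc = np.round(num) / np.round(den)   # ratio of discretized data

# The histogram of the discretized ratio shows spikes at simple fractions;
# the artifact is absent for the continuous ratio and does not vanish
# with more data.
fig, axes = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
bins = np.linspace(0.8, 1.2, 400)
axes[0].hist(ratio_cont, bins=bins)
axes[0].set_title("ratio of continuous variables")
axes[1].hist(ratio_disc, bins=bins)
axes[1].set_title("ratio of rounded (digitized) variables")
plt.tight_layout()
plt.savefig("ratio_artifact.png", dpi=150)
```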

17.
The receiver operating characteristic (ROC) curve can be used to evaluate the properties of a diagnostic test from the distribution of a variable on the healthy and diseased populations. The minimum averaged mean squared error (MAMSE) weights were developed to handle data from different sources by adjusting the relative contribution of each data source. The authors use the MAMSE weights to infer the ROC curve of a diagnostic test based on raw data from multiple studies. The proposed estimates are consistent and Monte Carlo simulations show favourable finite sample performance. The method is illustrated in a case study where progesterone level is used to detect ectopic pregnancies and abortions from other natural causes. The Canadian Journal of Statistics 46: 298–315; 2018
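The MAMSE weight computation is beyond the abstract; the sketch below builds a weighted empirical ROC curve from several studies' raw data using fixed illustrative weights in place of MAMSE weights, so only the pooling mechanics are shown:

```python
import numpy as np

rng = np.random.default_rng(6)

# Raw diagnostic-marker data from several studies (simulated): healthy and
# diseased samples per study.
healthy = [rng.normal(0.0, 1.0, size=m) for m in (40, 120, 200)]
diseased = [rng.normal(1.2, 1.0, size=m) for m in (35, 100, 180)]

# Fixed illustrative weights standing in for the MAMSE weights, whose
# computation (minimizing an averaged mean squared error) is omitted here.
w = np.array([0.5, 0.3, 0.2])

def weighted_ecdf(samples, weights, x):
    """Weighted empirical CDF pooled across studies."""
    return sum(wk * np.mean(s <= x) for wk, s in zip(weights, samples))

# ROC curve: for each threshold t, the point (1 - F_healthy(t), 1 - F_diseased(t)).
thresholds = np.linspace(-4.0, 5.0, 400)
fpr = np.array([1 - weighted_ecdf(healthy, w, t) for t in thresholds])
tpr = np.array([1 - weighted_ecdf(diseased, w, t) for t in thresholds])

# Trapezoidal AUC; fpr decreases as the threshold rises, hence the sign flip.
auc = float(np.sum((tpr[:-1] + tpr[1:]) / 2.0 * -np.diff(fpr)))
print(f"weighted AUC = {auc:.3f}")
```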

18.
Principal component and correspondence analysis can both be used as exploratory methods for representing multivariate data in two dimensions. Circumstances are noted under which the (possibly inappropriate) application of principal components to untransformed compositional data approximates a correspondence analysis of the raw data. Aitchison (1986) has proposed a method for the principal component analysis of compositional data involving transformation of the raw data. It is shown how this can be approximated by a correspondence analysis of appropriately transformed data. The latter approach may be preferable when there are zeroes in the data.
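A minimal sketch of the Aitchison log-ratio route the abstract refers to: a centered log-ratio (clr) transformation of compositional data followed by an ordinary PCA. The simulated strictly positive compositions are an assumption; the zero-handling issue the abstract raises is not addressed here:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated compositional data: each row sums to 1 across D parts, and all
# parts are strictly positive so the log transform is well defined.
n, D = 100, 5
raw = rng.gamma(shape=2.0, scale=1.0, size=(n, D))
comp = raw / raw.sum(axis=1, keepdims=True)

# Aitchison-style approach: centered log-ratio (clr) transform, then an
# ordinary PCA of the transformed data (here via SVD of the centered matrix).
clr = np.log(comp) - np.log(comp).mean(axis=1, keepdims=True)
clr_centered = clr - clr.mean(axis=0)
U, s, Vt = np.linalg.svd(clr_centered, full_matrices=False)

scores = U[:, :2] * s[:2]          # two-dimensional representation
explained = s[:2] ** 2 / np.sum(s ** 2)
print("variance explained by 2 components:", explained.round(3))
```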

19.
Retrospectively collected duration data are often reported incorrectly. An important type of such an error is heaping—respondents tend to round-off or round-up the data according to some rule of thumb. For two special cases of the Weibull model we study the behaviour of the ‘naive estimators’, which simply ignore the measurement error due to heaping, and derive closed expressions for the asymptotic bias. These results give a formal justification of empirical evidence and simulation-based findings reported in the literature. Additionally, situations where a remarkable bias has to be expected can be identified, and an exact bias correction can be performed.
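A simulation sketch of naive-estimator bias under heaping, assuming exponential durations (a Weibull with shape 1, one of the special cases the paper considers) and a round-up reporting rule; the closed-form value shown follows from an elementary calculation for this particular scheme, not from the paper's formulas:

```python
import numpy as np

rng = np.random.default_rng(8)

# Exponential durations (a Weibull with shape 1) with true mean 10.
n = 200_000
theta = 10.0
t = rng.exponential(scale=theta, size=n)

# Heaping: respondents round every duration up to the next multiple of 6
# (say, months reported heaped on half-year marks).
h = 6.0
heaped = np.ceil(t / h) * h

# The naive MLE of the exponential mean ignores the heaping: it is just
# the sample mean of the reported values, and it is biased upward.
naive = heaped.mean()

# Asymptotic value of the naive estimator under this round-up scheme:
# E[ceil(T/h) * h] = h / (1 - exp(-h / theta)).
asymptotic = h / (1.0 - np.exp(-h / theta))
print(f"true mean {theta:.2f}, naive estimate {naive:.2f}, "
      f"asymptotic value {asymptotic:.2f}")
```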

20.
In a former study (Chatillon, Gelinas, Martin and Laurencelle, 1987), the authors arrived at the conclusion that for small to moderate sample sizes (n ≤ 90), and for population distributions that are not too skewed nor heavy tailed, the percentiles computed from a set of 9 classes are at least as precise as the corresponding percentiles computed with raw data. Their proof was based essentially on Monte Carlo simulations. The present paper gives a different and complementary proof, based on an exact evaluation of the mean squared error. The method of proof uses the trinomial distribution in an interesting way.
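A Monte Carlo sketch (not the paper's exact evaluation, which is analytic via the trinomial distribution) comparing the mean squared error of a percentile computed from a 9-class frequency table against the raw-data percentile, assuming a normal population and n = 60:

```python
import numpy as np

rng = np.random.default_rng(9)

def binned_percentile(y, p, n_classes=9):
    """Percentile estimated from a frequency table with n_classes equal-width
    classes, by linear interpolation within the class holding the target rank."""
    edges = np.linspace(y.min(), y.max(), n_classes + 1)
    counts, _ = np.histogram(y, bins=edges)
    cum = np.cumsum(counts)
    target = p / 100.0 * len(y)
    j = int(np.searchsorted(cum, target))
    below = cum[j - 1] if j > 0 else 0
    width = edges[j + 1] - edges[j]
    return edges[j] + (target - below) / max(counts[j], 1) * width

# Compare the two estimators of the 75th percentile for normal samples of
# size 60 (within the n <= 90 range the earlier study examined).
true_q = 0.6744897501960817   # 75th percentile of N(0, 1)
n, reps = 60, 5000
err_raw, err_bin = [], []
for _ in range(reps):
    y = rng.normal(size=n)
    err_raw.append(np.percentile(y, 75) - true_q)
    err_bin.append(binned_percentile(y, 75) - true_q)

mse = lambda e: float(np.mean(np.square(e)))
print(f"MSE, raw data : {mse(err_raw):.5f}")
print(f"MSE, 9 classes: {mse(err_bin):.5f}")
```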
