Similar Documents
20 similar documents retrieved (search time: 187 ms)
1.
From a statistical perspective, this article surveys the main sources of statistical data used in modelling for social science research and some basic methods of data preparation. Whether the data are macro-level or micro-level, they must undergo thorough preparation and preliminary processing before they can be used in modelling. The article argues that statistical data with an adequate sample, good distributional properties, and balanced comparability are the key to successful modelling; in short, good data are half the battle in model building.

2.
Research on Methods for Handling Longitudinal Missing Data in Emergency Statistics for Public Emergencies (cited by: 1; self-citations: 0; citations by others: 1)
Missing data are a very common problem in the analysis of emergency statistics for public emergencies. For longitudinal data sets of such emergency statistics, a score-matching method is proposed for imputing missing values and is compared with three other missing-data methods: random missing data sets with various missingness rates are constructed, and for each rate the missing values are imputed with four different approaches, namely the score-matching method, LVCF (last value carried forward) imputation, unconditional mean imputation, and multiple imputation. The statistical analysis shows that when only a few values are missing, the LVCF method is simple and effective; as the missingness rate increases, mean imputation and multiple imputation behave more stably; the score-matching method accounts for correlations among variables, makes maximal use of the information contained in the data set, and also reflects the actual variability of the variables with missing values, and therefore achieves the best imputation performance.
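Two of the standard methods compared in this abstract are easy to illustrate. The following is only a minimal pandas sketch of LVCF and unconditional mean imputation on a toy longitudinal panel; the column names and simulated data are illustrative assumptions, not from the paper, and the score-matching method itself is not reproduced here.

import numpy as np
import pandas as pd

# Toy longitudinal panel: one row per (subject, time), with some values missing.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject": np.repeat([1, 2, 3], 5),
    "time": np.tile(range(5), 3),
    "y": rng.normal(size=15),
})
df.loc[rng.choice(df.index, size=4, replace=False), "y"] = np.nan  # inject missingness

# LVCF / LOCF: carry each subject's last observed value forward in time
# (a value missing at a subject's first time point stays missing).
df["y_lvcf"] = df.sort_values("time").groupby("subject")["y"].ffill()

# Unconditional mean imputation: replace NaN with the overall mean of y.
df["y_mean"] = df["y"].fillna(df["y"].mean())

print(df)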

3.
叶榕 《浙江统计》2001,(5):30-31
As the market economy continues to mature, quantitative statistical methods are playing an increasingly important role in formulating marketing strategies, yet in practice the data encountered are often qualitative. This paper uses several practical examples to discuss statistical methods for handling such data. I. Two basic issues. 1. Classification of statistical data. Following the American statistician Stevens, statistical data fall into four types: nominal, ordinal, interval, and ratio, ordered from the lowest measurement level to the highest; the admissible operations range from counting and ranking up to addition/subtraction and multiplication/division. Quantities such as sales volume and sales revenue recorded in ordinary corporate marketing activities are ratio data, whereas consumer gender, education level, and age are nominal, ordinal, and interval data respectively. Different types of data must be handled differently when statistical methods are applied. 2. Data…
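Stevens' four measurement levels map naturally onto different encodings before analysis. A minimal pandas sketch of how nominal and ordinal variables might be prepared differently; the survey columns and category levels are hypothetical, not taken from the paper.

import pandas as pd

# Toy consumer survey; the column names and levels are illustrative assumptions.
df = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],                                 # nominal
    "education": ["primary", "college", "high school", "college"],  # ordinal
    "age": [23, 35, 41, 29],                                        # interval
    "sales": [120.0, 85.5, 200.0, 60.0],                            # ratio
})

# Nominal data: only counts/frequencies are meaningful, so one-hot encode.
gender_dummies = pd.get_dummies(df["gender"], prefix="gender")

# Ordinal data: order matters but distances do not, so use ordered categorical codes.
edu_levels = ["primary", "high school", "college"]
df["education"] = pd.Categorical(df["education"], categories=edu_levels, ordered=True)
df["education_rank"] = df["education"].cat.codes  # 0 < 1 < 2 preserves the ordering

print(pd.concat([df, gender_dummies], axis=1))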

4.
The data quality assessment framework established by the European Statistical System evaluates statistical data quality along three dimensions: the institutional environment of the statistical agency, statistical processes, and statistical outputs, and it has developed a series of data quality management tools such as standards for data quality reports, a quality report handbook, and self-assessment checklists. Drawing on the EU's experience, China should establish its own data quality assessment framework; the main measures include setting up a data quality assessment body, formulating a national data quality assurance framework template, specifying data quality assessment methods, and improving data quality assessment reports.

5.
刘兴远 《浙江统计》2013,(12):51-53
Abstract: In the big data environment, the conditions under which official statistics are released have changed fundamentally, and the mode, content, timeliness, and frequency of statistical data releases will all be strongly affected. Therefore, as government statistical agencies embrace and work with big data, they should actively improve and refine the way official statistics are released, focusing on building data warehouses, applying visualization techniques, integrating data resources, and converging with new media.

6.
A Statistical Data Quality Assessment Method Based on Classical Econometric Models (cited by: 6; self-citations: 2; citations by others: 4)
刘洪  黄燕 《统计研究》2009,26(3):91-96
Grounded in economic theory and taking the economic system as a whole as its starting point, this paper constructs an econometric model from the relevant factors that influence the object of study. Under the specified model, data quality is assessed quantitatively using outlier detection methods and the principles of statistical diagnostics. By choosing an appropriate model to track the pattern of change in the variable of interest, abnormal observations (outliers) are identified, their statistical significance is tested, and the anomalies are then cross-checked and their causes analysed to reach a further judgement about data quality. An empirical analysis of China's official statistics is also provided.
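The general idea of flagging observations that a fitted model cannot explain can be sketched with ordinary least squares and externally studentized residuals from statsmodels. This is only an illustration of the diagnostic principle, not the paper's specific econometric model; the toy series and the |t| > 3 cutoff are assumptions.

import numpy as np
import statsmodels.api as sm

# Toy series: a linear relationship plus one deliberately distorted observation.
rng = np.random.default_rng(1)
x = np.linspace(1, 30, 30)
y = 5.0 + 2.0 * x + rng.normal(scale=1.0, size=30)
y[17] += 12.0  # a suspect data point

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Externally studentized residuals flag observations the fitted model cannot explain.
resid = fit.get_influence().resid_studentized_external
suspects = np.where(np.abs(resid) > 3)[0]
print("suspect observations:", suspects, "studentized residuals:", resid[suspects])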

7.
To address the quality of regional macroeconomic statistics, this paper develops technical methods for diagnosing the quality of macroeconomic statistical data. After stating the theoretical assumptions of the study, it constructs diagnostic models for macroeconomic data from static, dynamic, multidimensional, and systemic perspectives, gives quantitative diagnostic methods, criteria, and patterns, and carries out simulation experiments with real data.

8.
朱一波 《浙江统计》2013,(12):20-22
Abstract: After discussing the importance of data quality and the background and methods of the study, this paper conducts an empirical analysis of the quality of Zhejiang's macroeconomic statistics from the perspective of data consistency, using statistical modelling methods. The results show that the quality of Zhejiang's macroeconomic statistics is generally good; on this basis, suggestions are offered for improving statistical data quality and reducing public doubts about official statistics.

9.
Statistical data quality is the lifeblood of statistical work, and accuracy is at its core; ensuring the accuracy of statistical data is therefore a fundamental requirement of statistical work. In recent years, however, data distortion has occurred to varying degrees, seriously damaging the credibility of statistics, and it deserves close attention from departments at all levels. Solving the problem of distorted statistical data has become one of the key issues in current statistical work. This paper analyses in depth the various causes of statistical data distortion and proposes corresponding remedies.

10.
Statistical data are an important basis for macro-level decision making by governments at all levels. Truthful and reliable statistics help governments make correct decisions, while untruthful and inaccurate statistics inevitably lead to wrong decisions and endanger economic and social development. Statistical data quality is therefore the lifeblood of statistical work, and ensuring that statistics are truthful and reliable is the primary problem statistical agencies must solve. The general workflow of statistical work runs from grassroots surveys through data auditing, data processing, aggregation and reporting, data evaluation, and information analysis. Seen from this workflow, the quality of grassroots statistical data directly affects the truthfulness of statistics and the reliability of statistical information and analysis. At present, many factors affect the quality of grassroots statistical data. This paper analyses the main ones and offers several suggestions for improving grassroots statistical data quality.

11.
Regression methods for common data types such as measured, count and categorical variables are well understood, but increasingly statisticians need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, either as the response or as a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We call the transformation from a new observation to a score scoring, whereas backscoring is a method to represent a score as an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment to study the motion of children with cleft lip.
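A minimal sketch of the scoring step described here, assuming scikit-learn: objects known only through pairwise distances are embedded with multidimensional scaling, and a standard regression is then fitted on the scores. The toy latent data and the linear internal model are illustrative assumptions; backscoring is not shown.

import numpy as np
from sklearn.manifold import MDS
from sklearn.linear_model import LinearRegression

# Toy example: objects observed only through a distance matrix.
rng = np.random.default_rng(2)
latent = rng.normal(size=(20, 3))                       # hidden "shapes"
D = np.linalg.norm(latent[:, None, :] - latent[None, :, :], axis=-1)
response = latent[:, 0] + 0.1 * rng.normal(size=20)     # something to predict

# Scoring: embed the precomputed distance matrix into a low-dimensional space.
scores = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

# Internal model: an ordinary regression linking the scores to the response.
model = LinearRegression().fit(scores, response)
print("R^2 on the scores:", model.score(scores, response))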

12.
We model the Alzheimer's disease-related phenotype response variables observed at irregular time points in longitudinal Genome-Wide Association Studies as sparse functional data and propose nonparametric test procedures to detect functional genotype effects while controlling for the confounding effects of environmental covariates. Our new functional analysis of covariance tests are based on a seemingly unrelated kernel smoother, which takes into account the within-subject temporal correlations, and thus enjoy improved power over existing functional tests. We show that the proposed test, combined with a uniformly consistent nonparametric covariance function estimator, enjoys the Wilks phenomenon and is minimax most powerful. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative database, where an application of the proposed test led to the discovery of new genes that may be related to Alzheimer's disease.

13.
Data envelopment analysis models are used for measuring composite indicators in various areas. Although there are many models for measuring composite indicators in the literature, surprisingly, there is no methodology that clearly shows how composite indicator improvement could be performed. This article proposes a slack analysis framework for improving the composite indicator of inefficient entities. To do so, two dual problems originating from two data envelopment analysis models in the literature are proposed, which can guide decision makers on how to adjust the subindicators of inefficient entities to improve their composite indicators, by identifying which subindicators must be improved and by how much they should be augmented. The proposed methodology for improving composite indicators is inspired by data envelopment analysis and slack analysis approaches. The applicability of the methodology is investigated for improving two well-known composite indicators, the Sustainable Energy Index and the Human Development Index. The results show that 12 out of 18 economies are inefficient with respect to the Sustainable Energy Index, and for these the proposed slack analysis models provide suggested adjustments to their respective subindicators. Furthermore, the methodology suggests how to adjust life expectancy, education, and gross domestic product (GDP), the three socioeconomic indicators, to improve the Human Development Index of the 24 countries identified as inefficient among 27 countries.
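The paper's specific slack-analysis dual problems are not reproduced here, but the following sketch shows the basic input-oriented CCR efficiency model that DEA-based composite-indicator frameworks build on, solved with SciPy's linear programming routine. The toy inputs and outputs are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of unit o (rows of X/Y are inputs/outputs, columns are units)."""
    m, n = X.shape          # m inputs, n units
    s = Y.shape[0]          # s outputs
    c = np.r_[1.0, np.zeros(n)]                 # minimise theta over z = [theta, lambdas]
    A_in = np.hstack([-X[:, [o]], X])           # sum_j lam_j x_ij <= theta * x_io
    A_out = np.hstack([np.zeros((s, 1)), -Y])   # sum_j lam_j y_rj >= y_ro
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[:, o]]
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun

# Toy data: 2 inputs, 1 output, 4 decision-making units (hypothetical values).
X = np.array([[2.0, 4.0, 3.0, 5.0],
              [3.0, 1.0, 2.0, 4.0]])
Y = np.array([[1.0, 1.0, 1.0, 1.0]])
for o in range(4):
    print(f"unit {o}: efficiency = {ccr_efficiency(X, Y, o):.3f}")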

14.
We analyse longitudinal data on CD4 cell counts from patients who participated in clinical trials that compared two therapeutic treatments: zidovudine and didanosine. The investigators were interested in modelling the CD4 cell count as a function of treatment, age at base-line and disease stage at base-line. Serious concerns can be raised about the normality assumption for CD4 cell counts that is implicit in many methods, and therefore an analysis may have to start with a transformation. Instead of assuming that we know the transformation (e.g. logarithmic) that makes the outcome normal and linearly related to the covariates, we estimate the transformation, by using maximum likelihood, within the Box-Cox family. There has been considerable work on the Box-Cox transformation for univariate regression models. Here, we discuss the Box-Cox transformation for longitudinal regression models when the outcome can be missing over time, and we also implement a maximization method for the likelihood, assuming that the missing data are missing at random.
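For the univariate case, ignoring the longitudinal structure and missing data that the paper handles, the maximum-likelihood Box-Cox parameter can be estimated directly with SciPy; the simulated counts below are an illustrative assumption.

import numpy as np
from scipy import stats

# Toy positive-valued "CD4-like" counts (hypothetical data).
rng = np.random.default_rng(3)
counts = rng.lognormal(mean=5.5, sigma=0.6, size=200)

# Maximum-likelihood estimate of the Box-Cox parameter lambda for a univariate sample.
transformed, lam = stats.boxcox(counts)
print(f"estimated lambda = {lam:.3f}")   # lambda near 0 suggests a log transformation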

15.
Shapes of service-time distributions in queueing network models have a great impact on the distribution of system response times. It is essential for the analysis of the response-time distribution that the modeled service-time distributions have the correct shape. Traditionally, modeling of service-time distributions is based on a parametric approach, assuming a specific distribution and estimating its parameters. We introduce an alternative approach based on the principles of exploratory data analysis and nonparametric data modeling. The proposed method applies nonlinear data transformation and resistant curve fitting. The method can be used in cases where the available data are a complete sample, a histogram, or the mean and a set of 5-10 quantiles. The reported results indicate that the proposed method is able to approximate the distribution of measured service times so that accurate estimates for quantiles of the response-time distribution are obtained.

16.
Data Sharpening for Hazard Rate Estimation (cited by: 1; self-citations: 0; citations by others: 1)
Data sharpening is a general tool for enhancing the performance of statistical estimators, by altering the data before substituting them into conventional methods. In one of the simplest forms of data sharpening, available for curve estimation, an explicit empirical transformation is used to alter the data. The attraction of this approach is diminished, however, if the formula has to be altered for each different application. For example, one could expect the formula for use in hazard rate estimation to differ from that for straight density estimation, since a hazard rate is a ratio-type functional of a density. This paper shows that, in fact, identical data transformations can be used in each case, regardless of whether the data involve censoring. This dramatically simplifies the application of data sharpening to problems involving hazard rate estimation, and makes data sharpening attractive.

17.
Doubly censored failure time data occur in many areas including demographical studies, epidemiology studies, medical studies and tumorigenicity experiments, and correspondingly some inference procedures have been developed in the literature (Biometrika, 91, 2004, 277; Comput. Statist. Data Anal., 57, 2013, 41; J. Comput. Graph. Statist., 13, 2004, 123). In this paper, we discuss regression analysis of such data under a class of flexible semiparametric transformation models, which includes some commonly used models for doubly censored data as special cases. For inference, non-parametric maximum likelihood estimation will be developed and, in particular, we will present a novel expectation-maximization algorithm with the use of subject-specific independent Poisson variables. In addition, the asymptotic properties of the proposed estimators are established, and an extensive simulation study suggests that the proposed methodology works well for practical situations. The method is applied to an AIDS study.

18.
The analysis of non-stationary data streams requires continuous adaptation of the model to the most recent relevant data. This requires that changes in the data stream be distinguished from noise. Many approaches are based on heuristic adaptation schemes. We analyze simple regression models to understand the joint effects of noise and concept drift and derive the optimal sliding window size for the regression models. Our theoretical analysis and simulations show that a near-optimal window size can be crucial. Our models can be used as benchmarks for other models to see how they cope with noise and drift.
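A minimal simulation of the trade-off discussed here, assuming scikit-learn: a linear concept drifts slowly, and one-step-ahead error is compared across candidate sliding-window sizes. The drift pattern, noise level, and window sizes are illustrative assumptions, not the paper's benchmark.

import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated stream with gradual concept drift: the slope changes over time.
rng = np.random.default_rng(4)
T = 1000
x = rng.uniform(-1, 1, size=T)
slope = 1.0 + 0.003 * np.arange(T)          # slowly drifting concept
y = slope * x + rng.normal(scale=0.3, size=T)

def one_step_error(window):
    """Mean squared one-step-ahead error when refitting on the last `window` points."""
    errs = []
    for t in range(window, T):
        model = LinearRegression().fit(x[t - window:t, None], y[t - window:t])
        errs.append((model.predict(x[t, None, None])[0] - y[t]) ** 2)
    return np.mean(errs)

# Too small a window is noisy; too large a window lags behind the drift.
for w in (20, 50, 100, 300):
    print(f"window {w:4d}: MSE = {one_step_error(w):.4f}")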

19.
Tukey proposed a class of distributions, the g-and-h family (gh family), based on a transformation of a standard normal variable to accommodate different skewness and elongation in the distribution of variables arising in practical applications. It is easy to draw values from this distribution even though it is hard to explicitly state the probability density function. Given this flexibility, the gh family may be extremely useful in creating multiple imputations for missing data. This article demonstrates how this family, as well as its generalizations, can be used in the multiple imputation analysis of incomplete data. The focus of this article is on a scalar variable with missing values. In the absence of any additional information, data are missing completely at random, and hence the correct analysis is the complete-case analysis. Thus, the application of the gh multiple imputation to the scalar cases affords comparison with the correct analysis and with other model-based multiple imputation methods. Comparisons are made using simulated datasets and the data from a survey of adolescents ascertaining driving after drinking alcohol.
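Drawing from the g-and-h family really is straightforward. Below is a sketch of the usual transformation of a standard normal variable; the parameter values and the location/scale parameterization are illustrative assumptions, not taken from the article.

import numpy as np

def g_and_h_sample(n, g=0.5, h=0.1, loc=0.0, scale=1.0, rng=None):
    """Draw n values from Tukey's g-and-h distribution by transforming standard normals.

    g controls skewness, h controls tail heaviness; loc/scale shift and rescale.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)
    core = z if g == 0 else (np.exp(g * z) - 1.0) / g
    return loc + scale * core * np.exp(h * z ** 2 / 2.0)

# Values are easy to draw even though the density has no closed form;
# draws like these could serve as a flexible imputation model for a scalar variable.
sample = g_and_h_sample(10_000, g=0.5, h=0.1, rng=np.random.default_rng(5))
print("sample skewness sign:", np.sign(np.mean((sample - sample.mean()) ** 3)))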

20.
The Points to Consider Document on Missing Data was adopted by the Committee for Medicinal Products for Human Use (CHMP) in December 2001. In September 2007 the CHMP issued a recommendation to review the document, with particular emphasis on summarizing and critically appraising the pattern of drop-outs, explaining the role and limitations of the 'last observation carried forward' method and describing the CHMP's cautionary stance on the use of mixed models. In preparation for the release of the updated guidance document, statisticians in the Pharmaceutical Industry held a one-day expert group meeting in September 2008. Topics that were debated included minimizing the extent of missing data and understanding the missing data mechanism, defining the principles for handling missing data and understanding the assumptions underlying different analysis methods. A clear message from the meeting was that at present, biostatisticians tend only to react to missing data. Limited pro-active planning is undertaken when designing clinical trials. Missing data mechanisms for a trial need to be considered during the planning phase and the impact on the objectives assessed. Another area for improvement is in the understanding of the pattern of missing data observed during a trial, and thus the missing data mechanism, via the plotting of data; for example, use of Kaplan-Meier curves looking at time to withdrawal. Copyright © 2009 John Wiley & Sons, Ltd.
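As a small illustration of the Kaplan-Meier plot mentioned here for inspecting drop-out patterns, assuming the lifelines package is available; the simulated withdrawal times and the 24-week trial length are hypothetical.

import numpy as np
from lifelines import KaplanMeierFitter

# Toy trial: time (weeks) until each subject withdrew, with completers censored.
rng = np.random.default_rng(6)
dropout_time = rng.exponential(scale=40.0, size=100)
observed = dropout_time < 24.0                    # True = withdrew before trial end
duration = np.minimum(dropout_time, 24.0)         # completers are censored at week 24

kmf = KaplanMeierFitter()
kmf.fit(duration, event_observed=observed, label="time to withdrawal")
print(kmf.survival_function_.tail())              # share still on study by week
# kmf.plot_survival_function() would draw the curve used to inspect the drop-out pattern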

