Similar Literature
18 similar documents found
1.
A Reliability Assessment Method for Statistical Data Based on Robust Principal Component Regression
Robust principal component regression (RPCR) combines robust principal component analysis with robust regression analysis. This paper is the first to apply RPCR and outlier diagnostics to assess the reliability of China's 2008 cross-sectional data on regional economic growth. The assessment shows that RPCR withstands the influence of outliers better than classical principal component regression (CPCR), yields more reliable estimates, and effectively avoids the masking effect among multiple outliers to which CPCR is prone. On the whole, the 2008 regional growth figures can be regarded as consistent with the related indicator data, although the growth data of some regions may have reliability problems.
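A minimal RPCR-style sketch in Python, not the paper's exact procedure: the robust scatter comes from sklearn's MinCovDet, the principal components from its eigendecomposition, and the final fit from a Huber M-estimator; the data are synthetic.

```python
# A minimal RPCR-style sketch (not the paper's exact procedure), assuming
# synthetic data: robust scatter via MCD -> principal components -> robust fit.
import numpy as np
from sklearn.covariance import MinCovDet
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)
X[:5] += 8.0                                   # inject a few leverage outliers

mcd = MinCovDet(random_state=0).fit(X)         # robust location and scatter
eigvals, eigvecs = np.linalg.eigh(mcd.covariance_)
order = np.argsort(eigvals)[::-1]
scores = (X - mcd.location_) @ eigvecs[:, order[:2]]   # first two robust PCs

# robust (Huber M-estimator) regression of y on the robust component scores
rpcr = sm.RLM(y, sm.add_constant(scores), M=sm.robust.norms.HuberT()).fit()
print(rpcr.params)
```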

2.
刘洪  金林 《统计研究》2012,29(10):99-104
Building on economic growth theory, this paper specifies a semiparametric regression model relating China's GDP over 1953-2010 to labor input, capital input, and human capital. Statistical diagnostics are then carried out on the model: the relevant diagnostic statistics are computed and used to identify outlying observations, and on that basis the accuracy of China's GDP data is discussed. The outliers concentrate in two periods, 1958-1961 and 1991-1994. The paper closes with an appraisal of this diagnostics-based approach to assessing the accuracy of statistical data.
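The semiparametric specification is not reproduced here; the sketch below illustrates only the diagnostic step, on a plain linear stand-in model with synthetic data, using statsmodels' standard influence measures (externally studentized residuals and Cook's distance) to flag abnormal observations.

```python
# Sketch of the diagnostic step only, on a plain linear stand-in model with
# synthetic data (the paper's model is semiparametric).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 58                                          # e.g. annual data, 1953-2010
X = sm.add_constant(rng.normal(size=(n, 3)))    # stand-ins for labor, capital, human capital
y = X @ np.array([1.0, 0.8, 0.5, 0.3]) + rng.normal(scale=0.2, size=n)
y[[5, 40]] += 3.0                               # two artificial "abnormal years"

infl = sm.OLS(y, X).fit().get_influence()
student = infl.resid_studentized_external       # externally studentized residuals
cooks = infl.cooks_distance[0]                  # Cook's distances
print("flagged:", np.where(np.abs(student) > 2.5)[0],
      "max Cook's D:", cooks.max().round(3))
```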

3.
刘洪  黄燕 《统计研究》2007,24(8):17-21
This paper models the dynamics of a time series with a combined model. Once the model passes the usual tests and demonstrates good predictive performance, the gap between predicted and actual values is examined from the standpoint of outlier testing: discrepant observations are identified, and their errors are subjected to significance tests drawn from the statistical theory of outlier detection in experimental data, thereby assessing data quality. Taking China's gross domestic product (GDP) as the object of study and the 1978-2003 figures as the sample, the paper applies this trend-simulation method to evaluate the accuracy of the 2004 GDP figure, giving an empirical analysis of time-series data on China's economic indicators.
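A hedged sketch of the trend-simulation check on synthetic data; Holt's exponential-trend smoothing stands in for the paper's combined model, and the reported value is tested against the one-step forecast.

```python
# Sketch of the trend-simulation check on synthetic data; Holt's exponential
# smoothing stands in for the paper's combined model.
import numpy as np
from scipy import stats
from statsmodels.tsa.holtwinters import Holt

rng = np.random.default_rng(2)
t = np.arange(26)                                   # 1978-2003
series = 100 * np.exp(0.09 * t) * (1 + rng.normal(scale=0.01, size=26))

fit = Holt(series, exponential=True).fit()          # exponential-trend smoothing
forecast = fit.forecast(1)[0]                       # predicted next-year value
resid_sd = np.std(fit.resid, ddof=1)

reported = 100 * np.exp(0.09 * 26)                  # stand-in reported 2004 value
z = (reported - forecast) / resid_sd                # standardized forecast error
print(f"z = {z:.2f}, two-sided p = {2 * (1 - stats.norm.cdf(abs(z))):.3f}")
```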

4.
A Quality Assessment Method for Statistical Data Based on Classical Econometric Models
刘洪  黄燕 《统计研究》2009,26(3):91-96
Grounded in economic theory and viewing the economy as a whole system, this paper constructs econometric models from the factors that influence the variable under examination and, given the model, evaluates data quality quantitatively with outlier tests and statistical diagnostics. A suitable model is selected to simulate the behavior of the target variable, abnormal observations (outliers) are identified and tested for significance, and the flagged values are then cross-checked and traced to their causes to reach a judgment on data quality. The method is applied empirically to China's official statistics.
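A small illustration of the outlier-significance step, assuming a stand-in econometric model on synthetic data; statsmodels' outlier_test yields studentized residuals with Bonferroni-adjusted p-values.

```python
# Sketch of the outlier-significance step on a stand-in econometric model with
# synthetic data; outlier_test gives Bonferroni-adjusted studentized residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(30, 2)))   # stand-in explanatory indicators
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=0.3, size=30)
y[7] += 4.0                                     # one suspicious record

test = sm.OLS(y, X).fit().outlier_test()        # studentized resid + Bonferroni p
print(test[test["bonf(p)"] < 0.05])             # significantly abnormal rows
```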

5.
In This Issue
Accuracy is the most important aspect of statistical data quality. How should it be judged? In "A Study of the Applicability of Combined Models for Testing the Accuracy of Statistical Data", the author takes the accuracy of China's GDP data as an example and, building on Cramer's decomposition theorem, applies outlier tests to China's GDP figures for 2001-2007. The analysis shows that, on the whole, combined forecasting models have limited applicability for testing the accuracy of statistical data, and it suggests ways to improve the models' application.

6.
To address the high misclassification rate of traditional Fisher discriminant analysis (FDA) when abnormal data are present, this paper proposes a simple and robust FDA method. Robust estimates of the sample mean and covariance are first obtained with the minimum covariance determinant (MCD) technique, and FDA is then performed on these robust estimates. To verify the method's effectiveness, it is applied to predicting financial distress among Chinese listed companies. The empirical study shows that without outliers the MCD-based robust FDA and traditional FDA classify essentially alike, whereas with outliers the new method clearly outperforms traditional FDA: it effectively resists the interference of abnormal data while keeping the misclassification rate low.
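A minimal sketch of the MCD-robustified FDA idea on synthetic two-class data: MCD location and scatter estimates replace the sample moments in the usual Fisher discriminant rule.

```python
# Minimal sketch of MCD-robustified Fisher discriminant analysis on synthetic
# two-class data: MCD moments replace the sample moments in the FDA rule.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(4)
X1 = rng.normal(loc=0.0, size=(60, 4))          # "healthy" firms (stand-in)
X2 = rng.normal(loc=1.5, size=(60, 4))          # "distressed" firms (stand-in)
X1[:4] += 10.0                                  # contaminate class 1

mcd1 = MinCovDet(random_state=0).fit(X1)
mcd2 = MinCovDet(random_state=0).fit(X2)
pooled = (mcd1.covariance_ + mcd2.covariance_) / 2
w = np.linalg.solve(pooled, mcd1.location_ - mcd2.location_)  # Fisher direction
cut = w @ (mcd1.location_ + mcd2.location_) / 2               # midpoint cutoff

classify = lambda x: 1 if x @ w > cut else 2
print(classify(np.zeros(4)), classify(1.5 * np.ones(4)))      # expect 1 2
```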

7.
Accurate and reliable statistics are the foundation for understanding how the economy is running and for scientific decision-making. Taking the accuracy of China's GDP data as an example and using the 1985-2010 figures as the sample, this paper fits a grey forecasting model, a regression-based combined model, and a double exponential smoothing model to the features of the series. Once these models pass the statistical tests and show good predictive ability, a combined forecasting model minimizing the sum of absolute errors is constructed; its predictions are taken as the "true" values, and the accuracy of the GDP data is then analyzed from the standpoint of outliers. The results indicate that the combined forecasting model has high practical value for testing the accuracy of statistical data and merits further study.
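A sketch of the combination step under stated assumptions (three hypothetical component forecasts, a stand-in observed series): choose non-negative weights summing to one that minimize the in-sample sum of absolute errors.

```python
# Sketch of the combination step with three hypothetical component forecasts:
# non-negative weights summing to one, minimizing the sum of absolute errors.
import numpy as np
from scipy.optimize import minimize

actual = np.array([5.2, 5.8, 6.1, 6.9, 7.4])        # stand-in observed series
preds = np.array([[5.0, 5.9, 6.3, 6.8, 7.1],        # grey model (hypothetical)
                  [5.4, 5.6, 6.0, 7.1, 7.6],        # regression combo (hypothetical)
                  [5.1, 5.7, 6.2, 6.8, 7.3]])       # double smoothing (hypothetical)

res = minimize(lambda w: np.abs(actual - w @ preds).sum(),
               x0=np.ones(3) / 3, bounds=[(0, 1)] * 3,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("weights:", res.x.round(3))                   # weights for the "true value"
```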

8.
Among macroeconomic statistics, GDP is a key aggregate indicator, so the quality of its data has long attracted attention. Taking data consistency as its point of departure, this paper selects GDP as the object of study and builds quality assessment models along three dimensions: structural consistency, spatial consistency, and temporal consistency. GDP data quality is evaluated along each dimension so as to examine it more objectively and realistically, offering a reference for further improving methods of assessing the quality of macroeconomic data.

9.
Abnormal data usually bias, and can even invalidate, the conclusions of quantitative economic analysis, so data quality must be diagnosed. This paper introduces and studies projection pursuit, an effective method for diagnosing multidimensional economic data, and tests it in practice on China's "GDP growth versus consumption growth" data. The method not only detects the outlying points in the data but also fully preserves and exploits the structural and correlational relationships of the multidimensional data.
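A rough sketch of projection-pursuit outlier diagnosis in the Stahel-Donoho style (one common variant; the paper's projection index may differ): scan many random directions and record each point's worst standardized deviation from the projected median.

```python
# Rough sketch of projection-pursuit outlier diagnosis (a Stahel-Donoho style
# index; the paper's projection index may differ).
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))      # stand-in "GDP growth, consumption growth"
X[:3] = [6.0, -6.0]                # planted outliers

d = rng.normal(size=(500, 2))
d /= np.linalg.norm(d, axis=1, keepdims=True)       # 500 random unit directions
proj = X @ d.T                                      # projections, shape (200, 500)
med = np.median(proj, axis=0)
mad = 1.4826 * np.median(np.abs(proj - med), axis=0)
score = np.max(np.abs(proj - med) / mad, axis=1)    # outlyingness per point
print("flagged:", np.where(score > 4)[0])
```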

10.
I. Introduction. Statistical data originate from grassroots reporting units, and checking and auditing the raw data those units report is one of the key measures for improving the quality of statistics. Such checking and auditing should cover two aspects. (1) Assessing the quality of the raw data reported by the units, so as to form an overall view of their quality and, in particular, a quantitative judgment of their reliability and accuracy. (2) Identifying and correcting outliers in the raw data. Outliers are observations with comparatively large errors, here chiefly inaccurate figures in socioeconomic and science-and-technology statistics. Outliers in the data reported by grassroots units, which arise from various technical and non-technical causes, inevitably…

11.
Abstract

Robust parameter design (RPD) is an effective tool that combines experimental design and strategic modeling to determine the optimal operating conditions of a system. RPD usually assumes normally distributed experimental data with no contamination from outliers, and parameter uncertainties in the response models are generally neglected. However, applying normal-theory modeling methods to skewed data and ignoring parameter uncertainties can set off a chain of degradation in the optimization and production phases: a misleading fit, poorly estimated optimal operating conditions, and poor-quality products. This article presents a new approach based on confidence interval (CI) response modeling for the process mean. The proposed interval robust design makes the system median unbiased for the mean and uses the midpoint of the interval as the location performance response. As a robust alternative for modeling the process variance response, the biweight midvariance is proposed, an estimator that is both resistant and efficient when normality does not hold. The results further show that the proposed interval robust design gives a robust solution for skewed and contaminated data. The procedure and its advantages are illustrated with two experimental design studies.
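The biweight midvariance itself is straightforward to compute; below is a sketch of the usual formula (with the common tuning constant c = 9) on contaminated synthetic data, compared against the classical sample variance.

```python
# Sketch of the biweight midvariance (usual tuning constant c = 9), compared
# with the classical sample variance on contaminated synthetic data.
import numpy as np

def biweight_midvariance(x, c=9.0):
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    u = (x - med) / (c * np.median(np.abs(x - med)))
    keep = np.abs(u) < 1                    # points beyond c MADs get zero weight
    num = np.sum(((x - med) ** 2 * (1 - u ** 2) ** 4)[keep])
    den = np.sum(((1 - u ** 2) * (1 - 5 * u ** 2))[keep]) ** 2
    return len(x) * num / den

sample = np.append(np.random.default_rng(6).normal(size=50), 15.0)  # one outlier
print(np.var(sample, ddof=1), biweight_midvariance(sample))
```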

12.
Asymmetric models have been studied extensively in recent years for situations where the normality assumption fails because the data lack symmetry. Techniques for assessing the quality of fit and for diagnostic analysis are important for model validation. This paper studies the mean-shift method for detecting outliers in asymmetric normal regression models. Analytical solutions for the parameter estimators are obtained using the proposed algorithm. Simulation studies and an application to real data show the method's efficiency in detecting outliers.
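A sketch of the mean-shift logic on an ordinary normal regression (the paper treats asymmetric-error models, where the estimators differ): add an indicator variable for the suspect case and test its coefficient.

```python
# Sketch of the mean-shift test on an ordinary normal regression (the paper's
# models have asymmetric errors; the dummy-variable logic is the same).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 50
X = sm.add_constant(rng.normal(size=(n, 1)))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[10] += 4.0                                    # mean shift at observation 10

dummy = np.zeros((n, 1)); dummy[10] = 1.0       # indicator for the suspect case
fit = sm.OLS(y, np.hstack([X, dummy])).fit()
print("shift:", fit.params[-1].round(2), "p:", fit.pvalues[-1].round(4))
```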

13.
Principal component analysis (PCA) is a popular dimensionality-reduction technique, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has motivated new approaches. This study examines, on an artificial and a real data set, the effect of replacing outliers with estimates obtained by expectation-maximization (EM) and by multiple imputation (MI). Robust PCA based on the minimum covariance determinant (MCD), PCA with EM-based replacements, and PCA with MI-based replacements are compared with CPCA. The aim is to show how the EM and MI replacements perform as the proportion of outliers in the data set varies. When that proportion exceeds 20%, we suggest replacing outliers with estimates obtained by MI or EM as an alternative approach.
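A sketch of the "replace outliers with imputed values" idea on synthetic data, with sklearn's IterativeImputer standing in for the EM/MI machinery: mask extreme cells as missing, impute, then run classical PCA on the completed matrix.

```python
# Sketch of the "impute instead of outliers" idea on synthetic data, with
# sklearn's IterativeImputer standing in for the EM/MI machinery.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
X = rng.multivariate_normal(np.zeros(3), np.eye(3) + 0.5, size=100)
X[:5] += 12.0                                    # contaminate 5% of the rows

med = np.median(X, axis=0)
mad = 1.4826 * np.median(np.abs(X - med), axis=0)
X_masked = np.where(np.abs(X - med) / mad > 3, np.nan, X)  # extreme cells -> NaN

X_filled = IterativeImputer(random_state=0).fit_transform(X_masked)
print(PCA(n_components=2).fit(X_filled).explained_variance_ratio_)
```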

14.
There is currently much discussion of lasso-type regularized regression, a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling owing to its sparsity, it suffers from outliers because it relies on penalized least squares. To overcome this, we propose a robust lasso-type estimation procedure that uses a robust criterion as the loss function while imposing the L1-type penalty known as the elastic net. We also introduce efficient bootstrap information criteria for choosing the optimal regularization parameters and a constant used in outlier detection. Simulation studies and real data analysis examine the efficiency of the proposed robust sparse regression modelling, and we observe that the strategy performs well in the presence of outliers.
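A sketch in the same spirit, not the authors' estimator: sklearn's SGDRegressor combines a Huber loss with an elastic-net penalty, which illustrates robust sparse fitting on contaminated synthetic data.

```python
# Sketch in the same spirit (not the authors' estimator): Huber loss with an
# elastic-net penalty via sklearn's SGDRegressor, on contaminated synthetic data.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 10))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)    # sparse true coefficients
y = X @ beta + rng.normal(scale=0.5, size=200)
y[:10] += 20.0                                   # vertical outliers

model = SGDRegressor(loss="huber", penalty="elasticnet", alpha=0.01,
                     l1_ratio=0.5, epsilon=1.35, max_iter=5000,
                     random_state=0).fit(StandardScaler().fit_transform(X), y)
print(model.coef_.round(2))                      # small/zero on the noise terms
```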

15.
Some quality characteristics are well defined when treated as response variables whose relationship to a set of independent variables is identified; this relationship is called a profile. Parametric models, such as linear models, may be used to model profiles, but for many processes encountered in practice a parametric model is inappropriate, and nonparametric methods are used instead. One of the most widely applicable nonparametric tools for modelling complicated profiles is the wavelet. Many authors have considered the wavelet transformation only for monitoring processes in phase II; the problem of estimating the in-control profile in phase I with the wavelet transformation has not been addressed in depth. Classical estimators are usually used in phase I to estimate the in-control profile, even when the wavelet transformation is applied. These estimators are suitable when the data contain no outliers, but when outliers exist they cannot estimate the in-control profile properly. This research proposes a robust method for estimating in-control profiles that is insensitive to outliers and can be applied with the wavelet transformation. The proposed estimator combines robust clustering with the S-estimator, and it is compared with the classical estimator of the in-control profile in the presence of outliers. A large simulation study shows that the proposed method estimates the in-control profile precisely whether the data are contaminated locally or globally.
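A minimal phase-I sketch with PyWavelets on synthetic profiles; a coefficient-wise median across profiles serves as a simple robust stand-in for the paper's clustering-plus-S-estimator.

```python
# Minimal phase-I sketch with PyWavelets on synthetic profiles; a coefficient-
# wise median across profiles is a simple robust stand-in for the paper's
# clustering-plus-S-estimator.
import numpy as np
import pywt

rng = np.random.default_rng(10)
x = np.linspace(0, 1, 64)
profiles = np.sin(2 * np.pi * x) + rng.normal(scale=0.05, size=(30, 64))
profiles[:3] += 2.0                         # three globally contaminated profiles

coeffs = [pywt.wavedec(p, "db4", level=3) for p in profiles]
robust = [np.median([c[j] for c in coeffs], axis=0)     # robust centre per level
          for j in range(len(coeffs[0]))]
in_control = pywt.waverec(robust, "db4")[:64]           # estimated in-control profile
print(np.abs(in_control - np.sin(2 * np.pi * x)).max().round(3))
```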

16.
We propose two preprocessing algorithms suitable for climate time series. The first detects outliers using an autoregressive cost-update mechanism; the second is based on the wavelet transform, a method from pattern recognition. To benchmark the algorithms' performance, we compare them with existing methods on a synthetic data set. Finally, for illustration, the proposed methods are applied to a data set of high-frequency temperature measurements from Novi Sad, Serbia. The results show that the two methods together form a powerful signal-preprocessing tool: for solitary outliers the autoregressive cost-update mechanism prevails, whereas the wavelet-based mechanism is the method of choice in the presence of multiple consecutive outliers.
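A sketch of the autoregressive screening idea on a synthetic AR(1) "temperature" series (the paper's cost-update mechanism is more elaborate): fit an AR model and flag observations with extreme one-step residuals.

```python
# Sketch of autoregressive screening on a synthetic AR(1) "temperature" series
# (the paper's cost-update mechanism is more elaborate): flag extreme residuals.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(11)
temp = np.zeros(500)
for t in range(1, 500):
    temp[t] = 0.8 * temp[t - 1] + rng.normal(scale=0.5)
temp[250] += 6.0                                # one solitary spike

fit = AutoReg(temp, lags=2).fit()
resid = fit.resid                               # in-sample one-step residuals
flagged = np.where(np.abs(resid) > 4 * np.std(resid))[0] + 2   # offset for lags
print("flagged indices:", flagged)
```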

17.
ABSTRACT

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets routinely contain 10% or more outliers in many processes. Unfortunately, such outliers typically render the OLS parameter estimates useless. OLS diagnostic quantities and graphical plots can reliably identify a few outliers, but they lose considerable power as the dimension and the number of outliers grow. Although methods for detecting multiple outliers have advanced recently, improvements are still needed in regression estimators that fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, gives researchers and practitioners a model-building tool that protects against the severe impact of multiple outliers.
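A sketch of the compound two-stage idea using off-the-shelf pieces (RANSAC as a high-breakdown initial fit, then an efficient Huber refit on the identified inliers); the authors' estimator differs in its initial estimate and leverage measure.

```python
# Sketch of the compound two-stage idea with off-the-shelf pieces (RANSAC as a
# high-breakdown initial fit, then an efficient Huber refit on the inliers);
# the authors' initial estimate and leverage measure differ.
import numpy as np
from sklearn.linear_model import RANSACRegressor, HuberRegressor

rng = np.random.default_rng(12)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)
X[:20] += 10.0                                  # 10% high-leverage outliers
y[:20] -= 15.0

initial = RANSACRegressor(random_state=0).fit(X, y)
refit = HuberRegressor().fit(X[initial.inlier_mask_], y[initial.inlier_mask_])
print(refit.coef_.round(2))                     # close to the true coefficients
```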

18.
In many areas of application, mixed linear models are a popular tool for analyzing highly complex data sets. For inference about fixed effects and variance components, likelihood-based methods such as (restricted) maximum likelihood, (RE)ML, are commonly pursued. However, these fully efficient estimators are extremely sensitive to small deviations from the hypothesized normality of the random components and to other violations of the distributional assumptions. This article proposes a new class of robust, efficient estimators for inference in mixed linear models. The new three-step estimation procedure provides truncated generalized least squares and variance component estimators with hard-rejection weights computed adaptively from the data. Specifically, the data re-weighting mechanism first detects and removes within-subject outliers, then identifies and discards between-subject outliers, and finally applies maximum likelihood procedures to the "clean" data. Theoretical efficiency and robustness properties of this approach are established.
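A toy version of the three-step flavour on a random-intercept model with statsmodels' MixedLM: fit, score subjects by residual size, discard the worst, and refit on the "clean" data; the paper's adaptive weighting scheme is more refined.

```python
# Toy version of the three-step flavour on a random-intercept model (statsmodels
# MixedLM): fit, score subjects by residual size, drop the worst, refit.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(13)
g = np.repeat(np.arange(20), 5)                     # 20 subjects, 5 obs each
u = rng.normal(size=20)[g]                          # random intercepts
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + u + rng.normal(scale=0.5, size=100)
y[g == 0] += 8.0                                    # one outlying subject

df = pd.DataFrame({"y": y, "x": x, "g": g})
fit = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()
score = df.assign(r=np.abs(fit.resid)).groupby("g")["r"].mean()
clean = df[~df["g"].isin(score.nlargest(1).index)]  # discard the worst subject
print(smf.mixedlm("y ~ x", clean, groups=clean["g"]).fit().params["x"])
```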
