Similar literature
 Found 20 similar articles (search time: 453 ms)
1.
Sliced Inverse Regression (SIR) is a promising technique for the purpose of dimension reduction. Several properties of this method have been examined already, but little attention has been paid to robustness aspects. In this article, we focus on the sensitivity of SIR to outliers and show in what sense and how severely SIR can be influenced by outliers in the data.

2.
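Wang Binhui (王斌会), Statistical Research (《统计研究》), 2007, 24(8): 72-76
Traditional multivariate statistical methods, such as principal component analysis and factor analysis, share a common feature: they compute the sample mean vector and covariance matrix and derive other statistics from these two quantities. When the sample contains no outliers, these methods give good results. When the sample does contain outliers, however, the results are easily distorted, because the classical mean vector and covariance matrix are not robust statistics. This paper studies the algorithm of the currently popular FAST-MCD method, constructs a robust mean vector and a robust covariance matrix, applies them to principal component analysis, and proposes improvements to address the method's shortcomings. Simulation and empirical results show that the improved method and the new robust estimators resist outliers well and greatly reduce their influence on the results.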
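The idea behind a minimum covariance determinant (MCD) estimate can be sketched in one dimension, where the search is exact: the h-subset with the smallest variance is always a run of h consecutive sorted values. This is a minimal illustration and not the FAST-MCD algorithm or the improved method studied in the paper. The function name `univariate_mcd`, the default choice of `h`, and the sample data are all hypothetical.

```python
from statistics import mean, pvariance


def univariate_mcd(values, h=None):
    """Return (location, variance) of the h-subset with the smallest variance.

    In one dimension the optimal subset is a run of h consecutive order
    statistics, so scanning windows of the sorted data is sufficient.
    The returned variance is the raw subset variance, with no
    consistency correction applied.
    """
    xs = sorted(values)
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # assumed default: just over half of the sample
    if not 1 < h <= n:
        raise ValueError("h must satisfy 1 < h <= len(values)")
    # Pick the window of h consecutive sorted values with minimal variance.
    best = min((xs[i:i + h] for i in range(n - h + 1)), key=pvariance)
    return mean(best), pvariance(best)


# Hypothetical data: six clean readings near 10 plus two gross outliers.
data = [9.8, 10.1, 10.0, 9.9, 10.3, 10.0, 55.0, 60.0]
loc, var = univariate_mcd(data)
classical_mean = mean(data)  # pulled far away from 10 by the outliers
```

The robust location stays close to the bulk of the data, while the classical mean is dragged toward the two extreme values.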

3.
In this study, we detect outliers in large medical records and exclude them, because outliers reduce the accuracy and efficiency of the analysis. We compare the traditional method with a method based on the frontier model. The outliers of interest are those in cost and length of stay for pneumonia patients from HCUP 2002 and 2003.

4.
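Because traditional factor analysis is sensitive to outliers, its results can disagree with reality. To address this, the paper uses the FAST-MCD method to improve traditional factor analysis and constructs a robust factor analysis algorithm that overcomes the influence of outliers; the method is then evaluated through simulation and empirical analysis. Both show that, before and after factor rotation, traditional and robust factor analysis give essentially the same results when the data contain no outliers. When the data contain outliers, the results of traditional factor analysis change considerably, whereas those of robust factor analysis remain essentially unchanged. This indicates that, compared with traditional factor analysis, robust factor analysis effectively resists the influence of outliers and has good resistance to interference and high robustness.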

5.
This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in estimating both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlier units at any level. Detection of outlier units related to random parts is also studied. Analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.

6.
Robust statistics have slowly become familiar to all practitioners. Books entirely devoted to the subject (e.g. [R.A. Maronna, R.D. Martin, V.J. Yohai, Robust Statistics: Theory and Methods. John Wiley & Sons, New York, NY, USA, 2006; P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, New York, NY, USA, 1987], …) are without any doubt responsible for the increased practice of robust statistics in all fields of applications. Even classical books often have at least one chapter (or parts of chapters) which develops robust methodology. The improvement of computing power has also contributed to the development of a wider and wider range of available robust procedures. However, this success story now threatens to reverse: non-specialists interested in the application of robust methodology are faced with a large set of (assumed equivalent) methods and with the over-sophistication of some of them. Which method should one use? How should the (numerous) parameters be optimally tuned? These questions are not so easy to answer for non-specialists! One could then argue that default procedures are available in most statistical software (Splus, R, SAS, Matlab, …). However, using as illustration the detection of outliers in multivariate data, it is shown that, on the one hand, it is not obvious that one would feel confident with the output of default procedures, and that, on the other hand, trying to understand thoroughly the tuning parameters involved in the procedures might require some extensive research. This is not conceivable when trying to compete with the classical methodology which (while clearly unreliable) is so straightforward. The aim of the paper is to help practitioners who wish to reliably detect outliers in a multivariate data set. The chosen methodology is the Minimum Covariance Determinant estimator, which is widely available and intuitively appealing.

7.
The use of logistic regression modeling has seen a great deal of attention in the literature in recent years. This includes all aspects of the logistic regression model including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.
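The masking effect mentioned above can be illustrated in a much simpler setting than logistic regression. The sketch below uses a plain univariate sample and is not the group-deleted Pearson residual method proposed in the article: it only shows how a second outlier can inflate the classical standard deviation enough to hide both outliers, while a median/MAD rule still flags them. The function names, the cutoff of 2.5, and the data are assumptions made for illustration.

```python
from statistics import mean, median, stdev


def classical_flags(values, cutoff=2.5):
    """Flag points whose |z-score| from the mean and SD exceeds cutoff."""
    m, s = mean(values), stdev(values)
    return [abs(x - m) / s > cutoff for x in values]


def robust_flags(values, cutoff=2.5):
    """Flag points using the median and the scaled MAD instead."""
    med = median(values)
    # 1.4826 makes the MAD comparable to the SD under normality.
    mad = 1.4826 * median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; the robust score is undefined")
    return [abs(x - med) / mad > cutoff for x in values]


# Hypothetical data: eight clean readings near 10.
clean = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
one_outlier = clean + [30.0]
two_outliers = clean + [30.0, 31.0]

single = classical_flags(one_outlier)   # the lone outlier is flagged
masked = classical_flags(two_outliers)  # both outliers escape detection
robust = robust_flags(two_outliers)     # both outliers are flagged
```

With one outlier the classical rule works; with two, each outlier inflates the standard deviation used to judge the other, which is the masking problem that motivates group-deletion approaches.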

8.
Asymmetric models have been extensively studied in recent years, in situations where the normality assumption is not satisfied due to lack of symmetry of the data. Techniques for assessing the quality of fit and diagnostic analysis are important for model validation. This paper presents a study of the mean-shift method for detecting outliers in asymmetric normal regression models. Analytical solutions for the estimators of the parameters are obtained using the algorithm. Simulation studies and application to real data are presented, showing the efficiency of the method in detecting outliers.

9.
This article studies the outlier detection problem in the mixed regressive-spatial autoregressive model. The formulae for testing outliers and their approximate distributions are derived under the mean-shift model and the variance-weight model, respectively. Simulation studies are conducted to examine the power and size of the test, as well as the detection of outliers when a simulated data set contains several outliers. A real data set is analyzed to illustrate the proposed method, and modified models based on the mean-shift and variance-weight models, in which detected outliers are taken into account, are suggested to deal with the outliers and confirm the conclusions.

10.
Some statistics practitioners ignore the underlying assumptions when analyzing real data and employ the Nonlinear Least Squares (NLLS) method to estimate the parameters of a nonlinear model. Reliable inference about the parameters of a model requires that the underlying assumptions, especially the assumption that the errors are independent, are satisfied. In real situations, however, we may encounter dependent error terms, which are prone to produce autocorrelated errors. A two-stage estimator (CTS) has been developed to remedy this problem. Nevertheless, it is now evident that the presence of outliers has an undue effect on the least squares estimates. We expect that the CTS is also easily affected by outliers since it is based on the least squares estimator, which is not robust. In this article, we propose a Robust Two-Stage (RTS) procedure for the estimation of the nonlinear regression parameters in the situation where autocorrelated errors occur together with outliers. The numerical example and simulation study indicate that the RTS is more efficient than the NLLS and the CTS methods.

11.
We propose a method that integrates bootstrap into the forward search algorithm in the construction of robust confidence intervals for elements of the eigenvectors of the correlation matrix in the presence of outliers. Coverage probability of the bootstrap simultaneous confidence intervals was compared to the coverage probabilities of regular asymptotic confidence region and asymptotic confidence region based on the minimum covariance determinant (MCD) approach through a simulation study. The method produced more stable coverage probabilities for datasets with or without outliers and across several sample sizes compared to approaches based on asymptotic confidence regions.

12.
In this article, we present an M-estimator to estimate the parameters of the extended three-parameter Burr Type III distribution for complete data with outliers. The confidence intervals for all parameters can be obtained by the M-estimator's normal approximation. The simulation results show that the M-estimator generally outperforms the maximum likelihood and least squares methods in terms of bias and root mean square errors. We also investigate the M-estimator's impact on different quantiles and the mean for the Weibull and normal distributions with outliers. Two numerical examples are used to demonstrate the performance of our proposed method.

13.
Outliers are to be found among the extremes of a data set. Extremes are examples of order statistics. It is thus relevant to ask to what extent the statistical methods (and probabilistic properties) of outliers and of order statistics coincide and depend on each other. Whilst clear overlap is identifiable, aims and procedures are often quite distinct and each topic plays its own important role in the panoply of statistical principles and methodology.

14.
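A method for evaluating the reliability of statistical data based on robust principal component regression   Cited by: 1 in total (self-citations: 0, citations by others: 1)
 Robust principal component regression (RPCR) combines robust principal component analysis with robust regression. This paper is the first to apply RPCR together with outlier diagnostics to evaluate the reliability of China's 2008 cross-sectional regional economic growth data. The evaluation shows that RPCR better overcomes the influence of outliers, gives more reliable estimates, and effectively avoids the masking of multiple outliers to which classical principal component regression (CPCR) is prone. The 2008 regional economic growth data can on the whole be regarded as consistent with the related indicators, but the growth data of some regions may have reliability problems.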

15.
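A method for evaluating statistical data quality based on robust MM-estimation   Cited by: 2 in total (self-citations: 1, citations by others: 1)
Lu Erpo (卢二坡), Huang Bingyi (黄炳艺), Statistical Research (《统计研究》), 2010, 27(12): 16-22
 The quality of government statistical data is currently a topic of wide concern, and evaluating China's statistical data scientifically with rigorous diagnostic methods is of great practical importance. Robust regression yields regression estimates that are not strongly affected by outliers and identifies outliers more reliably. This paper is the first to apply outlier diagnostics based on robust MM-estimation, within a production function framework and using two different labor input data series, to evaluate the quality of China's GDP data since the reform. The results show that outlier diagnostics based on robust MM-estimation effectively resolve the masking of multiple outliers to which traditional methods are prone, and that China's GDP data since the reform are relatively reliable.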

16.
Asymmetric models have been discussed quite extensively in recent years, in situations where the normality assumption is in doubt due to lack of symmetry in the data. Techniques for assessing the quality of fit and diagnostic analysis are important for model validation. This paper presents a study of the mean-shift method for the detection of outliers in regression models under skew scale-mixtures of normal distributions. Analytical solutions for the estimators of the parameters are obtained through the use of the Expectation–Maximization algorithm. The observed information matrix for the calculation of standard errors is obtained for each distribution. Simulation studies and an application to the analysis of a data set have been carried out, showing the efficiency of the proposed method in detecting outliers.

17.
In this paper, we present a test procedure to detect outliers in the one-parameter exponential distribution based on prediction. The distribution of the test statistic is obtained. The proposed test can be used to detect more than one outlier and the required percentage points can be easily determined. Furthermore, the test provides a simple procedure to detect whether a given set of data is free from outliers or spurious observations.

18.
The ordinary least-squares estimators for linear regression analysis with multicollinearity and outliers lead to unfavorable results. In this article, we propose a new robust modified ridge M-estimator (MRME) based on the M-estimator (ME) to deal with the combined problem resulting from multicollinearity and outliers in the y-direction. MRME outperforms the modified ridge estimator, the robust ridge estimator and ME according to the mean squared error criterion. Furthermore, a numerical example and a Monte Carlo simulation experiment are given to illustrate some of the theoretical results.

19.
Many methods have been developed for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be groups in the data. We propose a method of simultaneously determining groups (as in cluster analysis) and detecting outliers, which are points that are distant from every group. Our method is an adaptation of the BACON algorithm proposed by Billor, Hadi and Velleman for the robust detection of multiple outliers in a single group of multivariate data. There are two versions of our method, depending on whether or not the groups can be assumed to have equal covariance matrices. The effectiveness of the method is illustrated by its application to two real data sets and further shown by a simulation study for different sample sizes and dimensions for 2 and 3 groups, with and without planted outliers in the data. When the number of groups is not known in advance, the algorithm could be used as a robust method of cluster analysis, by running it for various numbers of groups and choosing the best solution.

20.
Mixed effects models or random effects models are popular for the analysis of longitudinal data. In practice, longitudinal data are often complex since there may be outliers in both the response and the covariates and there may be measurement errors. The likelihood method is a common approach for these problems but it can be computationally very intensive and sometimes may even be computationally infeasible. In this article, we consider approximate robust methods for nonlinear mixed effects models to simultaneously address outliers and measurement errors. The approximate methods are computationally very efficient. We show the consistency and asymptotic normality of the approximate estimates. The methods can also be extended to missing data problems. An example is used to illustrate the methods and a simulation is conducted to evaluate the methods.
