Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
ABSTRACT

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers typically will render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they significantly lose power with increasing dimension and number of outliers. Although there have been recent advances in the methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs the best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners a tool for the model-building process to protect against the severe impact from multiple outliers.
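As a small illustration of the claim that outliers in the regressor space can render OLS estimates useless, the sketch below compares the OLS slope with the Theil-Sen slope (the median of all pairwise slopes) on made-up data. Theil-Sen is a stand-in chosen for brevity; it is not the compound estimator proposed in the paper, and the data and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def ols_slope(x, y):
    """Ordinary least squares slope for simple linear regression."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def theil_sen_slope(x, y):
    """Median of the slopes through every pair of points (Theil-Sen)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes)


# Hypothetical data: nine points on the line y = 2x, plus one
# high-leverage outlier at x = 30 with y = 0 (10% contamination).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2 * xi for xi in x[:-1]] + [0]

print("OLS slope:      ", ols_slope(x, y))
print("Theil-Sen slope:", theil_sen_slope(x, y))
```

On this toy data set the single leverage point is enough to flip the sign of the OLS slope, while the pairwise-median slope still recovers the slope of the bulk of the data.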

2.
The paper describes Bayesian analysis for agricultural field experiments, a topic that has received very little previous attention, despite a vast frequentist literature. Adoption of the Bayesian paradigm simplifies the interpretation of the results, especially in ranking and selection. Also, complex formulations can be analysed with comparative ease, by using Markov chain Monte Carlo methods. A key ingredient in the approach is the need for spatial representations of the unobserved fertility patterns. This is discussed in detail. Problems caused by outliers and by jumps in fertility are tackled via hierarchical t formulations that may find use in other contexts. The paper includes three analyses of variety trials for yield and one example involving binary data; none is entirely straightforward. Some comparisons with frequentist analyses are made.

3.
This paper documents situations where the variance inflation model for outliers has undesirable properties. The model is commonly used to accommodate outliers in a Bayesian analysis of regression and time series models. The alternative approach provided here does not suffer from these undesirable properties but gives inferences similar to those of the variance inflation model when this is appropriate. It can be used with regression, time series, and regression with correlated errors in a unified way, and adheres to the scientific principle that inference should be based on the data after obvious outliers have been discarded. Only one parameter is required for outliers; it is interpretable as the a priori willingness to remove observations from the analysis.

4.
Some quality characteristics are well defined when treated as response variables that are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, due to the complexity of many processes in practical applications, it is often inappropriate to model the process using parametric models. In these cases nonparametric methods are used to model the processes. One of the most applicable nonparametric methods for modeling complicated profiles is the wavelet. Many authors have considered the use of the wavelet transformation only for monitoring processes in phase II. The problem of estimating the in-control profile in phase I using the wavelet transformation has not been addressed in depth. Usually classical estimators are used in phase I to estimate the in-control profiles, even when the wavelet transformation is used. These estimators are suitable if the data do not contain outliers. However, when outliers exist, these estimators cannot estimate the in-control profile properly. In this research, a robust method of estimating the in-control profiles is proposed, which is insensitive to the presence of outliers and can be applied when the wavelet transformation is used. The proposed estimator is a combination of robust clustering and the S-estimator. This estimator is compared with the classical estimator of the in-control profile in the presence of outliers. The results from a large simulation study show that, using the proposed method, one can estimate the in-control profile precisely when the data are contaminated either locally or globally.

5.
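Wang Binhui (王斌会). Statistical Research (《统计研究》), 2007, 24(8): 72-76
Traditional multivariate statistical methods, such as principal component analysis and factor analysis, share a common feature: they compute the sample mean vector and covariance matrix and derive all other statistics from these two. When the sample contains no outliers, these methods give good results. When the sample does contain outliers, however, the results are easily affected by them, because the classical mean vector and covariance matrix are not robust statistics. This paper studies the algorithm of the currently popular FAST-MCD method, constructs a robust mean vector and a robust covariance matrix, applies them to principal component analysis, and proposes improvements that address the method's shortcomings. Simulation and empirical results show that the improved method and the new robust estimators resist outliers well and greatly reduce their influence on the computed results.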
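To give a feel for the concentration step ("C-step") at the core of FAST-MCD, here is a minimal univariate sketch. It is an illustration under simplifying assumptions, not the algorithm studied or improved in the paper: it works in one dimension only, starts from a single deterministic subset instead of many random ones, and returns the raw subset variance without any consistency correction. The function name and data are hypothetical.

```python
from statistics import mean, median, pvariance


def c_step_mcd_1d(x, h=None, max_iter=100):
    """Univariate illustration of concentration steps (not full FAST-MCD).

    Returns the mean and raw (uncorrected) variance of the final h-subset.
    """
    n = len(x)
    if h is None:
        h = (n + 2) // 2  # floor((n + p + 1) / 2) with p = 1
    # Assumed starting subset: the h points closest to the median.
    # FAST-MCD itself draws many random starting subsets.
    med = median(x)
    subset = sorted(x, key=lambda v: abs(v - med))[:h]
    for _ in range(max_iter):
        loc = mean(subset)
        # In one dimension, ranking by Mahalanobis distance reduces
        # to ranking by |v - loc|; keep the h closest points.
        new_subset = sorted(x, key=lambda v: abs(v - loc))[:h]
        if sorted(new_subset) == sorted(subset):
            break  # the subset no longer changes
        subset = new_subset
    return mean(subset), pvariance(subset)


# Hypothetical sample: seven values near 10 and three gross outliers.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 50.0, 55.0, 60.0]
robust_loc, robust_var = c_step_mcd_1d(data)
print("classical mean:", mean(data))
print("robust location:", robust_loc, "raw subset variance:", robust_var)
```

The classical mean is pulled far toward the three outliers, whereas the location of the concentrated subset stays with the bulk of the data.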

6.
Outliers can occur as readily in samples from finite populations (e.g. in sample surveys) as in samples from infinite populations. However, in the vast literature on outliers there is almost no mention of outlier tests for data from sample surveys. We examine the behaviour of some standard outlier test statistics for infinite populations when these are applied to finite populations, examining their properties by extensive simulation studies. Some anomalous results are obtained, suggesting a fundamental difficulty in testing outliers for the finite population case.

7.
Parameter estimation is the first step in constructing control charts. One of these parameters is the process mean. The classical estimators of the process mean are sensitive to the presence of outlying data and subgroups which contaminate the whole data. Existing robust estimators for the process mean consider only the effects of individual outliers, whereas in this paper a robust estimator is proposed to reduce the effect of outlying subgroups as well as of individual outliers within a subgroup. The proposed estimator was compared with some classical and robust estimators of the process mean. Although its relative efficiency ranks fourth among the estimators tested, its robustness and efficiency are high when outlying subgroups are present. Evaluation of the results indicated that the proposed estimator is less sensitive to the presence of outliers and that it performs well when there are no individual outliers or outlying subgroups.

8.
In geostatistics, detecting atypical observations is of special interest due to the changes they can cause in environmental and geological patterns. Several methods for detecting them have already been suggested for the univariate spatial case. However, the problem is more complicated when several variables are observed simultaneously and the spatial correlation among them must be taken into account. The aim of this paper is to detect outliers and influential observations in multivariate spatial linear models. For this purpose, we derive and explore two different methods. First, a multivariate version of the forward search algorithm is given, where locations with outliers are detected in the last steps of the procedure. Next, we derive influence measures to assess the impact of the observations on the multivariate spatial linear model. The procedures are easy to compute and to interpret by means of graphical representations. Finally, an example and a Monte Carlo study illustrate the performance of these methods for identification of outliers in multivariate spatial linear models.

9.
Variable selection in the presence of outliers may be performed by using a robust version of Akaike's information criterion (AIC). In this paper, explicit expressions are obtained for such criteria when S- and MM-estimators are used. The performance of these criteria is compared with the existing AIC based on M-estimators and with the classical non-robust AIC. In a simulation study and in data examples, we observe that the proposed AIC with S- and MM-estimators selects more appropriate models when outliers are present.

10.
In this study we investigate the problem of estimation and testing of hypotheses in multivariate linear regression models when the errors involved are assumed to be non-normally distributed. We consider the class of heavy-tailed distributions for this purpose. Although our method is applicable to any distribution in this class, we take the multivariate t-distribution for illustration. This distribution has applications in many fields of applied research such as Economics, Business, and Finance. For estimation, we use the modified maximum likelihood method to obtain the so-called modified maximum likelihood estimates, which are available in closed form. We show that these estimates are substantially more efficient than least-squares estimates. They are also found to be robust to reasonable deviations from the assumed distribution and to many data anomalies, such as the presence of outliers in the sample. We further provide test statistics for testing the relevant hypothesis regarding the regression coefficients.

11.
ABSTRACT

This article studies the outlier detection problem in the mixed regressive-spatial autoregressive model. The formulae for testing outliers and their approximate distributions are derived under the mean-shift model and the variance-weight model, respectively. Simulation studies are conducted to examine the power and size of the test, as well as the detection of outliers when a simulated data set contains several outliers. A real data set is analyzed to illustrate the proposed method, and modified models based on the mean-shift and variance-weight models, in which detected outliers are taken into account, are suggested to deal with the outliers and confirm the conclusions.

12.
Recently, several new robust multivariate estimators of location and scatter have been proposed that provide new and improved methods for detecting multivariate outliers. But for small sample sizes, there are no results on how these new multivariate outlier detection techniques compare in terms of p_n, their outside rate per observation (the expected proportion of points declared outliers) under normality. And there are no results comparing their ability to detect truly unusual points based on the model that generated the data. Moreover, there are no results comparing these methods to two fairly new techniques that do not rely on some robust covariance matrix. It is found that for an approach based on the orthogonal Gnanadesikan–Kettenring estimator, p_n can be very unsatisfactory with small sample sizes, but a simple modification gives much more satisfactory results. Similar problems were found when using the median ball algorithm, but a modification proved to be unsatisfactory. The translated-biweights (TBS) estimator generally performs well with a sample size of n≥20 and when dealing with p-variate data where p≤5. But with p=8 it can be unsatisfactory, even with n=200. A projection method as well as the minimum generalized variance method generally perform best, but with p≤5 conditions where the TBS method is preferable are described. In terms of detecting truly unusual points, the methods can differ substantially depending on where the outliers happen to be, the number of outliers present, and the correlations among the variables.

13.
Extensions of recent results for detection of mean slippage type outliers from i.i.d. multivariate normal and elliptically symmetric distributions are made to the symmetric case, that is, when the observations are equicorrelated. The main tool used is Wijsman's (1967) representation theorem. The results obtained can be viewed as a robustness property of the use of Mardia's multivariate kurtosis as a locally optimal test statistic to detect outliers against equicorrelated distributions.

14.
In a recent paper, Hampel (1985) studied the properties of rejection-plus-mean procedures as estimators of a location parameter. He reported that these procedures have low breakdown and high variance. In this article it is pointed out that these results are due to the outliers being rejected in a forwards-stepping manner, and when a more appropriate backwards-stepping approach is used, rejection-plus-mean procedures lead to estimators with high breakdown and redescending theoretical influence function.

15.
The power of some rank tests, used for testing the hypothesis of shift, is found when the underlying distributions contain outliers. The outliers are assumed to occur as the result of mixing two normal distributions with common variance. A small sample case shows how the scores for the rank tests are found and the exact power is computed for each of these rank tests. A Monte Carlo study provides an estimate of the power of the usual two sample t-test.

16.
In order to describe or generate so-called outliers in univariate statistical data, contamination models are often used. These models assume that k out of n independent random variables are shifted or multiplied by some constant, whereas the other observations still come i.i.d. from some common target distribution. Of course, these contaminants do not necessarily stick out as the extremes in the sample. Moreover, it is the amount and magnitude of ‘contamination’ which determines the number of obvious outliers. Using the concept of Davies and Gather (1993) to formalize the outlier notion, we quantify the amount of contamination needed to produce a prespecified expected number of ‘genuine’ outliers. In particular, we demonstrate that for samples of moderate size from a normal target distribution a rather large shift of the contaminants is necessary to yield a certain expected number of outliers. Such an insight is of interest when designing simulation studies where outliers should occur as well as in theoretical investigations on outliers.
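The relation between the size of the shift and the expected number of ‘genuine’ outliers can be sketched numerically. The snippet below is a simplified illustration, not the paper's computation. It assumes a standard normal target distribution, shift contamination only, and a fixed-level outlier region {x : |x| > z_{1-alpha/2}}, without any adjustment of alpha for the sample size. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def expected_outliers(n, k, shift, alpha=0.01):
    """Expected number of observations falling in the alpha-outlier region
    of N(0, 1) when k of the n observations are shifted by `shift`."""
    std = NormalDist()  # standard normal target distribution
    c = std.inv_cdf(1 - alpha / 2)  # boundary of the outlier region
    # A contaminant is Z + shift with Z ~ N(0, 1); it lands in the
    # region when Z > c - shift or Z < -c - shift.
    p_contaminant = (1 - std.cdf(c - shift)) + std.cdf(-c - shift)
    # Each regular observation lands in the region with probability alpha.
    return k * p_contaminant + (n - k) * alpha


# Hypothetical setting: n = 50 observations, k = 5 contaminants.
for shift in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(shift, round(expected_outliers(50, 5, shift), 3))
```

Under these assumptions a contaminant shifted by exactly the boundary value z_{1-alpha/2} (about 2.58 for alpha = 0.01) falls in the outlier region only about half the time, which is consistent with the abstract's point that a rather large shift is needed before contaminants reliably appear as outliers.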

17.
In this article, robust estimation and prediction in multivariate autoregressive models with exogenous variables (VARX) are considered. The conditional least squares (CLS) estimators are known to be non-robust when outliers occur. To obtain robust estimators, the method introduced in Duchesne [2005. Robust and powerful serial correlation tests with new robust estimates in ARX models. J. Time Ser. Anal. 26, 49–81] and Bou Hamad and Duchesne [2005. On robust diagnostics at individual lags using RA-ARX estimators. In: Duchesne, P., Rémillard, B. (Eds.), Statistical Modeling and Analysis for Complex Data Problems. Springer, New York] is generalized for VARX models. The asymptotic distribution of the new estimators is studied and from this is obtained in particular the asymptotic covariance matrix of the robust estimators. Classical conditional prediction intervals normally rely on estimators such as the usual non-robust CLS estimators. In the presence of outliers, such as additive outliers, these classical predictions can be severely biased. More generally, the occurrence of outliers may invalidate the usual conditional prediction intervals. Consequently, the new robust methodology is used to develop robust conditional prediction intervals which take into account parameter estimation uncertainty. In a simulation study, we investigate the finite sample properties of the robust prediction intervals under several scenarios for the occurrence of the outliers, and the new intervals are compared to non-robust intervals based on classical CLS estimators.

18.
Abstract

There are three main problems in the existing procedures for detecting outliers in ARIMA models. The first one is the biased estimation of the initial parameter values that may strongly affect the power to detect outliers. The second problem is the confusion between level shifts and innovative outliers when the series has a level shift. The third problem is masking. We propose a procedure that keeps the powerful features of previous methods but improves the initial parameter estimate, avoids the confusion between innovative outliers and level shifts and includes joint tests for sequences of additive outliers in order to solve the masking problem. A Monte Carlo study and one example of the performance of the proposed procedure are presented.

19.
The Burr XII distribution offers a flexible alternative to the distributions that play an important role in modelling data in reliability, risk and process capability. However, estimating the shape parameters of the Burr XII distribution is a challenging problem. Classical estimation methods such as maximum likelihood and least squares are often used to estimate the parameters of the Burr XII distribution, but these methods are very sensitive to outliers in the data. Thus, a robust alternative to the classical methods is needed to find estimators that are less sensitive to outliers in the data. The purpose of this paper is to use the optimal B-robust estimation method [Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. Robust statistics: the approach based on influence functions. New York: Wiley; 1986] to obtain robust estimators for the shape parameters of the Burr XII distribution. The simulation results show that the optimal B-robust estimators generally outperform the classical estimators in terms of bias and root mean square error when there are outliers in the data.

20.
Repeating measurements of efficacy variables in clinical trials may be desirable when the measurement may be affected by ambient conditions. When such measurements are repeated at baseline and at the end of therapy, statistical questions relate to: (1) the best summary measurement to use for a subject when there is a possibility that some observations are contaminated and have increased variances; and (2) the effect of screening procedures which exclude outliers based on within- and between-subject contamination tests. We study these issues in two stages, each using a different set of models. The first stage deals only with the choice of the summary measure. The simulation results show that in some cases of contamination, the power achieved by the tests based on the median exceeds that achieved by the tests based on the mean of the replicates. However, even when we use the median, there are cases when contamination leads to a considerable loss in power. The combined issue of the best summary measurement and the effect of screening is studied in the second stage. The tests use either the observed data or the data after screening for outliers. The simulation results demonstrate that the power depends on the screening procedure as well as on the test statistic used in the study. We found that for the extent and magnitude of contamination considered, within-subject screening has a minimal effect on the power of the tests when there are at least three replicates; as a result, we found no advantage in the use of screening procedures for within-subject contamination. On the other hand, the use of a between-subject screening for outliers increases the power of the test procedures. However, even with the use of screening procedures, heterogeneity of variances can greatly reduce the power of the study.
