首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
This paper studies outlier detection and accommodation in general spatial models including spatial autoregressive models and spatial error model as special cases. Using mean-shift and variance-weight models respectively, test statistics for multiple outliers are derived and the detecting procedures are proposed. In addition, several key diagnostic measures such as standardized residuals and leverage measure are defined in general spatial models. Outlier modified models are proposed to accommodate outliers in the data set. The performance of test statistics, including size and power, are examined via simulation studies. Three real examples are analyzed and the results show that the proposed methodology is useful for identifying and accommodating outliers in general spatial models.  相似文献   

2.
The presence of outliers would inevitably lead to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data might amplify the chance of an observation or multiple observations being outlying. Noting that the detection of outliers is not only necessary but also important in high-dimensional regression analysis, we, in this paper, propose a feasible outlier detection approach in sparse high-dimensional linear regression model. Firstly, we search a clean subset by use of the sure independence screening method and the least trimmed square regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outliers detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. By comparison studies based on Monte Carlo simulation, it is shown that the proposed method performs well for detecting multiple outliers in sparse high-dimensional linear regression model. We further illustrate the application of the proposed method by empirical analysis of a real-life protein and gene expression data.  相似文献   

3.
ABSTRACT

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers typically will render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they significantly lose power with increasing dimension and number of outliers. Although there have been recent advances in the methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs the best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners a tool for the model-building process to protect against the severe impact from multiple outliers.  相似文献   

4.
The use of logistic regression modeling has seen a great deal of attention in the literature in recent years. This includes all aspects of the logistic regression model including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.  相似文献   

5.
Several authors have taken the worst case breakdown measures in analyzing the robustness of a test. In general, these kinds of measures give only a rough picture of breakdown robustness of a test. To overcome this limitation, a new kind of breakdown measure of a test is defined as the smallest proportion of arbitrary outliers in the sample that can distort the test decision. It is called as the sample breakdown point of a test in this paper. A distinct advantage of this new measure is that it is directly concerned with the test decision based on the present sample and with the critical region of the test. The sample breakdown points of several commonly used tests of one-sided or two-sided hypotheses are calculated and their asymptotic properties are also established. By Monte Carlo simulations and asymptotic analysis, we show that the acceptance breakdown of the t-test and the Hotelling T2-test is slightly better than that of the sample mean test. Finally, we prove that, for a one-sided hypothesis testing of location, the sign test has the maximum sample breakdown points asymptotically within a class of M-tests and score-tests.  相似文献   

6.
Parameter estimation is the first step in constructing control charts. One of these parameters is the process mean. The classical estimators of the process mean are sensitive to the presence of outlying data and subgroups which contaminate the whole data. In existing robust estimators for the process mean, the effects of the presence of the individual outliers are being considered, while, in this paper, a robust estimator is being proposed to reduce the effect of outlying subgroups as well as the individual outliers within a subgroup. The proposed estimator was compared with some classical and robust estimators of the process mean. Although, its relative efficiency is fourth among the estimators tested, its robustness and efficiency are large when the outlying subgroups are present. Evaluation of the results indicated that the proposed estimator is less sensitive to the presence of outliers and the process mean performs well when there are no individual outliers or outlying subgroups.  相似文献   

7.
Leverage values are being used in regression diagnostics as measures of unusual observations in the X-space. Detection of high leverage observations or points is crucial due to their responsibility for masking outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data and hence the most extreme points in the covariate space get the highest leverage. But Hosemer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression, the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly very small and that creates problem in the correct identification of the cases. Attempts have been made to identify the HLP based on the median distances from the mean, but since they are designed for the identification of a single high leverage point they may not be very effective in the presence of multiple HLP due to their masking (false–negative) and swamping (false–positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression where the suspect cases are identified by a robust group deletion technique and they are confirmed using diagnostic techniques. The usefulness of the proposed method is then investigated through several well-known examples and a Monte Carlo simulation.  相似文献   

8.
In this article, we present an M-estimator to estimate the parameters of the extended three-parameter Burr Type III distribution for complete data with outliers. The confidence intervals for all parameters can be obtained by the M-estimator's normal approximation. The simulation results show that the M-estimator generally outperforms the maximum likelihood and least squares methods in terms of bias and root mean square errors. We also investigate the M-estimator's impact on different quantiles and the mean for the Weibull and normal distributions with outliers. Two numerical examples are used to demonstrate the performance of our proposed method.  相似文献   

9.
The presence of contamination often called outlier is a very common attribute in data. Among other causes, outliers in a homoscedastic model make the model heteroscedastic. Moreover, outliers distort diagnostic tools for heteroscedasticity such that it may not be correctly identified. In this article, we show how outliers affect heteroscedasticity diagnostics. We then proposed a robust procedure for detecting heteroscedasticity in the presence of outliers by robustifying the non-robust component of the Goldfeld–Quandt (GQ) test. The performance of the proposed procedure is examined using simulation experiment and real data sets. The proposed procedure offers great improvement where the conventional GQ and other procedures fail.  相似文献   

10.
We propose two preprocessing algorithms suitable for climate time series. The first algorithm detects outliers based on an autoregressive cost update mechanism. The second one is based on the wavelet transform, a method from pattern recognition. In order to benchmark the algorithms'' performance we compare them to existing methods based on a synthetic data set. Eventually, for exemplary purposes, the proposed methods are applied to a data set of high-frequent temperature measurements from Novi Sad, Serbia. The results show that both methods together form a powerful tool for signal preprocessing: In case of solitary outliers the autoregressive cost update mechanism prevails, whereas the wavelet-based mechanism is the method of choice in the presence of multiple consecutive outliers.  相似文献   

11.
High leverage points can induce or disrupt multicollinearity patterns in data. Observations responsible for this problem are generally known as collinearity-influential observations. A significant amount of published work on the identification of collinearity-influential observations exists; however, we show in this article that all commonly used detection techniques display greatly reduced sensitivity in the presence of multiple high leverage collinearity-influential observations. We propose a new measure based on a diagnostic robust group deletion approach. Some practical cutoff points for existing and developed diagnostics measures are also introduced. Numerical examples and simulation results show that the proposed measure provides significant improvement over the existing measures.  相似文献   

12.
In the past decade, different robust estimators have been proposed by several researchers to improve the ability to detect non-random patterns such as trend, process mean shift, and outliers in multivariate control charts. However, the use of the sample mean vector and the mean square successive difference matrix in the T 2 control chart is sensitive in detecting process mean shift or trend but less sensitive in detecting outliers. On the other hand, the minimum volume ellipsoid (MVE) estimators in the T 2 control chart are sensitive in detecting multiple outliers but less sensitive in detecting trend or process mean shift. Therefore, new robust estimators using both merits of the mean square successive difference matrix and the MVE estimators are developed to modify Hotelling's T 2 control chart. To compare the detection performance among various control charts, a simulation approach for establishing control limits and calculating signal probabilities is provided as well. Our simulation results show that a multivariate control chart using the new robust estimators can achieve a well-balanced sensitivity in detecting the above-mentioned non-random patterns. Finally, three numerical examples further demonstrate the usefulness of our new robust estimators.  相似文献   

13.
Principal component analysis (PCA) is a popular technique that is useful for dimensionality reduction but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has caused the development of new approaches. Effects of using estimates obtained by expectation–maximization – EM and multiple imputation – MI instead of outliers were examined on the artificial and a real data set. Furthermore, robust PCA based on minimum covariance determinant (MCD), PCA based on estimates obtained by EM instead of outliers and PCA based on estimates obtained by MI instead of outliers were compared with the results of CPCA. In this study, we tried to show the effects of using estimates obtained by MI and EM instead of outliers, depending on the ratio of outliers in data set. Finally, when the ratio of outliers exceeds 20%, we suggest the use of estimates obtained by MI and EM instead of outliers as an alternative approach.  相似文献   

14.
Variable selection in the presence of outliers may be performed by using a robust version of Akaike's information criterion (AIC). In this paper, explicit expressions are obtained for such criteria when S- and MM-estimators are used. The performance of these criteria is compared with the existing AIC based on M-estimators and with the classical non-robust AIC. In a simulation study and in data examples, we observe that the proposed AIC with S and MM-estimators selects more appropriate models in case outliers are present.  相似文献   

15.
An adaptive Kalman filter is proposed to estimate the states of a system where the system noise is assumed to be a multivariate generalized Laplace random vector. In the presence of outliers in the system noise, it is shown that improved state estimates can be obtained by using an adaptive factor to estimate the dispersion matrix of the system noise term. For the implementation of the filter, an algorithm which includes both single and multiple adaptive factors is proposed. A Monte-Carlo investigation is also carried out to access the performance of the proposed filters in comparison with other robust filters. The results show that, in the sense of minimum mean squared state error, the proposed filter is superior to other filters when the magnitude of a system change is moderate or large.  相似文献   

16.
In this paper, we suggest a least squares procedure for the determination of the number of upper outliers in an exponential sample by minimizing sample mean squared error. Moreover, the method can reduce the masking or “swamping” effects. In addition, we have also found that the least squares procedure is easy and simple to compute than test test procedure T k suggested by Zhang (1998) for determining the number of upper outliers, since Zhang (1998) need to use the complicated null distribution of T k . Moreover, we give three practical examples and a simulated example to illustrate the procedures. Further, simulation studies are given to show the advantages of the proposed method. Finally, the proposed least squares procedure can also determine the number of upper outliers in other continuous univariate distributions (for example, Pareto, Gumbel, Weibull, etc.). Received: May 10, 1999; revised version: June 5, 2000  相似文献   

17.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments through several well-referred data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.  相似文献   

18.
This paper studies methods for simple estimation of the exponential mean parameter in small samples in the presence of outliers. Existing estimation methods are discussed. Adaptation of these methods to allow for Type I censoring is investigated. New robust procedures are proposed. A series of simulation experiments Indicate trimming provides significant protection against outliers while the premium is usually small when trimming uncontarninated samples. A linearly weighted mean is recommended for uncontarninated samples, both censored and complete. In larger samples, (n - 20), the proposed Huber-type estimator performs quite well in all situations of censoring and contarnination  相似文献   

19.
In this paper, we propose a method for outlier detection and removal in electromyographic gait-related patterns (EMG-GRPs). The goal was to detect and remove EMG-GRPs that reduce the quality of gait data while preserving natural biological variations in EMG-GRPs. The proposed procedure consists of general statistical tests and is simple to use. The Friedman test with multiple comparisons was used to find particular EMG-GRPs that are extremely different from others. Next, outlying observations were calculated for each suspected stride waveform by applying the generalized extreme studentized deviate test. To complete the analysis, we applied different outlier criteria. The results suggest that an EMG-GRP is an outlier if it differs from at least 50% of the other stride waveforms and contains at least 20% of the outlying observations. The EMG signal remains a realistic representation of muscle activity and demonstrates step-by-step variability once the outliers, as defined here, are removed.  相似文献   

20.

The additive AR-2D model has been successfully related to the modeling of satelital images both optic and of radar of synthetic opening. Having in mind the errors that are produced in the process of captation and quantification of the image, an interesting subject, is the robust estimation of the parameters in this model. Besides the robust methods in image models are also applied in some important image processing situations such as segmentation by texture and image restoration in the presence of outliers. This paper is concerned with the development and performance of the robust RA estimator proposed by Ojeda (1998) for the estimation of parameters in contaminated AR-2D models. Here, we implement this estimator and we show by simulation study that it has a better performance than the classic least square estimator and the robust M and GM estimators in an additive outlier contaminated image model.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号