Similar literature
 20 similar articles found; search took 31 ms
1.
The boxplot is an effective data-visualization tool useful in diverse applications and disciplines. Although more sophisticated graphical methods exist, the boxplot remains relevant due to its simplicity, interpretability, and usefulness, even in the age of big data. This article highlights the origins and developments of the boxplot, which is now widely viewed as an industry standard, as well as its inherent limitations when dealing with data from skewed distributions, particularly when detecting outliers. The proposed Ratio-Skewed boxplot is shown to be practical and suitable for outlier labeling across several parametric distributions.

2.
In analyzing data from unreplicated factorial designs, the half-normal probability plot is commonly used to screen for the ‘vital few’ effects. Recently, many formal methods have been proposed to overcome the subjectivity of this plot. Lawson (1998) (hereafter denoted as LGB) suggested a hybrid method based on the half-normal probability plot, which is a blend of the Lenth (1989) and Loh (1992) methods. The method consists of fitting a simple least squares line to the inliers, which are determined by the Lenth method. The effects exceeding the prediction limits based on the fitted line are candidates for the vital few effects. To improve the accuracy of partitioning the effects into inliers and outliers, we propose a modified LGB method (hereafter denoted as the Mod_LGB method), in which more outliers can be classified by using both Carling’s modification of the box plot (Carling, 2000) and the Lenth method. If no outlier exists or there is a wide range in the inliers as determined by the Lenth method, more outliers can be found by the Carling method. A simulation study is conducted in unreplicated 2^4 designs with the number of active effects ranging from 1 to 6 to compare the efficiency of the Lenth method, the original LGB method, and the proposed modified version of the LGB method.
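
The inlier/outlier split mentioned in this abstract can be illustrated with Lenth's pseudo standard error (PSE). The sketch below is not the LGB or Mod_LGB procedure: it only computes the PSE and, as an assumption of this example, treats the effects that survive Lenth's trimming step (|effect| < 2.5·s0) as inliers. The function names and the effect values are hypothetical.

```python
import statistics


def lenth_pse(effects):
    """Lenth's pseudo standard error for a list of effect estimates."""
    abs_eff = [abs(c) for c in effects]
    # Initial robust scale estimate: 1.5 times the median absolute effect.
    s0 = 1.5 * statistics.median(abs_eff)
    # Trim effects that look active, then re-estimate the scale.
    trimmed = [a for a in abs_eff if a < 2.5 * s0]
    return 1.5 * statistics.median(trimmed)


def split_by_trimming(effects):
    """Split effects into (inliers, outliers) using the 2.5*s0 trimming bound.

    Assumption of this sketch: the trimming bound doubles as the
    inlier/outlier boundary; the papers may use a different cutoff.
    """
    s0 = 1.5 * statistics.median(abs(c) for c in effects)
    inliers = [c for c in effects if abs(c) < 2.5 * s0]
    outliers = [c for c in effects if abs(c) >= 2.5 * s0]
    return inliers, outliers


# Hypothetical effect estimates from a 2^4 design (15 effects, 2 active).
effects = [0.3, -0.5, 0.2, 8.1, -0.4, 0.6, -0.1, 0.7, -6.2,
           0.25, -0.35, 0.45, 0.15, -0.55, 0.05]
pse = lenth_pse(effects)
inliers, outliers = split_by_trimming(effects)
print(pse, outliers)
```

In the hybrid method described above, a least squares line would then be fitted to the inliers on the half-normal plot; that step is omitted here.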

3.
A simple univariate outlier identification procedure is presented for the detection of multiple outliers in large and moderate sized data sets. This procedure is a modification of the well-known boxplot outlier-labeling rule. Critical values are easy to obtain for the large sample case for a variety of useful distributions, including the normal, t, gamma, and Weibull. Simple adjustment formulas and graphs are provided for handling smaller samples. Basic probability properties are obtained mathematically and through simulation. Two data sets illustrate the procedure's application as a simple and effective screening tool for both moderate and large-sized univariate samples.
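
For reference, the well-known boxplot outlier-labeling rule that this abstract modifies can be sketched as follows. This shows only the classical baseline with the conventional multiplier k = 1.5, not the authors' modification; the function names and the sample values are made up for illustration.

```python
import statistics


def tukey_fences(data, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def label_outliers(data, k=1.5):
    """Return the observations that fall outside the fences."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample with one suspicious value.
sample = [2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 9.6]
print(label_outliers(sample))
```

Different quartile conventions give slightly different fences in small samples; the "inclusive" method is simply the choice made here.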

4.
The presence of extreme outliers in the upper tail data of income distribution affects the Pareto tail modeling. A simulation study is carried out to compare the performance of three types of boxplot in the detection of extreme outliers for Pareto data, including the standard boxplot, the adjusted boxplot, and the generalized boxplot. It is found that the generalized boxplot is the best method for determining extreme outliers for Pareto distributed data. For the application, the generalized boxplot is utilized for determining the extreme outliers in the upper tail of the Malaysian income distribution. In addition, for this data set, the confidence interval method is applied for examining the presence of dragon-kings, extreme outliers which are beyond the Pareto or power-law distribution.

5.
Whether an extreme observation is an outlier or not depends strongly on the corresponding tail behavior of the underlying distribution. We develop an automatic, data-driven method rooted in the mathematical theory of extremes to identify observations that deviate from the intermediate and central characteristics. The proposed algorithm is an extension of a method previously proposed in the literature for the specific case of heavy tailed Pareto-type distributions to all max-domains of attraction. We propose some applications such as a tail-adjusted boxplot which yields a more accurate representation of possible outliers, and the identification of outliers in a multivariate context through an analysis of associated random variables such as local outlier factors. Several examples and simulation results illustrate the finite sample behavior of the algorithm and its applications.

6.
This study investigates the influences of additive outliers on financial durations. An outlier test statistic and an outlier detection procedure are proposed to detect and estimate outlier effects for the logarithmic Autoregressive Conditional Duration (Log-ACD) model. The proposed test statistic has an exact sampling distribution and performs very well, in terms of size and power, in a series of Monte Carlo simulations. Furthermore, the test statistic is robust to several alternative distribution assumptions. An empirical application shows that parameter estimates without considering outliers tend to be biased.

7.
In this paper we present a "model-free" method of outlier detection for Gaussian time series by using the autocorrelation structure of the time series. We also present a graphic diagnostic method in order to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contain either one type of outlier or both additive and innovation outliers, and this method has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.
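
Because a χ² variable with one degree of freedom is the square of a standard normal variable, a statistic with that reference distribution can be converted to a p-value using only the complementary error function. The sketch below shows that conversion; the paper's test statistic itself is not reproduced, and the function name and the observed value are illustrative assumptions.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 df.

    Uses X = Z**2 with Z standard normal, so
    P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical observed statistic; 3.84 is roughly the 5% critical value.
observed = 6.2
print(chi2_1df_pvalue(observed), chi2_1df_pvalue(3.841458820694124))
```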

8.
Boxplots are among the most widely used exploratory data analysis (EDA) tools in statistical practice. Typical applications of boxplots include eliciting information about the underlying distribution (shape, location, etc.) as well as identifying possible outliers. This article focuses on a modification using a type of lower and upper fences similar in concept to those used in a traditional boxplot; however, instead of constructing the upper and lower fences using the upper and lower quartiles, respectively, and a multiple of the interquartile range (IQR), multiples of the upper and the lower semi-interquartile ranges (SIQR), respectively, measured from the sample median, are used. Any observation beyond the proposed fences is labeled a potential outlier. An exact expression for the probability that at least one sample observation is wrongly classified as an outlier, the so-called “some-outside rate per sample” (Hoaglin et al., 1986), is derived for the family of location-scale distributions and is used in the determination of the fence constants. Tables for the fence constants are provided for a number of well-known location-scale distributions along with some illustrations with data; the performance of the outlier detection rule is explored in a simulation study.
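
A minimal sketch of fences built from the lower and upper semi-interquartile ranges, measured from the median as the abstract describes, might look like this. The fence constants here are placeholders, not values from the article's tables: k = 4.0 is chosen only because, for a perfectly symmetric sample, it reproduces the classical 1.5·IQR fences. The function names and data are hypothetical.

```python
import statistics


def siqr_fences(data, k_lower=4.0, k_upper=4.0):
    """Fences at median - k_lower*SIQR_L and median + k_upper*SIQR_U.

    SIQR_L = median - Q1 and SIQR_U = Q3 - median, so each fence adapts
    to the spread on its own side of the median.
    """
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    siqr_lower = med - q1
    siqr_upper = q3 - med
    return med - k_lower * siqr_lower, med + k_upper * siqr_upper


def flag_outside(data, k_lower=4.0, k_upper=4.0):
    """Return the observations lying beyond either fence."""
    lower, upper = siqr_fences(data, k_lower, k_upper)
    return [x for x in data if x < lower or x > upper]


# Hypothetical right-skewed sample.
skewed = [1.0, 1.2, 1.3, 1.5, 1.8, 2.4, 3.1, 4.5, 12.0]
print(siqr_fences(skewed), flag_outside(skewed))
```

With this sample the upper fence sits farther from the median than the lower one, which is the intended effect for right-skewed data.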

9.
A review of several statistical methods that are currently in use for outlier identification is presented, and their performances are compared theoretically for typical statistical distributions of experimental data, considering values derived from the distribution of extreme order statistics as reference terms. A simple modification of a popular, broadly used method based upon box-plot is introduced, in order to overcome a major limitation concerning sample size. Examples are presented concerning exploitation of methods considered on two data sets: a historical one concerning evaluation of an astronomical constant performed by a number of leading observatories and a substantial database pertaining to an ongoing investigation on absolute measurement of gravity acceleration, exhibiting peculiar aspects concerning outliers. Some problems related to outlier treatment are examined, and the requirement of both statistical analysis and expert opinion for proper outlier management is underlined.

10.
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate the outliers when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is utilized to construct a novel outlier detection measure based on distance correlation, and then an outlier detection procedure is proposed. The proposed method enjoys several advantages. First, the outlier detection measure can be simply calculated, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, which does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well for detecting outliers in high-dimensional regression models and performs better than some other competing methods.

11.
The author presents a robust F-test for comparing nested linear models. It is suggested that the approach will be attractive to practitioners because it is based on the familiar F-statistic and corresponds to the common practice of reporting F-statistics after removing obvious outliers. It is calibrated in terms of a real parameter that can be directly interpreted as the willingness of the data analyst to remove observations, and the sensitivity of the F-statistic to this parameter is easily examined. The procedure is evaluated with a simulation study where a scale mixture distribution is used to generate outliers. The procedure is also applied to some data where the occurrence of an outlier is confounded with the significance of a regression term. This provides a comparison of two competing models for the data: one removing an outlier and the other including an additional regression term instead.

12.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

13.
In this paper, we present a test procedure to detect outliers in the one-parameter exponential distribution based on prediction. The distribution of the test statistic is obtained. The proposed test can be used to detect more than one outlier and the required percentage points can be easily determined. Furthermore, the test provides a simple procedure to detect whether a given set of data is free from outliers or spurious observations.

14.
Despite the popularity of high dimension, low sample size data analysis, there has not been enough attention to the sample integrity issue, in particular, the possibility of outliers in the data. A new outlier detection procedure for data with much larger dimensionality than the sample size is presented. The proposed method is motivated by asymptotic properties of high-dimensional distance measures. Empirical studies suggest that high-dimensional outlier detection is more likely to suffer from a swamping effect than from a masking effect, and thus yields more false positives than false negatives. We compare the proposed approaches with existing methods using simulated data from various population settings. A real data example is presented with a consideration of the implications of the outliers found.

15.
The general problem of outlier detection and the five recursive outlier detection procedures considered in the study are defined. Methods are given for computing powers, the probabilities of detecting ≥1 outliers, and the probabilities of declaring >1 observations, including at least one inlier, as outliers; the results are discussed. Results show that no single procedure is most powerful across the cases in which the actual number of outliers present in the sample is exactly estimated, underestimated, and overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur only on one side of the sample.

16.
This work studies outlier detection and robust estimation with data that are naturally distributed into groups and which follow approximately a linear regression model with fixed group effects. For this, several methods are considered. First, the robust fitting method of Peña and Yohai [A fast procedure for outlier diagnostics in large regression problems. J Am Stat Assoc. 1999;94:434–445], called the principal sensitivity components (PSC) method, is adapted to the grouped data structure and the mentioned model. The robust methods RDL1 of Hubert and Rousseeuw [Robust regression with both continuous and binary regressors. J Stat Plan Inference. 1997;57:153–163] and M-S of Maronna and Yohai [Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 2000;89:197–214] are also considered. These three methods are compared in terms of their effectiveness in outlier detection and their robustness through simulations, considering several contamination scenarios and growing contamination levels. Results indicate that the adapted PSC procedure is able to detect a high percentage of true outliers and a small number of false outliers. It is appropriate when the contamination is in the error term or in the covariates, also detecting possibly masked high leverage points. Moreover, in simulations the final robust regression estimator preserved good efficiency under Normality while keeping good robustness properties.

17.
The functional boxplot is an attractive technique to visualize data that come from functions. We propose an alternative to the functional boxplot based on depth measures. Our proposal generalizes the usual construction of the box-plot in one dimension related to the down-upward orderings of the data by considering two intuitive pre-orders in the functional context. These orderings are based on the epigraphs and hypographs of the data that allow a new definition of functional quartiles which is more robust to shape outliers. Simulated and real examples show that this proposal provides a convenient visualization technique with great potential for analyzing functional data and illustrate its usefulness in detecting outliers that other procedures do not detect.

18.
Nonparametric charts are useful in statistical process control when there is a lack of or limited knowledge about the underlying process distribution. Most existing approaches in the literature of Phase I monitoring assume that outliers have the same distributions as the in-control sample but only differ in location or scale parameters; they may not be effective with distributional changes. This article develops a new procedure based on the integration of the classical Anderson–Darling goodness-of-fit test and the stepwise isolation method. Our proposed procedure is efficient in detecting potential shifts in location, scale, or shape, and thus it offers robust protection against variation in various underlying distributions. The finite sample performance of our method is evaluated through simulations and is compared with that of available outlier detection methods for Phase I monitoring.

19.
The problem of outliers in statistical data has attracted many researchers for a long time. Consequently, numerous outlier detection methods have been proposed in the statistical literature. However, no consensus has emerged as to which method is uniformly better than the others or which one is recommended for use in practical situations. In this article, we perform an extensive comparative Monte Carlo simulation study to assess the performance of the multiple outlier detection methods that are either recently proposed or frequently cited in the outlier detection literature. Our simulation experiments include a wide variety of realistic and challenging regression scenarios. We give recommendations on which method is superior to others under what conditions.

20.
As the Watson distribution is frequently used for modeling axial data, it is important to investigate the existence of possible outliers in samples from this distribution. We therefore develop, for the bipolar Watson distribution defined on the hypersphere, some tests of discordancy for a single outlier or for several outliers en bloc, based on the likelihood ratio and supposing an alternative contamination model of slippage type. We evaluate the performance of these tests of discordancy and also compare them with some tests of discordancy of an outlier already available for this distribution.
