Similar articles
1.
The geometric mean (GM) is finding increasingly wide application in statistical data analysis as a measure of central tendency. It is generally believed that the GM is less sensitive to outliers than the arithmetic mean (AM), but we suspect that, like the AM, the GM may also suffer a severe setback in the presence of outliers, especially when the data contain multiple outliers. As far as we know, little work has been done on the robustness of the GM. In search of a simple, robust measure of central tendency, we propose the geometric median (GMed) in this paper. We show that the classical GM has a breakdown point of only 0%, whereas that of the proposed GMed is 50%. Numerical examples also support our claim that the proposed GMed is unaffected by multiple outliers and maintains the highest possible breakdown point of 50%. We then develop a new method for identifying multiple outliers based on the proposed GMed. A variety of numerical examples show that the proposed method successfully identifies all potential outliers, whereas the traditional GM fails to do so.
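
The abstract does not define GMed explicitly. The sketch below assumes GMed = exp(median of log x), the median analogue of GM = exp(mean of log x), and contrasts the two on positive data in which three of seven values are replaced by gross outliers. The helper names are hypothetical.

```python
import math
import statistics


def geometric_mean(xs):
    """Classical GM of positive data: exp of the mean of the logs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def geometric_median(xs):
    """Assumed GMed: exp of the median of the logs (positive data only)."""
    return math.exp(statistics.median([math.log(x) for x in xs]))


clean = [2, 3, 4, 5, 6, 7, 8]
# Replace three of the seven values (just under half) by gross outliers.
contaminated = [2, 3, 4, 5, 1e6, 1e6, 1e6]

print(geometric_mean(clean), geometric_mean(contaminated))      # GM jumps from about 4.5 to about 740
print(geometric_median(clean), geometric_median(contaminated))  # GMed stays at about 5
```

With an odd sample size this assumed GMed coincides with the ordinary median, because the logarithm is monotone. The paper's outlier-identification rule built on GMed is not reproduced here.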

2.
A nonparametric method for analysis of variance (ANOVA) models is introduced that is highly resistant to outliers, computationally simple, and comprehensible to anyone with a rudimentary knowledge of classical analysis of variance. The methodology is based on Mood's median test and is particularly useful as an exploratory technique.
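
The sketch below shows the classical Mood's median test that the method builds on. It is not the paper's full ANOVA procedure, which the abstract does not detail, and the function name is hypothetical. A p-value would come from the upper tail of a chi-square distribution with k - 1 degrees of freedom. SciPy also packages this test as `scipy.stats.median_test`.

```python
import statistics


def mood_median_test(*groups):
    """Mood's median test for k groups.

    Builds the 2 x k table of counts above / not above the pooled median
    (ties with the median are counted as 'not above') and returns the Pearson
    chi-square statistic together with its k - 1 degrees of freedom.
    """
    pooled = [x for g in groups for x in g]
    grand_median = statistics.median(pooled)
    above = [sum(1 for x in g if x > grand_median) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    n = len(pooled)
    row_totals = (sum(above), sum(not_above))
    stat = 0.0
    for j, g in enumerate(groups):
        for row_total, observed in zip(row_totals, (above[j], not_above[j])):
            expected = row_total * len(g) / n  # assumes both row totals are > 0
            stat += (observed - expected) ** 2 / expected
    return stat, len(groups) - 1


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(mood_median_test(a, b))               # (8.0, 1)
print(mood_median_test(a, [5, 6, 7, 800]))  # (8.0, 1): the outlier stays on the same side
```

Only the side of the pooled median on which each value falls matters. Replacing 8 by 800 therefore leaves the statistic unchanged, which is the outlier resistance described above.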

3.
Summary.  We consider the problem of obtaining population-based inference in the presence of missing data and outliers, in the context of estimating the prevalence of obesity and body mass index measures from the 'Healthy for life' study. Identifying multiple outliers in a multivariate setting is difficult because of masking and swamping. In masking, groups of outliers inflate the covariance matrix in a way that prevents their identification when they are included. In swamping, outliers distort the covariances in a way that makes non-outlying observations appear to be outliers. We develop a latent class model that assumes that each observation belongs to one of K unobserved latent classes, each with a distinct covariance matrix. We take the latent class whose covariance matrix has the largest determinant to be an 'outlier class'. By separating the covariance matrix for the outliers from the covariance matrices for the remainder of the data, we avoid the problems of masking and swamping. Following Ghosh-Dastidar and Schafer, we use a multiple-imputation approach. This allows us simultaneously to conduct inference after removing cases that appear to be outliers and to propagate uncertainty in outlier status through the model inference. We extend the work of Ghosh-Dastidar and Schafer by embedding the outlier class in a larger mixture model. We also consider penalized likelihood and posterior predictive distributions to assess model choice and model fit, and we develop the model so that it accounts for the complex sample design. Finally, we consider the repeated-sampling properties of multiple-imputation removal of outliers.

4.
In this article, the asymptotic distribution of the circular median is derived for symmetric distributions on the circle. Its asymptotic relative efficiency with respect to the mean direction and to an estimator proposed by Watson (1983) is then examined, with special attention to the cases where the underlying distribution is von Mises or contaminated von Mises. The circular median can be more efficient than both estimators in the presence of outliers.
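
The sketch below makes the comparison concrete. It uses one common definition of the sample circular median, the point that minimises the total arc distance to the data, and compares it with the mean direction on a sample containing two outliers. The function names and the restriction of the search to observed angles are illustrative assumptions. Watson's (1983) estimator is not shown.

```python
import math


def arc_distance(a, b):
    """Shortest angular distance (radians) between two directions, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def mean_direction(thetas):
    """Sample mean direction: the angle of the resultant vector, in [0, 2*pi)."""
    s = sum(math.sin(t) for t in thetas)
    c = sum(math.cos(t) for t in thetas)
    return math.atan2(s, c) % (2 * math.pi)


def circular_median(thetas):
    """Observed angle minimising the total arc distance to all observations.

    Searching only the observed angles is a simplification: with an even
    sample size the minimiser can be a whole arc, and this returns an endpoint.
    """
    return min(thetas, key=lambda phi: sum(arc_distance(phi, t) for t in thetas))


sample = [0.3, 0.4, 0.5, 0.6, 0.7, 2.0, 2.1]  # five clustered angles plus two outliers
print(circular_median(sample))  # 0.6
print(mean_direction(sample))   # pulled to roughly 0.88
```

Because distances are measured along the circle, the median also behaves correctly for data that straddle the 0 / 2π boundary.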

5.
The INARCH(1) model for overdispersed time series of counts has a simple structure, a parsimonious parametrization, and great potential for practical applications. We analyze two approaches to approximating the marginal distribution of the process: a Markov chain approach and the Poisson–Charlier expansion. We then discuss approaches to estimating the two model parameters and derive explicit expressions for the asymptotic distribution of the maximum likelihood and conditional least squares estimators. These are used to construct simultaneous confidence regions, whose finite-sample performance is analyzed in a simulation study. A real-data example from economics illustrates the application of the INARCH(1) model.
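
Below is a minimal simulation sketch. It assumes the usual INARCH(1) specification X_t | X_{t-1} ~ Poisson(β + αX_{t-1}) and covers only the conditional least squares estimator; the ML estimator and the confidence regions are not shown. All function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_inarch1(beta, alpha, n, seed=0):
    """INARCH(1): X_t | X_{t-1} ~ Poisson(beta + alpha * X_{t-1}),
    with beta > 0 and 0 <= alpha < 1."""
    rng = random.Random(seed)
    x = [poisson(beta / (1 - alpha), rng)]  # start near the stationary mean
    for _ in range(n - 1):
        x.append(poisson(beta + alpha * x[-1], rng))
    return x


def cls_inarch1(x):
    """Conditional least squares. E[X_t | X_{t-1}] = beta + alpha X_{t-1} is
    linear, so CLS reduces to the OLS regression of X_t on X_{t-1}."""
    prev, curr = x[:-1], x[1:]
    m = len(prev)
    mp, mc = sum(prev) / m, sum(curr) / m
    sxy = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    sxx = sum((p - mp) ** 2 for p in prev)
    alpha_hat = sxy / sxx
    return mc - alpha_hat * mp, alpha_hat


x = simulate_inarch1(beta=1.0, alpha=0.4, n=20000, seed=1)
print(cls_inarch1(x))  # (beta_hat, alpha_hat), close to (1.0, 0.4) for a long series
```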

6.
The coefficient of variation (CV) is commonly used to measure relative dispersion. However, because it is based on the sample mean and standard deviation, outliers can adversely affect it. Additionally, for skewed distributions the mean and standard deviation may be difficult to interpret, and consequently so may the CV. Here we investigate the extent to which quantile-based measures of relative dispersion can provide appropriate summary information as an alternative to the CV. In particular, we investigate two robust estimators of relative dispersion. The first is the interquartile range (in lieu of the standard deviation) divided by the median (in lieu of the mean). The second is the median absolute deviation divided by the median. In addition to comparing the influence functions of the competing estimators and their asymptotic biases and variances, we compare interval estimators in simulation studies to assess coverage.
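
Here is a small sketch of the three estimators on a toy sample, using only Python's `statistics` module. The MAD is left unscaled, and the quartiles use the module's default 'exclusive' method. Both choices are assumptions, since quartile conventions differ between packages.

```python
import statistics


def cv(xs):
    """Classical coefficient of variation: sample SD / mean."""
    return statistics.stdev(xs) / statistics.mean(xs)


def iqr_over_median(xs):
    """Interquartile range divided by the median."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # default 'exclusive' method
    return (q3 - q1) / statistics.median(xs)


def mad_over_median(xs):
    """Median absolute deviation (unscaled) divided by the median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs]) / med


clean = [10, 11, 12, 13, 14, 15, 16, 17, 18]
dirty = clean[:-1] + [180]  # largest value replaced by a gross outlier
for f in (cv, iqr_over_median, mad_over_median):
    print(f.__name__, f(clean), f(dirty))
```

Replacing the largest value by 180 raises the CV almost nine-fold, while both quantile-based ratios stay exactly the same.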

7.
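Because traditional factor analysis is sensitive to outliers, its results can be inconsistent with reality. To address this problem, this paper uses the FAST-MCD method to improve traditional factor analysis. We construct a robust factor analysis algorithm that overcomes the influence of outliers and examine it through simulation and empirical analysis. Both the simulation and the empirical results show the same pattern, before and after factor rotation. When the data contain no outliers, traditional and robust factor analysis give essentially the same results. When the data contain outliers, the results of traditional factor analysis change considerably, while those of robust factor analysis remain essentially unchanged. This shows that, compared with traditional factor analysis, robust factor analysis effectively resists the influence of outliers and has good resistance to interference and high robustness.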

8.
ABSTRACT

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets from many processes routinely contain 10% or more outliers. Unfortunately, such outliers typically render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers, but they lose power substantially as the dimension and the number of outliers increase. Although there have been recent advances in methods that detect multiple outliers, improvements are still needed in regression estimators that fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of the number and configuration of the outliers. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and an improved measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs best and consistently fits the bulk of the data when outliers are present. The estimator is implemented in standard software. It gives researchers and practitioners a tool for the model-building process that protects against the severe impact of multiple outliers.

9.
Abstract

Existing procedures for detecting outliers in ARIMA models have three main problems. The first is biased estimation of the initial parameter values, which may strongly affect the power to detect outliers. The second is confusion between level shifts and innovative outliers when the series has a level shift. The third is masking. We propose a procedure that keeps the powerful features of previous methods but solves these problems. It improves the initial parameter estimates and avoids confusing innovative outliers with level shifts. It also includes joint tests for sequences of additive outliers to solve the masking problem. A Monte Carlo study and an example illustrating the performance of the proposed procedure are presented.

10.
In this study, we aim to detect and exclude outliers in large medical-record datasets, because outliers affect the accuracy and efficiency of the analysis. Specifically, we detect outliers in cost and length of stay for pneumonia patients in the HCUP 2002 and 2003 data. We compare a traditional method with a method based on a frontier model.

11.
Logistic regression modeling has received a great deal of attention in the literature in recent years, covering all aspects of the model, including the identification of outliers. A variety of methods for identifying outliers, such as standardized Pearson residuals, are now available. These methods, however, are successful only if the data contain a single outlier. When the data contain multiple outliers, which is often the case in practice, these methods fail to detect them. This failure is due to the well-known masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for identifying multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is investigated through several examples.
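
For reference, the standard single-case standardized Pearson residual for a logistic regression fitted to binary responses is shown below. The paper's group-deleted generalization is not specified in the abstract and is not reproduced here.

$$
r_{si} = \frac{y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i\,(1-\hat{\pi}_i)\,(1-h_{ii})}}, \qquad
H = V^{1/2} X \left(X^\top V X\right)^{-1} X^\top V^{1/2}, \qquad
V = \operatorname{diag}\{\hat{\pi}_i\,(1-\hat{\pi}_i)\}
$$

Here $h_{ii}$ is the $i$-th diagonal element of $H$. Both $\hat{\pi}_i$ and $h_{ii}$ are computed from a fit that includes every case. A cluster of outliers can therefore pull the fit toward itself and shrink its own residuals, which is the masking effect described in the abstract.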

12.
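Autoregressive (AR) time-series models are easily affected by outliers during model building, so the computed results may not match reality. To address this problem, we improve the traditional autocorrelation function using the FQn statistic and construct a robust estimation algorithm for the AR model that overcomes the influence of outliers. The method is examined through simulation and empirical analysis, and both show the same pattern. When the time-series data contain no outliers, traditional and robust estimation give essentially the same results. When the data contain outliers, the results of traditional estimation change considerably, while those of robust estimation remain essentially unchanged. This shows that, compared with traditional estimation, robust estimation effectively resists the influence of outliers and has good resistance to interference and high robustness.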
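
The abstract does not spell out the FQn statistic. The sketch below assumes it is a Qn-type scale estimator; here that is the naive O(n²) version, not a fast algorithm. The scale is plugged into the identity Var(U+V) − Var(U−V) = 4 Cov(U, V) to obtain a robust lag-h autocorrelation. For an AR(1) model, the lag-1 value is the Yule–Walker estimate of the coefficient. The function names are hypothetical.

```python
import random


def qn_scale(xs):
    """Naive O(n^2) Qn scale (Rousseeuw and Croux): the k-th smallest pairwise
    distance, with h = n // 2 + 1 and k = h * (h - 1) // 2. The consistency
    constant is omitted because it cancels in robust_acf below."""
    n = len(xs)
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(xs[i] - xs[j]) for i in range(n) for j in range(i + 1, n))
    return diffs[k - 1]


def robust_acf(xs, lag):
    """Qn-based autocorrelation from Var(U+V) - Var(U-V) = 4 Cov(U, V), with
    U = X_t and V = X_{t+lag}. Assumes equal scales, i.e. stationarity."""
    u, v = xs[:-lag], xs[lag:]
    plus = qn_scale([a + b for a, b in zip(u, v)]) ** 2
    minus = qn_scale([a - b for a, b in zip(u, v)]) ** 2
    return (plus - minus) / (plus + minus)


def classical_acf(xs, lag):
    """Ordinary sample autocorrelation at the given lag."""
    m = sum(xs) / len(xs)
    num = sum((xs[t] - m) * (xs[t + lag] - m) for t in range(len(xs) - lag))
    return num / sum((x - m) ** 2 for x in xs)


def simulate_ar1(phi, n, seed=0):
    """Gaussian AR(1), X_t = phi * X_{t-1} + e_t, started at zero."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


series = simulate_ar1(0.6, 500, seed=2)
for t in range(25, 500, 50):  # ten additive outliers
    series[t] = 25.0
# For an AR(1), the Yule-Walker estimate of phi is the lag-1 autocorrelation.
print(classical_acf(series, 1), robust_acf(series, 1))
```

On the contaminated series, the classical lag-1 autocorrelation collapses toward zero. The Qn-based value should stay near the true 0.6, although the exact numbers depend on the seed.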

13.
Contamination, often in the form of outliers, is a very common feature of data. Among other effects, outliers in a homoscedastic model can make the model heteroscedastic. Moreover, outliers distort diagnostic tools for heteroscedasticity, so that it may not be correctly identified. In this article, we show how outliers affect heteroscedasticity diagnostics. We then propose a robust procedure for detecting heteroscedasticity in the presence of outliers by robustifying the non-robust component of the Goldfeld–Quandt (GQ) test. The performance of the proposed procedure is examined using a simulation experiment and real data sets. The proposed procedure offers a great improvement in situations where the conventional GQ test and other procedures fail.
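
The abstract does not describe the robustified component of the GQ test. The sketch below therefore shows only the classical GQ statistic for a simple regression. It includes a toy example in which a single outlier makes homoscedastic data look heteroscedastic. The 20% central trimming and the helper names are assumptions.

```python
def ols_rss(xs, ys):
    """Residual sum of squares of a simple OLS fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def goldfeld_quandt(xs, ys, drop_frac=0.2):
    """Classical GQ statistic for simple linear regression.

    Sorts cases by x, drops the central drop_frac of them, and fits OLS
    separately to the low-x and high-x groups. Returns the ratio of residual
    mean squares (high over low) with its F degrees of freedom.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    m = (n - int(n * drop_frac)) // 2  # cases per end group
    low, high = pairs[:m], pairs[n - m:]
    df = m - 2  # two coefficients estimated in each group
    f_stat = (ols_rss(*zip(*high)) / df) / (ols_rss(*zip(*low)) / df)
    return f_stat, df, df


xs = list(range(1, 61))
homo = [2 + 0.5 * x + (-1) ** x for x in xs]              # constant error size
hetero = [2 + 0.5 * x + 0.1 * x * (-1) ** x for x in xs]  # error size grows with x
spiked = homo.copy()
spiked[49] += 30                                          # one gross outlier at x = 50
for ys in (homo, hetero, spiked):
    print(goldfeld_quandt(xs, ys))
```

The homoscedastic series gives F = 1 and the genuinely heteroscedastic one gives a large F. The single spike also yields a large F, which is the spurious signal that motivates a robust version.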

14.
In this paper, the integer-valued autoregressive model of order one contaminated with additive outliers is studied in some detail, including parameter estimation. Supposing that the time points of the outliers are known but their sizes are unknown, we prove that the conditional least squares (CLS) estimators of the offspring and innovation means are strongly consistent. In contrast, the CLS estimators of the outliers' sizes are not strongly consistent, although they converge to a random limit with probability 1. We also prove that the joint CLS estimator of the offspring and innovation means is asymptotically normal. Conditionally on the values of the process at the time points neighboring the outliers' occurrences, the joint CLS estimator of the outliers' sizes is also asymptotically normal.

15.
We consider integer-valued autoregressive models of order one contaminated with innovational outliers. Assuming that the time points of the outliers are known but their sizes are unknown, we prove that the conditional least squares (CLS) estimators of the offspring and innovation means are strongly consistent. In contrast, the CLS estimators of the outliers' sizes are not strongly consistent. We also prove that the joint CLS estimator of the offspring and innovation means is asymptotically normal. Conditionally on the values of the process at the time points preceding the outliers' occurrences, the joint CLS estimator of the outliers' sizes is asymptotically normal.
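
Below is a simulation sketch under stated assumptions. It uses binomial thinning with offspring mean α, Poisson innovations with mean μ, and a single innovational outlier of fixed size θ at a known time s; θ is unknown to the estimator. The function names are hypothetical, and the paper's exact outlier specification may differ.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_inar1_io(alpha, mu, n, s, theta, seed=0):
    """INAR(1) with binomial thinning and Poisson(mu) innovations,
    X_t = alpha o X_{t-1} + eps_t, plus an innovational outlier of
    (assumed fixed) size theta added to the innovation at time s."""
    rng = random.Random(seed)
    x = [poisson(mu / (1 - alpha), rng)]  # stationary mean of the clean process
    for t in range(1, n):
        survivors = sum(1 for _ in range(x[-1]) if rng.random() < alpha)
        x.append(survivors + poisson(mu, rng) + (theta if t == s else 0))
    return x


def cls_known_outlier_time(x, s):
    """CLS for (alpha, mu, theta) when the outlier time s is known.

    The regression E[X_t | X_{t-1}] = mu + alpha X_{t-1} + theta 1{t = s}
    has a dummy that fits case s exactly. Hence alpha_hat and mu_hat are the
    OLS fit over t != s, and theta_hat is that fit's residual at t = s.
    """
    pairs = [(x[t - 1], x[t]) for t in range(1, len(x)) if t != s]
    m = len(pairs)
    mp = sum(p for p, _ in pairs) / m
    mc = sum(c for _, c in pairs) / m
    sxy = sum((p - mp) * (c - mc) for p, c in pairs)
    sxx = sum((p - mp) ** 2 for p, _ in pairs)
    alpha_hat = sxy / sxx
    mu_hat = mc - alpha_hat * mp
    theta_hat = x[s] - alpha_hat * x[s - 1] - mu_hat
    return alpha_hat, mu_hat, theta_hat


x = simulate_inar1_io(alpha=0.5, mu=2.0, n=20000, s=1000, theta=30, seed=3)
print(cls_known_outlier_time(x, s=1000))
```

The estimate θ̂ rests on a single residual, so its error does not shrink as the series grows. This is in line with the lack of strong consistency for the outlier sizes, whereas α̂ and μ̂ use every other time point.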

16.
The stalactite plot for the detection of multivariate outliers
Detecting multiple outliers in multivariate data with Mahalanobis distances requires robust estimates of the mean and covariance of the data. We obtain these by sequentially constructing an outlier-free subset of the data, starting from a small random subset. The stalactite plot provides a cogent summary of suspected outliers as the subset size increases. Its dependence on subset size can be virtually removed by a simulation-based normalization. Combined with probability plots and resampling procedures, the stalactite plot, particularly in its normalized form, leads to the identification of multivariate outliers even in the presence of appreciable masking.
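
Below is a minimal two-dimensional sketch of the forward search behind the stalactite plot. Each row lists the units whose squared Mahalanobis distance, computed from the current subset, exceeds a chi-square cut-off. As a simplifying assumption, it starts from the points nearest the coordinatewise median rather than from a random subset. It also omits the simulation-based normalization. The function names are hypothetical.

```python
import math
import random
import statistics


def mean_cov(points):
    """Sample mean and (n - 1)-divisor covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, centre, cov):
    """Squared Mahalanobis distance using the explicit 2 x 2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def stalactite_rows(points, start_size=5, level=0.99):
    """Forward search over subset sizes m. Row m holds the indices whose squared
    distance from the m-point fit exceeds the chi-square(2) cut-off."""
    cutoff = -2 * math.log(1 - level)  # exact chi-square quantile for 2 d.f.
    med = (statistics.median(p[0] for p in points),
           statistics.median(p[1] for p in points))
    # Deterministic start (an assumption): points nearest the coordinatewise median.
    order = sorted(range(len(points)),
                   key=lambda i: (points[i][0] - med[0]) ** 2 + (points[i][1] - med[1]) ** 2)
    subset = order[:start_size]
    rows = {}
    for m in range(start_size, len(points)):
        centre, cov = mean_cov([points[i] for i in subset])
        d2 = [mahalanobis_sq(p, centre, cov) for p in points]
        rows[m] = {i for i, d in enumerate(d2) if d > cutoff}
        # Grow the subset: keep the m + 1 units closest to the current fit.
        subset = sorted(range(len(points)), key=d2.__getitem__)[:m + 1]
    return rows


rng = random.Random(7)
clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
outliers = [(10 + rng.gauss(0, 0.3), 10 + rng.gauss(0, 0.3)) for _ in range(5)]
rows = stalactite_rows(clean + outliers)
for m in (20, 40, 44):
    print(m, sorted(rows[m]))
```

Printing one row per subset size, with a mark for each flagged unit, gives the stalactite picture. The planted outliers (indices 40 to 44) stay flagged while the subset is clean, and masking can hide them once they enter the fit.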

17.
Mixture regression models are used to investigate the relationship between variables that come from unknown latent groups and to model heterogeneous datasets. In general, the error terms in a mixture regression model are assumed to be normal. However, estimators derived under the normality assumption are sensitive to outliers. In this article, we introduce a robust mixture regression procedure based on the LTS estimation method to combat outliers in the data. A simulation study and a real data example illustrate the performance of the proposed estimators relative to their counterparts in dealing with outliers.
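
The mixture-regression machinery is beyond a short example, but the LTS criterion it builds on can be sketched for a single straight line. The criterion minimises the sum of the h smallest squared residuals. Here the minimiser is found by random two-point starts followed by concentration steps in the style of FAST-LTS. This is an illustrative assumption, not the authors' estimator, and the names and tuning values are hypothetical.

```python
import random


def fit_line(pairs):
    """OLS intercept and slope for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return my - b * mx, b


def lts_line(xs, ys, h=None, starts=50, c_steps=20, seed=0):
    """Least trimmed squares for y = a + b x: minimise the sum of the h smallest
    squared residuals, using random two-point starts refined by C-steps."""
    n = len(xs)
    h = h or (n + 3) // 2  # (n + p + 1) // 2 with p = 2 coefficients
    pts = list(zip(xs, ys))
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        (x1, y1), (x2, y2) = rng.sample(pts, 2)
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        for _ in range(c_steps):
            # C-step: refit OLS to the h cases with the smallest squared residuals.
            kept = sorted(pts, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
            a, b = fit_line(kept)
        obj = sum(sorted((y - a - b * x) ** 2 for x, y in pts)[:h])
        if best is None or obj < best[0]:
            best = (obj, a, b)
    return best[1], best[2]


xs = list(range(30))
ys = [1 + 2 * x + 0.1 * (-1) ** x for x in xs]
for i in range(22, 30):
    ys[i] = -50.0  # eight bad high-leverage points
print(fit_line(list(zip(xs, ys))))  # OLS is dragged far from (1, 2)
print(lts_line(xs, ys))             # LTS stays close to (1, 2)
```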

18.
In this paper we present a "model-free" method of outlier detection for Gaussian time series that uses the autocorrelation structure of the series. We also present a graphical diagnostic method for distinguishing an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting an outlier has a χ² distribution with one degree of freedom. We show that the method works well when the time series contains either one type of outlier or both additive and innovation outliers. The method also has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that the different types of outliers can be distinguished graphically using the proposed techniques.

19.
In this article, we propose a new test of discordancy for circular data based on spacing theory. The test should provide a good alternative to existing tests of discordancy for detecting single or well-separated multiple outliers. Moreover, the new method can be generalized to identify a patch of outliers in the data. Percentage points of the test statistic are calculated and its performance is examined. We first investigate the performance of the test for detecting a single outlier and show that it performs well compared with other known tests. We then show that the generalized test works well in detecting a patch of outliers. As an illustration, we present a practical example based on an eye dataset obtained from a glaucoma clinic at the University of Malaya Medical Center, Malaysia.

20.
We propose alternative approaches to analyzing residuals in binary regression models based on random-effect components. Our preferred model does not depend on any tuning parameter and is therefore completely automatic. Although the focus is mainly on accommodating outliers, the proposed methodology is also able to detect them. Our approach consists of evaluating the posterior distribution of random effects included in the linear predictor. Evaluating the posterior distributions of interest involves cumbersome integration, which is easily handled with stochastic simulation methods. We also discuss different specifications of prior distributions for the random effects. The potential of these strategies is compared on a real data set. The main finding is that including extra variability accommodates the outliers, substantially improving the model fit while also correctly indicating the possible outliers.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417