1.

A class of residuals for outlier identification in zero adjusted regression models

Gustavo H. A. Pereira, Juliana Scudilio, Manoel Santos-Neto, Denise A. Botter, Mônica C. Sandoval. Journal of Applied Statistics, 2020, 47(10): 1833

Zero adjusted regression models are used to fit variables that are discrete at zero and continuous at some interval of the positive real numbers. Diagnostic analysis in these models is usually performed using the randomized quantile residual, which is useful for checking the overall adequacy of a zero adjusted regression model. However, it may fail to identify some outliers. In this work, we introduce a class of residuals for outlier identification in zero adjusted regression models. Monte Carlo simulation studies and two applications suggest that one of the residuals of the class introduced here has good properties and detects outliers that are not identified by the randomized quantile residual.
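As a concrete reference point for the baseline diagnostic, here is a minimal Python sketch of the randomized quantile residual for a zero adjusted model. It assumes, purely for illustration, that the continuous part is exponential with mean `mu`; the function names are hypothetical, and the sketch does not reproduce the new class of residuals proposed in the paper. Because the CDF jumps from 0 to p0 at zero, a zero response gets a randomized residual obtained by drawing u uniformly on (0, p0].

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def zero_adjusted_cdf(y: float, p0: float, mu: float) -> float:
    """CDF of a hypothetical zero adjusted model.

    P(Y = 0) = p0 and, given Y > 0, Y is exponential with mean mu
    (the exponential choice is an illustrative assumption).
    """
    if y <= 0.0:
        return p0
    return p0 + (1.0 - p0) * (-math.expm1(-y / mu))


def randomized_quantile_residual(y: float, p0: float, mu: float,
                                 rng: random.Random) -> float:
    if y == 0.0:
        # F jumps from 0 to p0 at y = 0: draw u uniformly on (0, p0].
        u = p0 * (1.0 - rng.random())
    else:
        u = zero_adjusted_cdf(y, p0, mu)
        # Guard against u == 1.0 from rounding when y is extremely large.
        u = min(u, 1.0 - 1e-16)
    return STD_NORMAL.inv_cdf(u)


# Usage: residuals for a small sample under fitted values p0 = 0.3, mu = 2.0.
rng = random.Random(42)
sample = [0.0, 0.0, 0.4, 1.7, 3.2, 0.0, 12.5]
residuals = [randomized_quantile_residual(y, 0.3, 2.0, rng) for y in sample]
```

If the model is correct, these residuals are approximately standard normal; here the value 12.5 gets a residual close to 3.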

2.

Simultaneous identification of explanatory variables and outliers based on Bayesian posterior probabilities

王康宁 (Wang Kangning), 汪四水 (Wang Sishui). Statistical Research, 2012, 29(1): 31-37

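Based on the joint Bayesian posterior probability of the latent variables that indicate which explanatory variables are included and which observations are outliers, this paper gives a general method for identifying explanatory variables and outliers simultaneously, and uses Gibbs sampling to reduce the computational complexity of the Bayesian posterior probabilities. Next, simultaneous identification of explanatory variables and outliers in multi-category ordinal data models is discussed in detail, and the specific identification procedure is given. Finally, simulation examples demonstrate the effectiveness of the proposed method.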

3.

Detection of outliers in multilevel models

Lei Shi, Gemai Chen. Journal of Statistical Planning and Inference, 2008

This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in estimating both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlier units at any levels. Detection of outlier units related to random parts is also studied. Analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.
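In the ordinary single-level linear model, testing gamma = 0 in the mean-shift outlier model y = Xβ + γ d_i + ε, where d_i is the indicator of case i, gives a t statistic equal to the externally studentized residual of case i. The Python sketch below computes these residuals for simple linear regression with the standard library only; it illustrates the single-level idea and is not the multilevel formulae derived in the paper. Critical values would come from a t distribution with n - 3 degrees of freedom (ideally Bonferroni-adjusted), which the sketch leaves out; the function name and data are illustrative.

```python
import math


def externally_studentized_residuals(x, y):
    """Externally studentized residuals for the simple regression y = a + b*x.

    For case i this equals the t statistic for gamma = 0 in the mean-shift
    model y = a + b*x + gamma*d_i + error, with d_i the indicator of case i.
    """
    n = len(x)
    p = 2  # number of regression coefficients (intercept and slope)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)

    t = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of case i
        # Residual variance estimate with case i deleted.
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t.append(e / math.sqrt(s2_deleted * (1.0 - h)))
    return t


# Usage: a nearly linear data set with one mean-shifted case (index 5).
x = [float(i) for i in range(1, 11)]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.08, -0.08, 0.02, -0.02, 0.03]
y = [2.0 * xi + 1.0 + e for xi, e in zip(x, noise)]
y[5] += 8.0
t = externally_studentized_residuals(x, y)
suspect = max(range(len(t)), key=lambda i: abs(t[i]))
```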

4.

A simple diagnostic method of outlier detection for stationary Gaussian time series

Yuzhi Cai, Neville Davies. Journal of Applied Statistics, 2003, 30(2): 205-223

In this paper we present a "model-free" method of outlier detection for Gaussian time series that uses the autocorrelation structure of the series. We also present a graphical diagnostic method to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contains either one type of outlier or both additive and innovation outliers, and that it has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be distinguished graphically using the proposed techniques.

5.

Use of likelihood ratio tests to detect outliers under the variance shift outlier model

Freedom N. Gumedze. Journal of Applied Statistics, 2019, 46(4): 598-620

In this paper, we revisit the alternative outlier model of Thompson [A note on restricted maximum likelihood estimation with an alternative outlier model, J. Roy. Stat. Soc. Ser. B 47 (1985), pp. 53–55] for detecting outliers in the linear model. Gumedze et al. [A variance shift model for detection of outliers in the linear mixed model, Comput. Statist. Data Anal. 54 (2010), pp. 2128–2144] called this model the variance shift outlier model (VSOM). The basic idea behind the VSOM is to detect observations with inflated variance and isolate them for further investigation. The VSOM is appealing because it downweights an outlier in the analysis, with the weighting determined automatically as part of the estimation procedure. We set up the VSOM as a linear mixed model and then use the likelihood ratio test (LRT) statistic as an objective measure of whether the weighting is required, i.e. whether the observation is an outlier. We also derive one-step updates of the variance parameter estimates based on the observed, expected and average information matrices, yielding one-step LRT statistics that usually require less computation. Both the fully iterated and the one-step LRTs are functions of the squared standardized residuals from the null model and can therefore be computed directly without fitting the VSOM. We investigate the properties of the likelihood ratio tests and compare them. An extension of the model to detect a group of outliers is also given. We illustrate the proposed methodology using simulated datasets and a real dataset.

6.

Detection of outliers in mixed regressive-spatial autoregressive models

Libin Jin, Xiaowen Dai, Anqi Shi. Communications in Statistics - Theory and Methods, 2013, 42(17): 5179-5192

This article studies the outlier detection problem in the mixed regressive-spatial autoregressive model. Test formulae for outliers and their approximate distributions are derived under the mean-shift model and the variance-weight model, respectively. Simulation studies examine the size and power of the tests, as well as the detection of outliers when the simulated data contain several outliers. A real dataset is analyzed to illustrate the proposed method, and modified mean-shift and variance-weight models that take the detected outliers into account are suggested to deal with the outliers and confirm the conclusions.

7.

The calculation of outlier detection statistics

Jeffrey S. Simonoff. Communications in Statistics - Simulation and Computation, 2013, 42(2): 275-285

This paper presents a routine that calculates four outlier detection statistics. The routine determines a series of points that are identified as possible outliers, and calculates the values that can be used to test them. These values can be used in an iterative procedure to detect multiple outliers.

8.

Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties

Xin Dang, Robert Serfling. Journal of Statistical Planning and Inference, 2010, 140(1): 275

In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons. 
While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.
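For orientation, a classical Mahalanobis-distance outlier identifier can be sketched in two dimensions with the standard library. The procedures compared in the paper would typically rely on robust location and scatter estimates; this sketch deliberately uses the classical mean and covariance, so it is itself exposed to the masking that the MBP analysis addresses. The function name and the level `alpha` are illustrative assumptions. For 2 degrees of freedom the chi-square upper quantile has the closed form -2 ln(alpha), which avoids any dependency on a statistics library.

```python
import math


def mahalanobis_outliers_2d(points, alpha=0.025):
    """Indices of 2-D points whose squared Mahalanobis distance exceeds the
    upper-alpha chi-square(2) quantile, using the classical mean and covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse of [[sxx, sxy], [sxy, syy]].
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    # For 2 degrees of freedom, P(chi2 > c) = exp(-c / 2), so c = -2 ln(alpha).
    cutoff = -2.0 * math.log(alpha)
    flagged = []
    for i, (px, py) in enumerate(points):
        dx, dy = px - mx, py - my
        d2 = ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy
        if d2 > cutoff:
            flagged.append(i)
    return flagged


# Usage: a 5 x 5 grid plus one distant point (index 25).
pts = [(float(a), float(b)) for a in range(5) for b in range(5)] + [(20.0, 20.0)]
print(mahalanobis_outliers_2d(pts))  # -> [25]
```

The ellipsoidal contours implied by d2 are exactly the restriction that the depth-based identifiers are meant to relax.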

9.

Robust automatic methods for outlier and error detection

Ray Chambers, Adão Hentges, Xinqiang Zhao. Journal of the Royal Statistical Society. Series A (Statistics in Society), 2004, 167(2): 323-339

Editing in surveys of economic populations is often complicated by the fact that outliers due to errors in the data are mixed in with correct, but extreme, data values. We describe and evaluate two automatic techniques for the identification of errors in such long-tailed data distributions. The first is a forward search procedure based on finding a sequence of error-free subsets of the error-contaminated data and then using regression modelling within these subsets to identify errors. The second uses a robust regression tree modelling procedure to identify errors. Both approaches can be implemented on a univariate basis or on a multivariate basis. An application to a business survey data set that contains a mix of extreme errors and true outliers is described.

10.

Robust sparse regression and tuning parameter selection via the efficient bootstrap information criteria

Journal of Statistical Computation and Simulation, 2012, 82(7): 1596-1607

Lasso-type regularized regression, a useful tool for simultaneous estimation and variable selection, is currently the subject of much discussion. Although lasso-type regularization has several advantages in regression modelling owing to its sparsity, it is sensitive to outliers because it relies on penalized least squares. To overcome this issue, we propose a robust lasso-type estimation procedure that uses a robust criterion as the loss function together with an L₁-type penalty, the elastic net. We also propose using efficient bootstrap information criteria to choose the optimal regularization parameters and a constant used in outlier detection. Simulation studies and a real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.

11.

Bayesian outlier testing using the predictive distribution for a linear model of constant intraclass form

Barry K. Noser, Virgil R. Marco. Communications in Statistics - Theory and Methods, 2013, 42(3): 849-860

The problem of testing suspected outliers from a linear model with constant intraclass correlation is considered from a Bayesian viewpoint. The main objective of this paper is to develop an outlier test procedure based on the predictive distribution of suspected outlier observations given a set of existing inlier observations. The test procedure is easily performed with the usual F and t distributions.

12.

Effects of a single outlier on ARMA identification

Stuart J. Deutsch, Jeffery E. Richards, James J. Swain. Communications in Statistics - Theory and Methods, 2013, 42(6): 2207-2227

Fox (1972), Box and Tiao (1975), and Abraham and Box (1979) have proposed methods for detecting outliers in time series whose ARMA form is known (or identified). We show that a single aberrant observation, innovation, or intervention causes an ARMA model to be misidentified when unadjusted autocorrelation (acf) and partial autocorrelation estimates are used. The magnitude, location, and type of the outlier, and in some cases the ARMA parameters, affect the identification outcome. We use variance inflation, signal-to-noise ratios, and acf critical values to determine an ARMA model's susceptibility to misidentification. Numerical and simulation examples suggest how to use the outlier detection methods iteratively in practice.
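The distortion described above is easy to reproduce: a single large additive outlier pulls the lag-1 sample autocorrelation of an AR(1) series toward zero, which can push identification away from the true AR order. The Python sketch below uses assumed settings (AR coefficient 0.8, length 200, outlier size 50, fixed seed) chosen only for illustration; it is not the authors' experiment.

```python
import random


def sample_acf(x, lag):
    """Lag-k sample autocorrelation with the usual full-sample denominator."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return num / denom


def simulate_ar1(n, phi, rng):
    """AR(1) series x_t = phi * x_{t-1} + e_t with N(0, 1) innovations, x_0 = 0."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


rng = random.Random(1)
clean = simulate_ar1(200, 0.8, rng)
contaminated = list(clean)
contaminated[100] += 50.0  # one additive outlier (AO) at t = 100

acf_clean = sample_acf(clean, 1)
acf_contaminated = sample_acf(contaminated, 1)
print(f"lag-1 acf: clean {acf_clean:.3f}, with one AO {acf_contaminated:.3f}")
```

The outlier inflates the denominator by roughly its squared size while adding comparatively little to the numerator, which is why the contaminated value drops well below the clean one.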

13.

Outlier detection in high-dimensional regression model

Tao Wang. Communications in Statistics - Theory and Methods, 2017, 46(14): 6947-6958

An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers, and identifying and removing them is important when fitting a model. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is used to construct an outlier detection measure based on distance correlation, from which an outlier detection procedure is derived. The proposed method has several advantages. First, the outlier detection measure is simple to calculate, and the detection procedure works efficiently even for high-dimensional regression data. Second, it can handle general regression and does not require specification of a linear regression model. Finally, simulation studies show that the proposed method detects outliers well in high-dimensional regression models and performs better than some competing methods.
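A minimal Python sketch of a leave-one-out distance-correlation score follows. The specific score, the gain in distance correlation when case i is deleted, is an assumption made for illustration; the abstract gives neither the paper's exact measure nor its cutoff. Distance correlation is computed from double-centered Euclidean distance matrices, so `x` may hold vectors of any dimension, although the O(n³) leave-one-out loop here is only meant for small examples.

```python
import math


def _double_centered_distances(points):
    """Double-centered Euclidean distance matrix (row, column and grand means removed)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]  # symmetric, so also the column means
    grand = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of paired samples x and y (sequences of vectors)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    n = len(x)
    cells = [(i, j) for i in range(n) for j in range(n)]
    dcov2_xy = sum(a[i][j] * b[i][j] for i, j in cells) / n ** 2
    dvar2_x = sum(a[i][j] ** 2 for i, j in cells) / n ** 2
    dvar2_y = sum(b[i][j] ** 2 for i, j in cells) / n ** 2
    if dvar2_x <= 0.0 or dvar2_y <= 0.0:
        return 0.0
    return math.sqrt(max(dcov2_xy, 0.0) / math.sqrt(dvar2_x * dvar2_y))


def loo_dcor_scores(x, y):
    """Hypothetical score: gain in dCor(x, y) when case i is left out."""
    full = distance_correlation(x, y)
    scores = []
    for i in range(len(x)):
        x_del = x[:i] + x[i + 1:]
        y_del = y[:i] + y[i + 1:]
        scores.append(distance_correlation(x_del, y_del) - full)
    return scores


# Usage: y is (almost) linear in x except for a gross error at index 4.
x = [(float(i),) for i in range(12)]
y = [(2.0 * i + 0.1 * ((-1) ** i),) for i in range(12)]
y[4] = (y[4][0] + 40.0,)
scores = loo_dcor_scores(x, y)
suspect = max(range(len(scores)), key=scores.__getitem__)
```

Because distance correlation does not assume a linear model, a score of this kind fits the abstract's point that no regression form needs to be specified.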

14.

Distance-based outlier detection for high dimension, low sample size data

Jeongyoun Ahn, Myung Hee Lee, Jung Ae Lee. Journal of Applied Statistics, 2019, 46(1): 13-29

Despite the popularity of high dimension, low sample size data analysis, the sample integrity issue, in particular the possibility of outliers in the data, has not received enough attention. A new outlier detection procedure is presented for data whose dimensionality is much larger than the sample size. The proposed method is motivated by asymptotic properties of high-dimensional distance measures. Empirical studies suggest that high-dimensional outlier detection is more likely to suffer from a swamping effect than from a masking effect, and thus yields more false positives than false negatives. We compare the proposed approaches with existing methods using simulated data from various population settings. A real data example is presented, with a discussion of the implications of the detected outliers.

15.

Bayesian robust transformation and variable selection: A unified approach

Raphael Gottardo, Adrian Raftery. The Canadian Journal of Statistics, 2009, 37(3): 361-380

The authors consider the problem of simultaneous transformation and variable selection for linear regression. They propose a fully Bayesian solution to the problem, which allows averaging over all models considered including transformations of the response and predictors. The authors use the Box‐Cox family of transformations to transform the response and each predictor. To deal with the change of scale induced by the transformations, the authors propose to focus on new quantities rather than the estimated regression coefficients. These quantities, referred to as generalized regression coefficients, have a similar interpretation to the usual regression coefficients on the original scale of the data, but do not depend on the transformations. This allows probabilistic statements about the size of the effect associated with each variable, on the original scale of the data. In addition to variable and transformation selection, there is also uncertainty involved in the identification of outliers in regression. Thus, the authors also propose a more robust model to account for such outliers based on a t‐distribution with unknown degrees of freedom. Parameter estimation is carried out using an efficient Markov chain Monte Carlo algorithm, which permits moves around the space of all possible models. Using three real data sets and a simulated study, the authors show that there is considerable uncertainty about variable selection, choice of transformation, and outlier identification, and that there is advantage in dealing with all three simultaneously.

16.

Regression with outlier shrinkage

Shifeng Xiong, V. Roshan Joseph. Journal of Statistical Planning and Inference, 2013

We propose a robust regression method called regression with outlier shrinkage (ROS) for the traditional n > p case. It improves over other robust regression methods such as least trimmed squares (LTS) in the sense that it can achieve the maximum breakdown value and full asymptotic efficiency simultaneously. Moreover, its computational complexity is no more than that of LTS. We also propose a sparse estimator, called sparse regression with outlier shrinkage (SROS), for robust variable selection and estimation. It is proven that SROS can not only give consistent selection but also estimate the nonzero coefficients with full asymptotic efficiency under the normal model. In addition, we introduce a concept of nearly regression equivariant estimator for understanding the breakdown properties of sparse estimators, and prove that SROS achieves the maximum breakdown value of nearly regression equivariant estimators. Numerical examples are presented to illustrate our methods.

17.

Rapid penalized likelihood-based outlier detection via heteroskedasticity test

Yunquan Song, Ping Dong, Xiuli Wang, Lu Lin. Journal of Statistical Computation and Simulation, 2017, 87(6): 1206-1229

Outlier detection is fundamental to statistical modelling. When there are multiple outliers, many traditional approaches are stepwise detection procedures, which can be computationally expensive and ignore stochastic error in the outlier detection process. Outlier detection can instead be performed through a heteroskedasticity test. In this article, a rapid outlier detection method based on a multiple heteroskedasticity test with penalized likelihood is proposed to handle these problems. The proposed method detects heteroskedasticity across all observations in a single step and estimates the coefficients simultaneously. It differs from other approaches in using a weighted least squares formulation coupled with nonconvex sparsity-inducing penalization. Furthermore, the proposed approach does not need to construct test statistics or calculate their distributions. A new algorithm is proposed for optimizing the penalized likelihood functions. Favourable theoretical properties of the proposed approach are obtained. Our simulation studies and real data analysis show that the proposed methods compare favourably with other traditional outlier detection techniques.

18.

An outlier problem in the determination of ore grade

Brenton R. Clarke, Toby Lewis. Journal of Applied Statistics, 1998, 25(6): 751-762

Data from recordings of ore assays from the Western Australian goldfields motivate new tests for outliers when observations share the same mean but have differing variances. In the case of equal variances, tests for a single outlier reduce to well-known tests of discordancy. A block discordancy test for k outliers is also described. Whether one should omit any observation(s) when calculating the mean recoverable gold content is addressed in the context of whether the data contain outliers, as judged by a normal model for the 'logged' ore assay values. The data suggest that models in which the 'logged' values follow long-tailed, approximately normal distributions may be appropriate.
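One natural statistic for the common-mean, unequal-variance setting is the largest standardized deviation from the precision-weighted mean. Assuming normal observations with known σ_i and W = Σ 1/σ_j², one has Var(x_i - x̄_w) = σ_i² - 1/W, so each standardized deviation is N(0, 1) under the null, and with equal variances the statistic reduces to the familiar normed-residual discordancy statistic. The Python sketch below is an illustrative construction with a Bonferroni normal cutoff; it is not necessarily the test derived by Clarke and Lewis, and the assay values are made up.

```python
import math
from statistics import NormalDist


def max_weighted_discordancy(x, sigma):
    """Most discordant observation when all x_i share one mean but have known,
    possibly different, standard deviations sigma_i (normality assumed)."""
    w = [1.0 / s ** 2 for s in sigma]  # precision weights
    total_w = sum(w)
    xbar_w = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    # Under the null, Var(x_i - xbar_w) = sigma_i^2 - 1 / total_w.
    z = [abs(xi - xbar_w) / math.sqrt(si ** 2 - 1.0 / total_w)
         for xi, si in zip(x, sigma)]
    i_max = max(range(len(z)), key=z.__getitem__)
    return i_max, z[i_max]


def bonferroni_cutoff(n, alpha=0.05):
    """Two-sided Bonferroni normal cutoff for the maximum of n standardized deviations."""
    return NormalDist().inv_cdf(1.0 - alpha / (2 * n))


# Usage with made-up assay values and known standard deviations.
assays = [10.0, 10.2, 9.9, 10.1, 14.0]
sds = [0.2, 0.2, 0.1, 0.3, 0.5]
idx, stat = max_weighted_discordancy(assays, sds)
is_outlier = stat > bonferroni_cutoff(len(assays))
```

Weighting by precision means that a large deviation on a noisy assay counts for less than the same deviation on a precise one, which is the point of allowing unequal variances.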

19.

Generalised Rank Regression Estimator with Standard Error Adjusted Lasso

A.S. Turkmen, O. Ozturk. Australian & New Zealand Journal of Statistics, 2016, 58(1): 121-135

One of the standard variable selection procedures in multiple linear regression is to use a penalisation technique in least‐squares (LS) analysis. In this setting, many different types of penalties have been introduced to achieve variable selection. It is well known that LS analysis is sensitive to outliers, and consequently outliers can present serious problems for the classical variable selection procedures. Since rank‐based procedures have desirable robustness properties compared to LS procedures, we propose a rank‐based adaptive lasso‐type penalised regression estimator and a corresponding variable selection procedure for linear regression models. The proposed estimator and variable selection procedure are robust against outliers in both response and predictor space. Furthermore, since rank regression can yield unstable estimators in the presence of multicollinearity, in order to provide inference that is robust against multicollinearity, we adjust the penalty term in the adaptive lasso function by incorporating the standard errors of the rank estimator. The theoretical properties of the proposed procedures are established and their performances are investigated by means of simulations. Finally, the estimator and variable selection procedure are applied to the Plasma Beta‐Carotene Level data set.
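The unpenalized core of rank-based regression, Jaeckel's dispersion with Wilcoxon scores, can be sketched for simple linear regression. The dispersion D(β) = Σ a(R(e_i)) e_i, with a(i) = √12 (i/(n+1) - 1/2), is convex and piecewise linear in β with kinks at the pairwise slopes, so its minimizer can be found by evaluating D at those slopes. This Python sketch omits the adaptive lasso penalty and the standard-error adjustment that are the paper's contributions; taking the intercept as the median of the residuals is a common convention, assumed here.

```python
import math
import statistics


def jaeckel_dispersion(beta, x, y):
    """Jaeckel's dispersion sum_i a(R(e_i)) * e_i with Wilcoxon scores,
    where e_i = y_i - beta * x_i (the intercept cancels because scores sum to 0)."""
    e = [yi - beta * xi for xi, yi in zip(x, y)]
    n = len(e)
    order = sorted(range(n), key=e.__getitem__)
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        score = math.sqrt(12.0) * (rank / (n + 1) - 0.5)  # Wilcoxon score a(rank)
        total += score * e[idx]
    return total


def rank_regression_fit(x, y):
    """Wilcoxon R-estimate of (intercept, slope) for simple linear regression."""
    n = len(x)
    # D(beta) is convex and piecewise linear with kinks at the pairwise slopes,
    # so its minimum is attained at one of them.
    candidates = [(y[j] - y[i]) / (x[j] - x[i])
                  for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    slope = min(candidates, key=lambda b: jaeckel_dispersion(b, x, y))
    intercept = statistics.median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Usage: an exact line y = 3x + 1 with one gross outlier in the response.
xs = [float(i) for i in range(1, 11)]
ys = [3.0 * xi + 1.0 for xi in xs]
ys[9] = 100.0
print(rank_regression_fit(xs, ys))  # -> (1.0, 3.0)
```

Least squares would pull the slope toward the outlier, whereas the rank fit recovers the line; as the abstract notes, this robustness is to outliers in the response, and bounded-influence weights would be needed for leverage points.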

20.

An outlier detection scheme for dynamical sequential datasets

Shiliang Zhang, Zonglin Ye, Yanbin Zhang, Xiali Hei. Communications in Statistics - Simulation and Computation, 2019, 48(5): 1450-1502

Outlier detection plays an important role in pre-processing sequential datasets to obtain clean, valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are used to measure how similar an object is to its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled a suspicious outlier and is examined subsequently to judge whether it is truly an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS). The expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS, are then compared: the ratio of the length of the expected path to that of the measured path indicates whether there are outliers in the SOS. Third, if the SOS contains outliers and has N suspicious outliers, then 2^N − 2 remaining paths are generated by removing k (0 < k < N) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) is the ratio of the length of the measured path of a remaining path to that of the expected path of the corresponding SOS, and it indicates the degree to which the objects removed in that remaining path are outliers. The proposed scheme works from a dynamical perspective and breaks the tight link between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the results show that it achieves higher detection quality on sequential datasets.
In addition, the proposed outlier detection scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.
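The count 2^N − 2 is the number of ways to remove k suspicious outliers with 0 < k < N, i.e. every subset except the empty set and the full set. The Python sketch below enumerates those remaining paths and their lengths. Treating each object as a point (time, value), anchoring every path at the normal objects just before and after the SOS, and using the straight segment between those anchors as the expected path are assumptions for illustration only; the abstract does not define the paper's expected path precisely, and the function names are hypothetical.

```python
import math
from itertools import combinations


def path_length(points):
    """Length of the polyline through the points in order."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def remaining_paths(sos, before, after):
    """Yield (removed indices, path length) for every remaining path obtained by
    deleting k suspicious outliers, 0 < k < N, from the series sos."""
    n = len(sos)
    for k in range(1, n):
        for removed in combinations(range(n), k):
            kept = [p for i, p in enumerate(sos) if i not in removed]
            yield removed, path_length([before, *kept, after])


def dsof_scores(sos, before, after):
    """Hypothetical DSOF: remaining-path length over an assumed expected-path
    length, here the straight segment from `before` to `after`."""
    expected = math.dist(before, after)
    return [(removed, length / expected)
            for removed, length in remaining_paths(sos, before, after)]


# Usage: three suspicious points (t, value); the middle one is a spike.
before, after = (0.0, 0.0), (4.0, 0.0)
sos = [(1.0, 0.0), (2.0, 10.0), (3.0, 0.0)]
scores = dsof_scores(sos, before, after)
best_removed, best_ratio = min(scores, key=lambda s: s[1])
print(len(scores), best_removed, best_ratio)  # -> 6 (1,) 1.0
```

Because combinations are generated in order of increasing k, taking the minimum selects the smallest removal set among ties, which seems a reasonable (though assumed) tie-breaking rule. The enumeration grows exponentially in N, so it is only practical for short suspicious series.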