
1.

Detection of outliers in growth curve models

Madhusudan Bhandary 《Communications in Statistics - Theory and Methods》2013,42(8):1923-1940

The growth curve model introduced by Potthoff and Roy (1964) is a general statistical model which includes as special cases regression models and both univariate and multivariate analysis of variance models. In this paper, we discuss procedures for detection of outliers in growth curve models under the mean-slippage and dispersion-slippage outlier models. The distributions of the test statistics are discussed and the values of significance probabilities are given using Bonferroni's bounds. Some simulation results are also presented.
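
The abstract does not state the test statistics themselves, so the following is only a minimal sketch of the generic Bonferroni bound it refers to: when the most extreme of n candidate statistics is tested, n times its single-test p-value is an upper bound on the overall significance probability. The function names `bonferroni_p_value` and `per_test_level` are hypothetical and do not come from the paper.

```python
def bonferroni_p_value(single_p: float, n_candidates: int) -> float:
    """Upper bound on the p-value of the most extreme of n_candidates statistics.

    Bonferroni inequality: P(at least one exceeds c) <= n * P(a given one exceeds c).
    """
    return min(1.0, n_candidates * single_p)


def per_test_level(alpha: float, n_candidates: int) -> float:
    """Per-observation level that keeps the overall level at or below alpha."""
    return alpha / n_candidates


# Hypothetical numbers: 20 observations, most extreme single-test p-value 0.001.
overall_bound = bonferroni_p_value(0.001, 20)
```

The bound is conservative, which is usually acceptable for outlier screening because it controls the chance of flagging any clean observation.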

2.

Detecting outliers: power and some other considerations

Ram B. Jain 《Communications in Statistics - Theory and Methods》2013,42(22):2299-2314

The general problem of outlier detection and the five recursive outlier detection procedures considered in the study are defined. Powers, the probabilities of detecting ≥1 outliers, and the probabilities of declaring >1 observations, including at least one inlier, as outliers are computed, and the results are discussed. The results show that no procedure is most powerful in all three cases, namely when the actual number of outliers present in the sample is exactly estimated, underestimated, and overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur on only one side of the sample.

3.

A unified approach for estimation and detection of outliers

Ram B. Jain Louis A. Pingel John L. Davidson 《Communications in Statistics - Theory and Methods》2013,42(25):2953-2976

Two two-stage procedures, RD_M-ESD and RD_M-KUR, for detecting outliers in normal samples are considered. Their powers for n = 20(10)60 and k = 2(1)(n/5) are computed and compared. Percentage points of these procedures are presented. Two examples illustrating the use of these procedures are also given.

4.

Detection of the numbers of outliers present in a data set using an information theoretic criterion

Bhandary Madhusudan 《Communications in Statistics - Theory and Methods》2013,42(11):3263-3274

We use an information theoretic criterion proposed by Zhao, Krishnaiah and Bai (1986) to detect the number of outliers in a data set. We consider univariate mean-slippage and dispersion-slippage outlier structures for the observations. Multivariate generalizations and the consistency of the estimates are also considered. Numerical examples are presented in tables.

5.

A procedure for estimating the number of outliers

Ram B. Jain Louis A. Pingel 《Communications in Statistics - Theory and Methods》2013,42(10):1029-1041

A new type of procedure for estimating the number of outliers in a sample is presented and compared with existing procedures. The probabilities of exact, under-, and overestimation with the different procedures are examined for two different contamination schemes.

6.

Effects of level shifts and temporary changes on the estimation of GARCH models

《Journal of Statistical Computation and Simulation》2012,82(6):667-688

The aim of this article is to analyse the effect of the level shift and temporary change outliers on the estimation of a model with conditional heteroscedasticity, a concept rarely dealt with up to now, the literature focusing more on additive outliers. To do this, we have conducted various Monte Carlo experiments in which the bias produced by these outliers is analysed.

7.

Outlier detection in linear models: a comparative study in simple linear regression

Uditha Balasooriya Y.K. Tse 《Communications in Statistics - Theory and Methods》2013,42(12):3589-3597

Five widely used test statistics for detecting outliers and influential observations were studied using the Monte Carlo method. The test statistic based on Studentized residuals, with critical values given by Tietjen, Moore and Beckman (1973), appears to be the best procedure for detecting a single outlier in simple linear regression.
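
As a rough illustration of the Studentized-residual statistic mentioned above, the sketch below computes internally Studentized residuals for a simple linear regression and picks the observation with the largest absolute value. The data are invented, the function name `studentized_residuals` is hypothetical, and the critical values of Tietjen, Moore and Beckman (1973) are not reproduced here, so the code only identifies a suspect and does not test it.

```python
import math


def studentized_residuals(x, y):
    """Internally Studentized residuals for the fit y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual mean square with n - 2 degrees of freedom.
    s2 = sum(e * e for e in residuals) / (n - 2)
    # Leverage of each observation in simple linear regression.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(residuals, leverages)]


# Invented data: y is roughly equal to x, except for the sixth point.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 9.5, 7.1, 8.0]
r = studentized_residuals(x, y)
suspect = max(range(len(r)), key=lambda i: abs(r[i]))  # index of the most extreme point
```

In practice the maximum absolute Studentized residual would then be compared with a tabulated critical value for the given sample size.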

8.

Test for multiple upper outliers in an exponential sample irrespective of origin

Nirpeksh Kumar 《Statistics》2013,47(1):184-190

An approach for testing multiple upper outliers with slippage alternative in an exponential sample, irrespective of origin, is discussed. The outlier detection procedure is based on a ratio of two estimates, obtained by the maximization of the two log-likelihood functions. One is the complete data log-likelihood and the other is its conditional expectation, given the regular observations. The exact null distribution of the test statistic is derived and no new table for critical values is required. A simulation study is also carried out to compare the performance of the test with the earlier work.

9.

A two-step robust estimation of the process mean using M-estimator

Hamid Shahriari Orod Ahmadi Amir H. Shokouhi 《Journal of Applied Statistics》2011,38(6):1289-1301

Parameter estimation is the first step in constructing control charts. One of these parameters is the process mean. The classical estimators of the process mean are sensitive to the presence of outlying data and subgroups which contaminate the whole data. Existing robust estimators for the process mean consider only the effects of individual outliers, whereas this paper proposes a robust estimator that reduces the effect of outlying subgroups as well as of individual outliers within a subgroup. The proposed estimator was compared with some classical and robust estimators of the process mean. Although its relative efficiency is fourth among the estimators tested, its robustness and efficiency are large when outlying subgroups are present. Evaluation of the results indicated that the proposed estimator is less sensitive to the presence of outliers and that it performs well as an estimator of the process mean when there are no individual outliers or outlying subgroups.

10.

Table of percentage points of ratios of linear combinations of order statistics of samples from exponential distributions

S. A. Patil J. L. Kovner Rudy M. King 《Communications in Statistics - Simulation and Computation》2013,42(2):115-136

A form of the distribution function of ratios of linear combinations of order statistics of samples from an exponential distribution is given. From the distribution, tables of percentage points of the statistic for α = .05, .95, and n = 3(1)50, and for censoring up to five observations are presented. The tables are used to find critical values of the most powerful scale and location invariant test of exponentiality against uniformity, and also to find critical values for a test of outliers in an exponential population.

11.

Additive Outlier Detection and Estimation for the Logarithmic Autoregressive Conditional Duration Model

Min-Hsien Chiang 《Communications in Statistics - Simulation and Computation》2013,42(3):287-301

This study investigates the influences of additive outliers on financial durations. An outlier test statistic and an outlier detection procedure are proposed to detect and estimate outlier effects for the logarithmic Autoregressive Conditional Duration (Log-ACD) model. The proposed test statistic has an exact sampling distribution and performs very well, in terms of size and power, in a series of Monte Carlo simulations. Furthermore, the test statistic is robust to several alternative distribution assumptions. An empirical application shows that parameter estimates without considering outliers tend to be biased.

12.

Bayesian outlier testing using the predictive distribution for a linear model of constant intraclass form

Barry K. Noser Virgil R. Marco 《Communications in Statistics - Theory and Methods》2013,42(3):849-860

The problem of testing suspected outliers from a linear model with constant intraclass correlation is considered from a Bayesian viewpoint. The main objective of this paper is to develop an outlier test procedure based on the predictive distribution of suspected outlier observations given a set of existing inlier observations. The test procedure is easily performed with the usual F and t distributions.

13.

Outlier detection for multivariate skew-normal data: a comparative study

Y. H. Dovoedo 《Journal of Statistical Computation and Simulation》2013,83(4):773-783

A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

14.

Analysis of seasonal level shift (SLS) detection in SARIMA models

Zahid Asghar 《Communications in Statistics - Simulation and Computation》2017,46(9):7264-7318

This study explores the correct identification of seasonal outliers using the most commonly applied test statistics. We evaluate the performance of seasonal level shift (SLS) detection by means of the empirical level of significance, the power of the test for sensitivity in detecting changes, and the vulnerability to masking of outliers by misspecification frequencies. We observe that the size of the SLS affects the sampling distribution of η_SLS (the test statistic for SLS detection) in the case of SAR(1) and SMA(1) models. The empirical critical values for the 1%, 5%, and 10% upper percentiles are higher than the usual cut-off points, and the empirical level of significance is inversely related to the sample size and the model coefficients. The empirical power of the test statistic is not satisfactory at small sample sizes or for large model coefficients. η_SLS tends to be confused with IO. The list of potential outlier types should retain both IO and SLS as part of the outlier detection procedure for the most efficient results. We apply the method suggested by Kaiser and Maravall with five possible types of outliers, that is, AO, IO, LS, TC, and SLS, to a number of quarterly and monthly time series from Pakistan.

15.

A note on determining the number of outliers in an exponential sample by least squares procedure

Jong-Wuu Wu 《Statistical Papers》2001,42(4):489-503

In this paper, we suggest a least squares procedure for determining the number of upper outliers in an exponential sample by minimizing the sample mean squared error. Moreover, the method can reduce the masking or “swamping” effects. In addition, we have found that the least squares procedure is easier and simpler to compute than the test procedure T_k suggested by Zhang (1998) for determining the number of upper outliers, since Zhang (1998) needs to use the complicated null distribution of T_k. We give three practical examples and a simulated example to illustrate the procedures. Further, simulation studies are given to show the advantages of the proposed method. Finally, the proposed least squares procedure can also determine the number of upper outliers in other continuous univariate distributions (for example, Pareto, Gumbel, Weibull, etc.). Received: May 10, 1999; revised version: June 5, 2000

16.

An outlier detection scheme for dynamical sequential datasets

Shiliang Zhang Zonglin Ye Yanbin Zhang Xiali Hei 《Communications in Statistics - Simulation and Computation》2019,48(5):1450-1502

Outlier detection plays an important role in the pre-treatment of sequential datasets to obtain pure valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are employed to measure the similarity an object shares with its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled a suspicious outlier, which is treated subsequently to judge whether it is really an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS). The expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS, are then employed, and the ratio of the length of the expected path to that of the measured path indicates whether there exist outliers in the SOS. Third, in the case that there exist outliers in the SOS, if there are N suspicious outliers in the SOS, then 2^N - 2 remaining paths are generated by removing k (0 < k < N) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) is employed to represent the ratio of the length of the measured path of the considered remaining path to that of the expected path of the corresponding SOS, and the DSOF indicates the degree to which the objects removed in a remaining path are outliers. The proposed outlier detection scheme is conducted from a dynamical perspective, and breaks the tight relation between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the experimental results verify that the proposed scheme has higher detection quality for sequential datasets. In addition, the proposed outlier detection scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.
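
The count of 2^N - 2 remaining paths follows from removing every non-empty proper subset of the N suspicious outliers. The sketch below only enumerates those remaining paths to make the count concrete; the function name `remaining_paths` is hypothetical, and the FOF, BOF and DSOF computations are not reproduced because the abstract does not define them precisely.

```python
from itertools import combinations


def remaining_paths(suspects):
    """All sequences left after removing k suspects, for every 0 < k < N.

    The order of the surviving suspects is preserved, mirroring the idea of
    sequentially connecting the remaining ones.
    """
    n = len(suspects)
    paths = []
    for k in range(1, n):  # k = 1, ..., N - 1
        for removed in combinations(range(n), k):
            dropped = set(removed)
            paths.append([s for i, s in enumerate(suspects) if i not in dropped])
    return paths


# Hypothetical suspicious outlier series with N = 4 objects.
paths = remaining_paths(["a", "b", "c", "d"])
```

Because the number of remaining paths grows exponentially in N, an exhaustive enumeration like this is only practical for short suspicious outlier series.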

17.

Use of likelihood ratio tests to detect outliers under the variance shift outlier model

Freedom N. Gumedze 《Journal of Applied Statistics》2019,46(4):598-620

In this paper, we revisit the alternative outlier model of Thompson [A note on restricted maximum likelihood estimation with an alternative outlier model, J. Roy. Stat. Soc. Ser. B 47 (1985), pp. 53–55] for detecting outliers in the linear model. Gumedze et al. [A variance shift model for detection of outliers in the linear mixed model, Comput. Statist. Data Anal. 54 (2010), pp. 2128–2144] called this model the variance shift outlier model (VSOM). The basic idea behind the VSOM is to detect observations with inflated variance and isolate them for further investigation. The VSOM is appealing because it downweights an outlier in the analysis, with the weighting determined automatically as part of the estimation procedure. We set up the VSOM as a linear mixed model and then use the likelihood ratio test (LRT) statistic as an objective measure for determining whether the weighting is required, i.e. whether the observation is an outlier. We also derive one-step updates of the variance parameter estimates based on observed, expected and average information matrices to obtain one-step LRT statistics, which usually require less computation. Both the fully iterated and one-step LRTs are functions of the squared standard residuals from the null model and therefore can be computed directly without the need to fit the VSOM. We investigate the properties of the likelihood ratio tests and compare them. An extension of the model to detect a group of outliers is also given. We illustrate the proposed methodology using simulated datasets and a real dataset.

18.

Outlier identification and robust parameter estimation in a zero-inflated Poisson model

Jun Yang Min Xie Thong Ngee Goh 《Journal of Applied Statistics》2011,38(2):421-430

The zero-inflated Poisson distribution has been used in the modeling of count data in different contexts. This model tends to be influenced by outliers because of the excessive occurrence of zeroes; thus, outlier identification and robust parameter estimation are important for such a distribution. Some outlier identification methods are studied in this paper, and their applications and results are also presented with an example. To eliminate the effect of outliers, two robust parameter estimates are proposed based on the trimmed mean and the Winsorized mean. Simulation results show the robustness of our proposed parameter estimates.
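
The trimmed mean and the Winsorized mean named above can be sketched as follows. This shows only the two generic location estimates with symmetric trimming of k observations per tail; how the paper converts them into zero-inflated Poisson parameter estimates is not stated in the abstract, and the function names and count data are invented for illustration.

```python
def trimmed_mean(data, k):
    """Mean after discarding the k smallest and k largest observations."""
    s = sorted(data)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    core = s[k:len(s) - k]
    return sum(core) / len(core)


def winsorized_mean(data, k):
    """Mean after pulling the k most extreme values in each tail inward."""
    s = sorted(data)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    low, high = s[k], s[-k - 1]  # nearest retained order statistics
    clipped = [min(max(v, low), high) for v in s]
    return sum(clipped) / len(clipped)


# Invented count data with many zeroes and one gross outlier (40).
counts = [0, 0, 0, 0, 1, 1, 2, 2, 3, 40]
raw = sum(counts) / len(counts)
trimmed = trimmed_mean(counts, 1)
winsorized = winsorized_mean(counts, 1)
```

On this sample the raw mean is 4.9, while both robust versions stay close to 1, which is the behaviour that motivates using them when a few extreme counts are present.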

19.

Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model

Sung-Soo Kim Sung H. Park W. J. Krzanowski 《Journal of Applied Statistics》2008,35(3):283-291

We provide a method for simultaneous variable selection and outlier identification using the mean-shift outlier model. The procedure consists of two steps: the first step is to identify potential outliers, and the second step is to perform all possible subset regressions for the mean-shift outlier model containing the potential outliers identified in step 1. This procedure is helpful for model selection while simultaneously considering outlier identification, and can be used to identify multiple outliers. In addition, we can evaluate the impact on the regression model of simultaneous omission of variables and interesting observations. In an example, we provide detailed output from the R system, and compare the results with those using posterior model probabilities as proposed by Hoeting et al. [Comput. Stat. Data Anal. 22 (1996), pp. 252-270] for simultaneous variable selection and outlier identification.
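
For orientation, the mean-shift outlier model is usually written in the following standard textbook form; the notation is an assumption and is not taken from the paper. Here D is an n × m matrix whose j-th column is the indicator of the j-th potential outlier, so each suspected observation receives its own shift parameter.

```latex
% Mean-shift outlier model with m potential outliers (assumed notation)
y = X\beta + D\gamma + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^{2} I_n),
\qquad D = [\, e_{i_1}, \dots, e_{i_m} \,]
% Observation i_j is declared an outlier when H_0 : \gamma_j = 0 is rejected.
```

Under this form, the second step described above amounts to running subset regressions over the columns of X and D together.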

20.

Outlier detection for high dimensional data using the Comedian approach

《Journal of Statistical Computation and Simulation》2012,82(5):745-757

The detection of outliers is an interesting and important aspect of data analysis, as it can affect inference. Various methods are available in the literature for detecting outliers in multivariate data [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994] using the Mahalanobis distance measure. An alternative method of outlier detection is proposed, based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615–644]. The proposed method is computationally efficient, with a high breakdown value and low computation time. Further, important properties, namely success rates (SR) and false detection rates (FDR), are studied and compared with some of the well-known outlier detection methods through a simulation study. The Comedian method has high SR and low FDR for all combinations of parameters. On removal or down-weighting of the detected outliers, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.
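
The comedian of Falk (1997) is the median-based analogue of the covariance, COM(X, Y) = med((X - med X)(Y - med Y)). The sketch below computes it for two small invented samples and checks that COM(X, X) equals the squared, unscaled median absolute deviation (MAD), which holds for this odd-sized sample. The full detection procedure of the paper is not given in the abstract and is not reproduced; the function names are hypothetical.

```python
from statistics import median


def comedian(x, y):
    """COM(X, Y) = med((X - med X)(Y - med Y))."""
    mx, my = median(x), median(y)
    return median([(a - mx) * (b - my) for a, b in zip(x, y)])


def mad(x):
    """Unscaled median absolute deviation about the median."""
    m = median(x)
    return median([abs(a - m) for a in x])


# Invented paired samples; the value 100 is a gross outlier in x.
x = [2, 4, 5, 7, 100]
y = [1, 3, 2, 5, 4]
com_xy = comedian(x, y)
com_xx = comedian(x, x)
```

The single extreme value in x barely affects these quantities, which is the robustness property that makes the comedian attractive as a building block for outlier detection.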