1.

Masking and swamping effects on tests for multiple outliers in normal sample

S. M. Bendre 《Communications in Statistics - Theory and Methods》2013,42(2):697-710

A study of some commonly used multiple outlier tests for normal samples is presented. When the number of outliers in the sample is unknown, two phenomena, namely the masking and swamping effects, can occur. The performance of the tests is studied using the measures of masking and swamping effects proposed by Bendre and Kale (1985) and Bendre (1985). The effects are illustrated for the Murphy test, the Tietjen-Moore test and the Dixon test. A small simulation study is carried out to indicate these effects.
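As a rough illustration of the masking effect mentioned in this abstract, the sketch below computes the largest studentized deviation, max |x - mean| / sd, for a small sample. It is not the Murphy, Tietjen-Moore or Dixon test, and the function name `max_abs_z` and the data values are made up for the example. A second extreme value inflates the sample standard deviation, so the most extreme point looks less unusual than it did when it stood alone.

```python
import statistics


def max_abs_z(sample):
    """Largest absolute studentized deviation, max |x - mean| / sd."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return max(abs(x - mean) / sd for x in sample)


# Hypothetical data: eight well-behaved values near 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]

one_outlier = clean + [15.0]
two_outliers = clean + [15.0, 15.2]

z_one = max_abs_z(one_outlier)
z_two = max_abs_z(two_outliers)

# With two extreme values the standard deviation is inflated, so the
# largest deviation is smaller than in the single-outlier sample.
print(f"one outlier:  max |z| = {z_one:.2f}")
print(f"two outliers: max |z| = {z_two:.2f}")
```

A single-outlier rule with a fixed cutoff on this statistic can therefore flag the lone outlier yet miss both outliers in the second sample, which is the kind of behaviour the masking measures in the abstract are meant to quantify.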

2.

Combining Bayesian method and Kalman smoother for detection additive outlier patches in autoregressive time series

Farideh Mohammadinia Rahim Chinipardaz 《Communications in Statistics - Simulation and Computation》2013,42(7):2191-2209

This article develops a procedure for detecting patches of additive outliers in autoregressive time series models. The procedure improves on the existing detection methods via Gibbs sampling. We combine the Bayesian method and the Kalman smoother to present some candidate models of outlier patches, and the best model, the one with the minimum Bayesian information criterion (BIC), is selected among them. We propose that this combined Bayesian and Kalman method (CBK) can reduce the masking and swamping effects in detecting patches of additive outliers. The correctness of the method is illustrated by simulated data and then by analyzing a real set of observations.

3.

Distance-based outlier detection for high dimension, low sample size data

Jeongyoun Ahn Myung Hee Lee Jung Ae Lee 《Journal of applied statistics》2019,46(1):13-29

Despite the popularity of high dimension, low sample size data analysis, not enough attention has been paid to the sample integrity issue, in particular the possibility of outliers in the data. A new outlier detection procedure for data with much larger dimensionality than the sample size is presented. The proposed method is motivated by asymptotic properties of high-dimensional distance measures. Empirical studies suggest that high-dimensional outlier detection is more likely to suffer from a swamping effect than from a masking effect, and thus yields more false positives than false negatives. We compare the proposed approaches with existing methods using simulated data from various population settings. A real data example is presented, with consideration of the implications of the outliers found.

4.

A comparison of two boxplot methods for detecting univariate outliers which adjust for sample size and asymmetry

Nancy J. Carter Neil C. Schwertman Terry L. Kiser 《Statistical Methodology》2009,6(6):604-621

It is important to identify outliers since their inclusion, especially when using parametric methods, can cause distortion in the analysis and lead to erroneous conclusions. One of the easiest and most useful methods is based on the boxplot. This method is particularly appealing since it does not use any outliers in computing spread. Two methods, one by Carling and another by Schwertman and de Silva, adjust the boxplot method for sample size and skewness. In this paper, the two procedures are compared both theoretically and by Monte Carlo simulations. Simulations using both a symmetric distribution and an asymmetric distribution were performed on data sets with none, one, and several outliers. Based on the simulations, the Carling approach is superior in avoiding masking outliers, that is, the Carling method is less likely to overlook an outlier, while the Schwertman and de Silva procedure is much better at reducing swamping, that is, misclassifying an observation as an outlier. Carling's method is to the Schwertman and de Silva procedure as the comparisonwise error rate is to the experimentwise error rate in multiple comparisons. The two methods, rather than being competitors, appear to complement each other. Used in tandem they provide the data analyst a more complete perspective for identifying possible outliers.
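For orientation, the basic boxplot rule that both procedures adjust can be sketched as follows. This is the standard Tukey rule with a fixed multiplier of 1.5, not Carling's method and not the Schwertman and de Silva procedure; the function names and the data are invented for the example, and the quartiles use the default convention of Python's `statistics.quantiles`, which is one of several in common use.

```python
import statistics


def tukey_fences(sample, k=1.5):
    """Lower and upper boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(sample, k=1.5):
    """Return the observations lying outside the fences."""
    low, high = tukey_fences(sample, k)
    return [x for x in sample if x < low or x > high]


# Hypothetical data: eight values near 10 plus two extreme values.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [15.0, 15.2]

print(flag_outliers(clean))         # nothing lies outside the fences
print(flag_outliers(contaminated))  # the two extreme values are flagged
```

Because the fences are built from quartiles rather than from the mean and standard deviation, a few extreme values move them far less. The adjustments compared in the abstract replace the fixed multiplier with one that depends on sample size and skewness.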

5.

Identification of Multiple Outliers in Logistic Regression

A. H. M. Rahmatullah Imon Ali S. Hadi 《Communications in Statistics - Theory and Methods》2013,42(11):1697-1709

The use of logistic regression modeling has seen a great deal of attention in the literature in recent years. This includes all aspects of the logistic regression model including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.

6.

The stalactite plot for the detection of multivariate outliers

A. C. Atkinson H.-M. Mulira 《Statistics and Computing》1993,3(1):27-35

Detection of multiple outliers in multivariate data using Mahalanobis distances requires robust estimates of the means and covariance of the data. We obtain these by sequential construction of an outlier-free subset of the data, starting from a small random subset. The stalactite plot provides a cogent summary of suspected outliers as the subset size increases. The dependence on subset size can be virtually removed by a simulation-based normalization. Combined with probability plots and resampling procedures, the stalactite plot, particularly in its normalized form, leads to identification of multivariate outliers, even in the presence of appreciable masking.
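The reason robust estimates matter here can be seen in a small sketch. It is not the stalactite plot or the authors' sequential subset construction: it only compares the squared Mahalanobis distance of one suspect point when the mean and covariance come from all the data versus from a subset assumed to be outlier-free. The helper names and the bivariate data are hypothetical.

```python
def mean_cov_2d(rows):
    """Sample mean and covariance (n - 1 divisor) of bivariate data."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the explicit 2x2 inverse."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical data: six points close to a line, plus one point far off it.
clean = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0)]
suspect = (3.0, 12.0)
everything = clean + [suspect]

# Estimates from all the data: the suspect point distorts them.
d2_full = mahalanobis_sq(suspect, *mean_cov_2d(everything))
# Estimates from the subset assumed to be outlier-free.
d2_clean = mahalanobis_sq(suspect, *mean_cov_2d(clean))

print(f"distance using all data:     {d2_full:.2f}")
print(f"distance using clean subset: {d2_clean:.2f}")
```

With estimates from all n points, the squared distance of any single observation cannot exceed (n - 1)^2 / n, so an outlier is partly hidden by its own influence on the estimates. Measuring it against a clean subset removes that cap.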

7.

Unmasking test for multiple upper or lower outliers in normal samples

Jin Zhang Xueren Wang 《Journal of applied statistics》1998,25(2):257-261

The discordancy test for multiple outliers is complicated by problems of masking and swamping. The key to resolving the question lies in the determination of k, i.e. the number of 'contaminants' in a sample. Great efforts have been made to solve this problem in recent years, but no effective method has been developed. In this paper, we present two ways of determining k, free from the effects of masking and swamping, when testing upper (lower) outliers in normal samples. Examples are given to illustrate the methods.

8.

Identification and classification of multiple outliers, high leverage points and influential observations in linear regression

A.A.M. Nurunnabi M. Nasser A.H.M.R. Imon 《Journal of applied statistics》2016,43(3):509-525

Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several well-known data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

9.

Cluster-based multivariate outlier identification and re-weighted regression in linear models

Ekele Alih Hong Choon Ong 《Journal of applied statistics》2015,42(5):938-955

A cluster methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is an agglomeration of procedures that begins with clustering the n observations and proceeds through a test of 'no-outlier hypothesis' (TONH) to a weighted least-squares regression estimation. The cluster phase partitions the n observations into an h-set, called the main cluster, and a minor cluster of size n−h. A robust distance emerges from the main cluster, upon which a test of 'no-outlier hypothesis' is conducted. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. Until convergence, a re-weighted least-squares (RLS) regression estimate is updated with weights based on the normalized residuals. The proposed procedure blends an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, with the re-weighted regression estimation phase. Hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping. The performance of CBRR is further examined through a simulation experiment. The results obtained from the data set illustration and the Monte Carlo study show that the CBRR is effective in detecting multivariate outliers where other methods are susceptible to masking and swamping. The CBRR does not require enormous computation and is substantially less susceptible to masking and swamping.

10.

New HEAVY Models for Fat-Tailed Realized Covariances and Returns

Anne Opschoor Pawel Janus André Lucas Dick Van Dijk 《Journal of Business & Economic Statistics》2013,31(4):643-657

We develop a new score-driven model for the joint dynamics of fat-tailed realized covariance matrix observations and daily returns. The score dynamics for the unobserved true covariance matrix are robust to outliers and incidental large observations in both types of data by assuming a matrix-F distribution for the realized covariance measures and a multivariate Student's t distribution for the daily returns. The filter for the unknown covariance matrix has a computationally efficient matrix formulation, which proves beneficial for estimation and simulation purposes. We formulate parameter restrictions for stationarity and positive definiteness. Our simulation study shows that the new model is able to deal with high-dimensional settings (50 or more) and captures unobserved volatility dynamics even if the model is misspecified. We provide an empirical application to daily equity returns and realized covariance matrices up to 30 dimensions. The model statistically and economically outperforms competing multivariate volatility models out-of-sample. Supplementary materials for this article are available online.

11.

Unbiased Estimator for a Covariance Matrix Under Two-Step Monotone Incomplete Sample

Shin-Ichi Tsukada 《Communications in Statistics - Theory and Methods》2014,43(8):1613-1629

In this article, we consider inference for a covariance matrix under a two-step monotone incomplete sample. The maximum likelihood estimator of the mean vector is unbiased but that of the covariance matrix is biased. We derive an unbiased estimator for the covariance matrix using some fundamental properties of the Wishart matrix. The properties of the estimators are investigated and their accuracy is checked by a numerical simulation.

12.

Influence diagnostics for the structural errors-in-variables model under the Student-t distribution

Manuel Galea Heleno Bolfarine Filidor Vilcalabra 《Journal of applied statistics》2002,29(8):1191-1204

The influence of observations on the parameter estimates for the simple structural errors-in-variables model with no equation error, under the Student-t distribution, is investigated using the local influence approach. The main conclusion is that the Student-t model with small degrees of freedom is able to incorporate possible outliers and influential observations in the data. The likelihood displacement approach is useful for outlier detection, especially when a masking phenomenon is present and the degrees of freedom parameter is large. The diagnostics are illustrated with two examples.

13.

A clustering approach to detect multiple outliers in linear functional relationship model for circular data

Nurkhairany Amyra Mokhtar Abdul Ghapor Hussin 《Journal of applied statistics》2018,45(6):1041-1051

Outlier detection has been used extensively in data analysis to detect anomalous observations in data. It has important applications in fraud detection and robust analysis, among others. In this paper, we propose a method for detecting multiple outliers in the linear functional relationship model for circular variables. Using the residual values of the Caires and Wyatt model, we apply the hierarchical clustering approach. With the use of a tree diagram, we illustrate the detection of outliers graphically. A Monte Carlo simulation study is done to verify the accuracy of the proposed method. Low probabilities of masking and swamping effects indicate the validity of the proposed approach. Illustrations with two sets of real data are also given to show its practical applicability.

14.

The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression

M. Habshah M. R. Norazan A. H.M. Rahmatullah Imon 《Journal of applied statistics》2009,36(5):507-520

Leverage values are used in regression diagnostics as measures of influential observations in the $X$-space. Detection of high leverage values is crucial because they can lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, mask and/or swamp outliers, etc. Much work has been done on the identification of single high leverage points, and it is generally believed that the problem of detecting a single high leverage point has been largely resolved. But there is no general agreement among statisticians about the detection of multiple high leverage points. When a group of high leverage points is present in a data set, the commonly used diagnostic methods fail to identify them correctly, mainly because of the masking and/or swamping effects. On the other hand, the robust alternative methods can identify the high leverage points correctly, but they have a tendency to identify too many low leverage points as high leverage points, which is also undesirable. An attempt has been made to strike a compromise between these two approaches. We propose an adaptive method where the suspected high leverage points are identified by robust methods and then the low leverage points (if any) are put back into the estimation data set after diagnostic checking. The usefulness of our newly proposed method for the detection of multiple high leverage points is studied using some well-known data sets and Monte Carlo simulations.

15.

Density-Tempered Marginalized Sequential Monte Carlo Samplers

Jin-Chuan Duan Andras Fulop 《Journal of Business & Economic Statistics》2013,31(2):192-202

We propose a density-tempered marginalized sequential Monte Carlo (SMC) sampler, a new class of samplers for full Bayesian inference of general state-space models. The dynamic states are approximately marginalized out using a particle filter, and the parameters are sampled via a sequential Monte Carlo sampler over a density-tempered bridge between the prior and the posterior. Our approach delivers exact draws from the joint posterior of the parameters and the latent states for any given number of state particles and is thus easily parallelizable in implementation. We also build into the proposed method a device that can automatically select a suitable number of state particles. Since the method incorporates sample information in a smooth fashion, it delivers good performance in the presence of outliers. We check the performance of the density-tempered SMC algorithm using simulated data based on a linear Gaussian state-space model with and without misspecification. We also apply it to real stock prices using a GARCH-type model with microstructure noise.

16.

A Perturbation Approach to Outlier Detection in Two-Way Contingency Tables

Andy H. Lee & John S. Yick 《Australian & New Zealand Journal of Statistics》1999,41(3):305-315

In order to identify outliers in contingency tables, we evaluate the derivatives of the perturbation-formed surface of the Pearson goodness-of-fit statistic. The resulting diagnostics are shown to be less susceptible to masking and swamping problems than residual-based measures. A Monte Carlo study further confirms the effectiveness of the proposed diagnostics.

17.

A simple diagnostic method of outlier detection for stationary Gaussian time series

Yuzhi Cai Neville Davies 《Journal of applied statistics》2003,30(2):205-223

In this paper we present a "model-free" method of outlier detection for Gaussian time series that uses the autocorrelation structure of the time series. We also present a graphical diagnostic method for distinguishing an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contains either one type of outlier or both additive and innovation outliers, and that it has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.

18.

Outliers and influential observations in the structural errors-in-variables model

Myung Geun Kim 《Journal of applied statistics》2000,27(4):451-460

The influence of observations on the parameter estimates for the simple structural errors-in-variables model with no equation error is investigated using the local influence method. Residuals themselves are not sufficient for detecting outliers. The likelihood displacement approach is useful for outlier detection especially when a masking phenomenon is present. An illustrative example is provided.

19.

Testing Inference in Inflated Beta Regressions under Model Misspecification

Tatiene C. Souza Tarciana L. Pereira Francisco Cribari-Neto Verônica M. C. Lima 《Communications in Statistics - Simulation and Computation》2016,45(2):625-642

We consider testing inference in inflated beta regressions subject to model misspecification. In particular, quasi-z tests based on sandwich covariance matrix estimators are described and their finite sample behavior is investigated via Monte Carlo simulations. The numerical evidence shows that quasi-z testing inference can be considerably more accurate than inference made through the usual z tests, especially when there is model misspecification. Interval estimation is also considered. We also present an empirical application that uses real (not simulated) data.

20.

Robust estimation of mean and covariance for longitudinal data with dropouts

Guoyou Qin 《Journal of applied statistics》2015,42(6):1240-1254

In this paper, we study estimation of linear models in the framework of longitudinal data with dropouts. Under the assumptions that random errors follow an elliptical distribution and all the subjects share the same within-subject covariance matrix, which does not depend on covariates, we develop a robust method for simultaneous estimation of mean and covariance. The proposed method is robust against outliers and does not require modeling the covariance or the missing data process. Theoretical properties of the proposed estimator are established, and simulation studies show its good performance. Finally, the proposed method is applied to a real data analysis for illustration.