1.

DETECTION AND TESTING OF DIFFERENT TYPES OF OUTLIER IN LINEAR STRUCTURAL RELATIONSHIPS

Vic Barnett 《Australian & New Zealand Journal of Statistics》1985,27(2):151-162

The linear structural model provides one way of modelling a linear relationship between two random variables. It is well known that problems of unidentifiability arise for unreplicated observations and normal error structure. As in all data sets, outliers can arise and methods are needed for detecting and testing them. An outlier-generating model of mean–slippage type can be used to characterise four different forms of outlier manifestation. It is interesting to find that the unidentifiability problem provides no obstacle for detecting or testing the outliers for three of the four forms. Detection principles, and specific discordancy tests, are derived and illustrated by application to some data on physical measurements of Pacific squid.

2.

Local influence in multivariate normal data

Myung Geun Kim 《Journal of applied statistics》1996,23(5):535-542

The local influence method introduced by Cook is adapted to multivariate normal data for the purpose of detecting outliers. The method allows simultaneous perturbations on all observations, so that it can identify multiple outliers. An illustrative example is given to show the effectiveness of the method for the identification of influential observations.

3.

Detection of outliers in bivariate time series data

Ravindra Khattree Dayanand N. Naik 《Communications in Statistics - Theory and Methods》2013,42(12):3701-3714

In this article, we use the influence function matrix of auto- and cross-correlations of a bivariate (multivariate) time series for detecting outliers. The multivariate analog of the graphical method of Chernick et al. (1982) for detecting outliers and partial outliers is presented. A simulation study illustrating the method is also given.

4.

Rao distance as a measure of influence in the multivariate linear model

M. D. Jiménez Gamero J. M. Muñoz Pichardo J. Muñoz García A. Pascual Acosta 《Journal of applied statistics》2002,29(6):841-854

Several methods have been suggested to detect influential observations in the linear regression model and a number of them have been extended for the multivariate regression model. In this article we consider the multivariate general linear model, Y = XB + ε, which contains the linear regression model and the multivariate regression model as particular cases. Assuming that the random disturbances are normally distributed, the BLUE of v B is also normally distributed. Since the distribution of the BLUE of v B and the distribution of the BLUE of v B in the model with the omission of a set of observations differ, to study the influence that a set of observations has on the BLUE of v B, we propose to measure the distance between both distributions. To do this we use Rao distance.

5.

Identification of local multivariate outliers

Peter Filzmoser Anne Ruiz-Gazen Christine Thomas-Agnan 《Statistical Papers》2014,55(1):29-47

The Mahalanobis distance between pairs of multivariate observations is used as a measure of similarity between the observations. The theoretical distribution is derived, and the result is used for judging the degree of isolation of an observation. In the case of spatially dependent data where spatial coordinates are available, different exploratory tools are introduced for studying the degree of isolation of an observation from a fraction of its neighbors, and thus for identifying local multivariate outliers.
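The pairwise Mahalanobis distance mentioned in this abstract can be illustrated with a minimal sketch. It assumes bivariate data and the classical sample covariance matrix (divisor n - 1); the paper itself may use a different (for example robust) covariance estimate, and the function names below are hypothetical.

```python
from itertools import combinations


def covariance_2d(points):
    """Classical sample covariance (divisor n - 1) of 2-D points.

    Assumption for illustration only: the abstract does not say which
    covariance estimate the authors use.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def pairwise_mahalanobis(points):
    """Return {(i, j): distance} for every pair i < j of 2-D points."""
    sxx, sxy, syy = covariance_2d(points)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = {}
    for i, j in combinations(range(len(points)), 2):
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        # Quadratic form d' S^{-1} d with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances[(i, j)] = max(d2, 0.0) ** 0.5
    return distances
```

An observation whose distances to most other observations are large would be a candidate for an isolated point; the thresholds derived in the paper are not reproduced here.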

6.

Multiple outliers detection in sparse high-dimensional regression

Tao Wang Qun Li Bin Chen 《Journal of Statistical Computation and Simulation》2018,88(1):89-107

The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data might amplify the chance of one or more observations being outlying. Noting that the detection of outliers is not only necessary but also important in high-dimensional regression analysis, we propose in this paper a feasible outlier detection approach for the sparse high-dimensional linear regression model. Firstly, we search for a clean subset by use of the sure independence screening method and the least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well for detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the application of the proposed method by empirical analysis of real-life protein and gene expression data.

7.

Monitoring multivariate simple linear profiles using robust estimators

Moslem Kordestani Farid Hassanvand Hamid Shahriari 《Communications in Statistics - Theory and Methods》2020,49(12):2964-2989

This article focuses on estimation of multivariate simple linear profiles. Because outliers may hamper the expected performance of the ordinary regression estimators, this study resorts to robust estimators as a remedy for the estimation problem in the presence of contaminated observations. More specifically, three robust estimators, M, S and MM, are employed. Extensive simulation runs show that in the absence of outliers or for small amounts of contamination, the robust methods perform as well as the classical least squares method, while for medium and large amounts of contamination the proposed estimators perform considerably better than the classical method.

8.

Robust Detection of Multiple Outliers in Grouped Multivariate Data

Chrys Caroni Nedret Billor 《Journal of applied statistics》2007,34(10):1241-1250

Many methods have been developed for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be groups in the data. We propose a method of simultaneously determining groups (as in cluster analysis) and detecting outliers, which are points that are distant from every group. Our method is an adaptation of the BACON algorithm proposed by Billor, Hadi and Velleman for the robust detection of multiple outliers in a single group of multivariate data. There are two versions of our method, depending on whether or not the groups can be assumed to have equal covariance matrices. The effectiveness of the method is illustrated by its application to two real data sets and further shown by a simulation study for different sample sizes and dimensions for 2 and 3 groups, with and without planted outliers in the data. When the number of groups is not known in advance, the algorithm could be used as a robust method of cluster analysis, by running it for various numbers of groups and choosing the best solution.

9.

Robust multivariate diagnostics for PLSR and application on high dimensional spectrally overlapped drug systems

Aylin Alin Claudio Agostinelli Georgi Gergov Plamen Katsarov Yahya Al-Degs 《Journal of Statistical Computation and Simulation》2019,89(6):966-984

Statistical methods are effectively used in the evaluation of pharmaceutical formulations instead of laborious liquid chromatography. However, signal overlapping, nonlinearity, multicollinearity and the presence of outliers degrade the performance of statistical methods. Partial Least Squares Regression (PLSR) is a very popular method for the quantification of high dimensional spectrally overlapped drug formulations. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality for a PLSR model. We study the performance of the proposed diagnostics on two everyday-use, highly overlapping drug systems: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

10.

Use of likelihood ratio tests to detect outliers under the variance shift outlier model

Freedom N. Gumedze 《Journal of applied statistics》2019,46(4):598-620

In this paper, we revisit the alternative outlier model of Thompson [A note on restricted maximum likelihood estimation with an alternative outlier model, J. Roy. Stat. Soc. Ser. B 47 (1985), pp. 53–55] for detecting outliers in the linear model. Gumedze et al. [A variance shift model for detection of outliers in the linear mixed model, Comput. Statist. Data Anal. 54 (2010), pp. 2128–2144] called this model the variance shift outlier model (VSOM). The basic idea behind the VSOM is to detect observations with inflated variance and isolate them for further investigation. The VSOM is appealing because it downweights an outlier in the analysis, with the weighting determined automatically as part of the estimation procedure. We set up the VSOM as a linear mixed model and then use the likelihood ratio test (LRT) statistic as an objective measure for determining whether the weighting is required, i.e. whether the observation is an outlier. We also derive one-step updates of the variance parameter estimates based on observed, expected and average information matrices to obtain one-step LRT statistics, which usually require less computation. Both the fully iterated and one-step LRTs are functions of the squared standard residuals from the null model and therefore can be computed directly without the need to fit the VSOM. We investigate the properties of the likelihood ratio tests and compare them. An extension of the model to detect a group of outliers is also given. We illustrate the proposed methodology using simulated datasets and a real dataset.

11.

Robust methods for the analysis of spatially autocorrelated data

Andrea Cerioli Marco Riani 《Statistical Methods and Applications》2002,11(3):335-358

In this paper we propose a new robust technique for the analysis of spatial data through simultaneous autoregressive (SAR) models, which extends the Forward Search approach of Cerioli and Riani (1999) and Atkinson and Riani (2000). Our algorithm starts from a subset of outlier-free observations and then selects additional observations according to their degree of agreement with the postulated model. A number of useful diagnostics which are monitored along the search help to identify masked spatial outliers and high leverage sites. In contrast to other robust techniques, our method is particularly suited for the analysis of complex multidimensional systems since each step is performed through statistically and computationally efficient procedures, such as maximum likelihood. The main contribution of this paper is the development of joint robust estimation of both trend and autocorrelation parameters in spatial linear models. For this purpose we suggest a novel definition of the elemental sets of the Forward Search, which relies on blocks of contiguous spatial locations.

12.

Combining Bayesian method and Kalman smoother for detection additive outlier patches in autoregressive time series

Farideh Mohammadinia Rahim Chinipardaz 《Communications in Statistics - Simulation and Computation》2013,42(7):2191-2209

This article proposes a method for detecting patches of additive outliers in autoregressive time series models. The procedure improves on the existing detection methods based on Gibbs sampling. We combine the Bayesian method and the Kalman smoother to generate candidate models of outlier patches, and the model with the minimum Bayesian information criterion (BIC) is selected among them. We propose that this combined Bayesian and Kalman method (CBK) can reduce the masking and swamping effects in detecting patches of additive outliers. The correctness of the method is illustrated with simulated data and then by analyzing a real set of observations.

13.

Nonlinear regression models for heterogeneous data with massive outliers

Yoonsuh Jung 《Journal of applied statistics》2019,46(8):1456-1477

Income- or expenditure-related data sets are often nonlinear, heteroscedastic, skewed even after transformation, and contain numerous outliers. We propose a class of robust nonlinear models that treat outlying observations effectively without removing them. For this purpose, case-specific parameters and a related penalty are employed to detect and modify the outliers systematically. We show how existing nonlinear models such as smoothing splines and generalized additive models can be robustified by the case-specific parameters. Next, we extend the proposed methods to heterogeneous models by incorporating unequal weights. The details of estimating the weights are provided. Two real data sets and simulated data sets show the potential of the proposed methods when the data are nonlinear and contain outlying observations.

14.

Outlier detection in linear models: a comparative study in simple linear regression

Uditha Balasooriya Y.K. Tse 《Communications in Statistics - Theory and Methods》2013,42(12):3589-3597

Five widely used test statistics for detecting outliers and influential observations were studied using the Monte Carlo method. The test statistic based on Studentized residuals, with critical values given by Tietjen, Moore and Beckman (1973), appears to be the best procedure for detecting a single outlier in simple linear regression.
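As a rough illustration of the Studentized-residual statistic referred to above, the following sketch computes internally Studentized residuals for a simple linear regression fitted by least squares. This is an assumed textbook form, not code from the study; the function name is hypothetical, and the critical values of Tietjen, Moore and Beckman (1973) are not reproduced.

```python
def studentized_residuals(x, y):
    """Internally Studentized residuals for y = a + b*x fitted by least squares.

    r_i = e_i / sqrt(s^2 * (1 - h_ii)), where h_ii is the leverage of
    observation i and s^2 is the residual mean square on n - 2 degrees
    of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - 2)
    if s2 == 0:
        raise ValueError("perfect fit: residual variance is zero")
    leverages = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e / (s2 * (1.0 - h)) ** 0.5 for e, h in zip(residuals, leverages)]
```

The observation with the largest absolute Studentized residual is the natural single-outlier candidate; whether it is declared discordant depends on tabulated critical values, which a user would need to take from the cited source.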

15.

Outlier detection for multivariate skew-normal data: a comparative study

Y. H. Dovoedo 《Journal of Statistical Computation and Simulation》2013,83(4):773-783

A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

16.

Detecting outliers: power and some other considerations

Ram B. Jain 《Communications in Statistics - Theory and Methods》2013,42(22):2299-2314

The general problem of outlier detection and the five recursive outlier detection procedures considered in the study are defined. Methods for computing power, the probability of detecting ≥1 outliers, and the probability of declaring >1 observations, including at least one inlier, as outliers are described, and results are discussed. Results show that no single procedure is most powerful across the cases where the actual number of outliers present in the sample is exactly specified, underestimated, or overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur on only one side of the sample.

17.

The FastHCS algorithm for robust PCA

Eric Schmitt Kaveh Vakili 《Statistics and Computing》2016,26(6):1229-1242

Principal component analysis (PCA) is widely used to analyze high-dimensional data, but it is very sensitive to outliers. Robust PCA methods seek fits that are unaffected by the outliers and can therefore be trusted to reveal them. FastHCS (high-dimensional congruent subsets) is a robust PCA algorithm suitable for high-dimensional applications, including cases where the number of variables exceeds the number of observations. After detailing the FastHCS algorithm, we carry out an extensive simulation study and three real data applications, the results of which show that FastHCS is systematically more robust to outliers than state-of-the-art methods.

18.

Outliers in Multi-Response Experiments

Lalmohan Bhar Sankalpa Ojha 《Communications in Statistics - Theory and Methods》2014,43(13):2782-2798

A Cook-statistic has been developed for detecting outliers in two situations in which outliers are likely to occur in multi-response experiments. In the first situation, more than one outlying observation vector is considered. Each of these vectors is obtained on the assumption that a particular observation from each of the responses is an outlier. A general expression of the Cook-statistic for detecting any such t outlying observation vectors is obtained, and some particular cases are then considered. In the second situation, observations from all the responses need not be outliers. Here also a general expression of the Cook-statistic is obtained for detecting any t observations from each of any k responses as outliers. In both cases the Cook-statistic is applied to real experimental data.

19.

Detection of outliers in the multivariate linear regression model

Dayanand N. Naik 《Communications in Statistics - Theory and Methods》2013,42(6):2225-2232

In this article we suggest multivariate kurtosis as a statistic for detection of outliers in a multivariate linear regression model. The statistic has some local optimality properties.
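To make the idea of a multivariate kurtosis statistic concrete, here is a minimal sketch of Mardia's sample kurtosis for bivariate data: the average of the squared Mahalanobis distances (squared again) from the sample mean, using the covariance with divisor n. Applying it to a plain sample rather than to regression residuals, and the function name itself, are assumptions for illustration; the abstract does not give the exact form used in the article.

```python
def mardia_kurtosis_2d(points):
    """Mardia's sample kurtosis b_{2,p} for p = 2.

    b = (1/n) * sum_i [ (x_i - mean)' S^{-1} (x_i - mean) ]^2,
    with S the covariance matrix using divisor n.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    total = 0.0
    for px, py in points:
        dx = px - mx
        dy = py - my
        # Squared Mahalanobis distance from the sample mean.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        total += d2 * d2
    return total / n
```

Under a p-variate normal distribution the population value of this kurtosis is p(p + 2), that is 8 for p = 2, so a sample value well above that may point to heavy tails or outliers.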

20.

Simultaneous rank tests for detecting differentially expressed genes

《Journal of Statistical Computation and Simulation》2012,82(5):959-972

Rank tests are known to be robust to outliers and violation of distributional assumptions. Two major issues besetting microarray data are violation of the normality assumption and contamination by outliers. In this article, we formulate the normal theory simultaneous tests and their aligned rank transformation (ART) analog for detecting differentially expressed genes. These tests are based on the least-squares estimates of the effects when data follow a linear model. Application of the two methods is then demonstrated on a real data set. To compare the performance of the aligned rank transform method with that of the corresponding normal theory method, data were simulated according to the characteristics of a real gene expression data set. These simulated data are then used to compare the two methods with respect to their sensitivity to the distributional assumption and to outliers in terms of control of the family-wise Type I error rate, power, and false discovery rate. It is demonstrated that the ART generally possesses the robustness of validity property even for microarray data with a small number of replications. Although these methods can be applied to more general designs, in this article the simulation study is carried out for a dye-swap design since this design is broadly used in cDNA microarray experiments.