首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In this paper, we revisit the alternative outlier model of Thompson [A note on restricted maximum likelihood estimation with an alternative outlier model, J. Roy. Stat. Soc. Ser. B 47 (1985), pp. 53–55] for detecting outliers in the linear model. Gumedze et al. [A variance shift model for detection of outliers in the linear mixed model, Comput. Statist. Data Anal. 54 (2010), pp. 2128–2144] called this model the variance shift outlier model (VSOM). The basic idea behind the VSOM is to detect observations with inflated variance and isolate them for further investigation. The VSOM is appealing because it downweights an outlier in the analysis, with the weighting determined automatically as part of the estimation procedure. We set up the VSOM as a linear mixed model and then use the likelihood ratio test (LRT) statistic as an objective measure for determining whether the weighting is required, i.e. whether the observation is an outlier. We also derived one-step updates of the variance parameter estimates based on observed, expected and average information matrices to obtain one-step LRT statistics which usually require less computation. Both the fully iterated and one-step LRTs are functions of the squared standard residuals from the null model and therefore can be computed directly without the need to fit the VSOM. We investigated the properties of the likelihood ratio tests and compare them. An extension of the model to detect a group of outliers is also given. We illustrate the proposed methodology using simulated datasets and a real dataset.  相似文献   

2.
Efficient score tests exist among others, for testing the presence of additive and/or innovative outliers that are the result of the shifted mean of the error process under the regression model. A sample influence function of autocorrelation-based diagnostic technique also exists for the detection of outliers that are the result of the shifted autocorrelations. The later diagnostic technique is however not useful if the outlying observation does not affect the autocorrelation structure but is generated due to an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing the additive and innovative outliers as well as variance shifted outliers that may or may not affect the autocorrelation structure of the outlier free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance shifted outliers that do not affect the autocorrelation structure of the outlier free time series observations. Some simulation results on model misspecification effects on outlier detection are also provided.  相似文献   

3.
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate the outliers for fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is utilized to construct a novel outlier detection measure based on distance correlation, and then an outlier detection procedure is proposed. The proposed method enjoys several advantages. First, the outlier detection measure can be simply calculated, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, which does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well for detecting outliers in high-dimensional regression model and performs better than some other competing methods.  相似文献   

4.
The process of detection of outliers is an interesting and important aspect in the analysis of data, as it could impact the inference. There are various methods available in the literature for detection of outliers in multivariate data [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994] using the Mahalanobis distance measure. An attempt is made to propose an alternate method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615–644]. The proposed method is computationally efficient with high breakdown value and low computation time. Further, important properties, namely, success rates (SR) and false detection rates (FDR) are studied and compared with some of the well-known outlier detection methods through a simulation study. The Comedian method has high SR and low FDR for all combination of parameters. On removal of the detected outliers or down weighing, the same, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.  相似文献   

5.
Spatial outliers are spatially referenced objects whose non spatial attribute values are significantly different from the corresponding values in their spatial neighborhoods. In other words, a spatial outlier is a local instability or an extreme observation that deviates significantly in its spatial neighborhood, but possibly not be in the entire dataset. In this article, we have proposed a novel spatial outlier detection algorithm, location quotient (LQ) for multiple attributes spatial datasets, and compared its performance with the well-known mean and median algorithms for multiple attributes spatial datasets, in the literature. In particular, we have applied the mean, median, and LQ algorithms on a real dataset and on simulated spatial datasets of 13 different sizes to compare their performances. In addition, we have calculated area under the curve values in all the cases, which shows that our proposed algorithm is more powerful than the mean and median algorithms in almost all the considered cases and also plotted receiver operating characteristic curves in some cases.  相似文献   

6.
In the multiple linear regression analysis, the ridge regression estimator and the Liu estimator are often used to address multicollinearity. Besides multicollinearity, outliers are also a problem in the multiple linear regression analysis. We propose new biased estimators based on the least trimmed squares (LTS) ridge estimator and the LTS Liu estimator in the case of the presence of both outliers and multicollinearity. For this purpose, a simulation study is conducted in order to see the difference between the robust ridge estimator and the robust Liu estimator in terms of their effectiveness; the mean square error. In our simulations, the behavior of the new biased estimators is examined for types of outliers: X-space outlier, Y-space outlier, and X-and Y-space outlier. The results for a number of different illustrative cases are presented. This paper also provides the results for the robust ridge regression and robust Liu estimators based on a real-life data set combining the problem of multicollinearity and outliers.  相似文献   

7.
In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons. While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.  相似文献   

8.
When the individual measurements are statistically independent, the maximum likelihood estimator calculated at the end of a sequential procedure overestimates the underlying effect. There are many clinical trials in which we are interested in comparing changes in responses between two treatment groups sequentially. Lee and DeMets (1991, JASA 86, 757–762) proposed a group sequential method for comparing rates of change when a response variable is measured for eaeh patient at successive follow-up visits. They assumed that the response follows the linear mixed effects model and derived the asymptotic joint distribution of the sequentially computed statistics. In this article, we consider the maximum likelihood estimator (MLE), the median unbiased estimator (MUE) and the midpoint of a 100(1-α)% confidence interval as point estimators for the rate of change in the linear mixed effects model, and investigate their properties by Monte Carlo simulation.  相似文献   

9.
This article provides a procedure for the detection and identification of outliers in the spectral domain where the Whittle maximum likelihood estimator of the panel data model proposed by Chen [W.D. Chen, Testing for spurious regression in a panel data model with the individual number and time length growing, J. Appl. Stat. 33(88) (2006b), pp. 759–772] is implemented. We extend the approach of Chang and co-workers [I. Chang, G.C. Tiao, and C. Chen, Estimation of time series parameters in the presence of outliers, Technometrics 30 (2) (1988), pp. 193–204] to the spectral domain and through the Whittle approach we can quickly detect and identify the type of outliers. A fixed effects panel data model is used, in which the remainder disturbance is assumed to be a fractional autoregressive integrated moving-average (ARFIMA) process and the likelihood ratio criterion is obtained directly through the modified inverse Fourier transform. This saves much time, especially when the estimated model implements a huge data-set.

Through Monte Carlo experiments, the consistency of the estimator is examined by growing the individual number N and time length T, in which the long memory remainder disturbances are contaminated with two types of outliers: additive outlier and innovation outlier. From the power tests, we see that the estimators are quite successful and powerful.

In the empirical study, we apply the model on Taiwan's computer motherboard industry. Weekly data from 1 January 2000 to 31 October 2006 of nine familiar companies are used. The proposed model has a smaller mean square error and shows more distinctive aggressive properties than the raw data model does.  相似文献   


10.
The problem of sequential sampling without recall from a Dirichlet process is studied. An individual observes a sequential sample p1, p2, … . At each stage the observer can either accept the current value pi or continue sampling. The total cost to the observer is the current value plus a cost for each observation taken after the first. A concrete context for the problem is provided by a shopper who must buy a particular item and can elicit price quotations sequentially but must pay for each quotation. Qualitative properties of optimal search rules are derived which establish that the problem is well behaved. In particular, the reservation-price property is shown to hold. The results extend those in Christensen (1986).  相似文献   

11.
This paper proposes a new robust Bayes factor for comparing two linear models. The factor is based on a pseudo‐model for outliers and is more robust to outliers than the Bayes factor based on the variance‐inflation model for outliers. If an observation is considered an outlier for both models this new robust Bayes factor equals the Bayes factor calculated after removing the outlier. If an observation is considered an outlier for one model but not the other then this new robust Bayes factor equals the Bayes factor calculated without the observation, but a penalty is applied to the model considering the observation as an outlier. For moderate outliers where the variance‐inflation model is suitable, the two Bayes factors are similar. The new Bayes factor uses a single robustness parameter to describe a priori belief in the likelihood of outliers. Real and synthetic data illustrate the properties of the new robust Bayes factor and highlight the inferior properties of Bayes factors based on the variance‐inflation model for outliers.  相似文献   

12.
For high-dimensional data, it is a tedious task to determine anomalies such as outliers. We present a novel outlier detection method for high-dimensional contingency tables. We use the class of decomposable graphical models to model the relationship among the variables of interest, which can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed-form expression of the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the use of the LRT outlier detection framework on genetic data modeled by Chow–Liu trees.  相似文献   

13.
ABSTRACT

This article studies the outlier detection problem in mixed regressive-spatial autoregressive model. The formulae for testing outliers and their approximate distributions are derived under the mean-shift model and the variance-weight model, respectively. The simulation studies are conducted for examining the power and size of the test, as well as for the detection of outliers when a simulated data contains several outliers. A real data is analyzed to illustrate the proposed method, and modified models based on mean-shift and variance-weight models in which detected outliers are taken into account are suggested to deal with the outliers and confirm theconclusions.  相似文献   

14.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

15.
As the Watson distribution is frequently used for modeling axial data, it is important to investigate the existence of possible outliers in samples from this distribution. Then, we develop for the bipolar Watson distribution defined on the hypersphere, some tests of discordancy of an outlier or several outliers en bloc based on the likelihood ratio, supposing an alternative model of contamination of slippage type. We evaluate the performance of these tests of discordancy of an outlier and we also compare some tests of discordancy of an outlier available for this distribution.  相似文献   

16.
The general problem of outlier detection and five recursive outlier detection procedures considered in the study are defined. The methods to compute powers, probabilities of detecting ≥1 outliers, and >1 observations including at least one inlier as outliers are computed and results are discussed. Results show that no procedure is most powerful when the actual number of outlier present in the sample is exactly, under-, and overestimated. The probabilities of inliers being detected as outliers are also substantial particularly when outliers occur only on one side of the sample  相似文献   

17.
To overcome the main flaw of minimum covariance determinant (MCD) estimator, i.e. difficulty to determine its main parameter h, a modified-MCD (M-MCD) algorithm is proposed. In M-MCD, the self-adaptive iteration is proposed to minimize the deflection between the standard deviation of robust mahalanobis distance square, which is calculated by MCD with the parameter h based on the sample, and the standard deviation of theoretical mahalanobis distance square by adjusting the parameter h of MCD. Thus, the optimal parameter h of M-MCD is determined when the minimum deflection is obtained. The results of convergence analysis demonstrate that M-MCD has good convergence property. Further, M-MCD and MCD were applied to detect outliers for two typical data and chemical process data, respectively. The results show that M-MCD can get the optimal parameter h by using the self-adaptive iteration and thus its performances of outlier detection are better than MCD.  相似文献   

18.
This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in estimating both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlier units at any levels. Detection of outlier units related to random parts is also studied. Analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.  相似文献   

19.
Potency bioassays are used to measure biological activity. Consequently, potency is considered a critical quality attribute in manufacturing. Relative potency is measured by comparing the concentration‐response curves of a manufactured test batch with that of a reference standard. If the curve shapes are deemed similar, the test batch is said to exhibit constant relative potency with the reference standard, a critical requirement for calibrating the potency of the final drug product. Outliers in bioassay potency data may result in the false acceptance/rejection of a bad/good sample and, if accepted, may yield a biased relative potency estimate. To avoid these issues, the USP<1032> recommends the screening of bioassay data for outliers prior to performing a relative potency analysis. In a recently published work, the effects of one or more outliers, outlier size, and outlier type on similarity testing and estimation of relative potency were thoroughly examined, confirming the USP<1032> outlier guidance. As a follow‐up, several outlier detection methods, including those proposed by the USP<1010>, are evaluated and compared in this work through computer simulation. Two novel outlier detection methods are also proposed. The effects of outlier removal on similarity testing and estimation of relative potency were evaluated, resulting in recommendations for best practice.  相似文献   

20.
There is currently much discussion about lasso-type regularized regression which is a useful tool for simultaneous estimation and variable selection. Although the lasso-type regularization has several advantages in regression modelling, owing to its sparsity, it suffers from outliers because of using penalized least-squares methods. To overcome this issue, we propose a robust lasso-type estimation procedure that uses the robust criteria as the loss function, imposing L1-type penalty called the elastic net. We also introduce to use the efficient bootstrap information criteria for choosing optimal regularization parameters and a constant in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号