Similar literature
1.
Outlier detection plays an important role in the preprocessing of sequential datasets to obtain clean, valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are employed to measure an object's similarity with its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled a suspicious outlier and is examined subsequently to judge whether it really is an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS). The expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS, are then employed, and the ratio of the length of the expected path to that of the measured path indicates whether there are outliers in the SOS. Third, if there are outliers in an SOS containing N suspicious outliers, then 2^N - 2 remaining paths are generated by removing k (0 < k < N) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) represents the ratio of the length of the measured path of the considered remaining path to that of the expected path of the corresponding SOS, and it indicates the degree to which the objects removed in a remaining path are outliers. The proposed outlier detection scheme takes a dynamical perspective and breaks the tight link between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the experimental results verify that it achieves higher detection quality for sequential datasets. In addition, the proposed outlier detection scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.
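The third step of the abstract above can be illustrated with a small enumeration. The following is only a minimal sketch, not the authors' implementation: the names `remaining_paths`, `path_length` and `expected_length` are hypothetical, Euclidean distance is an assumption, and the abstract does not say how the expected path is constructed, so its length is passed in as a placeholder value. The sketch shows that removing k suspicious outliers with 0 < k < N yields 2^N - 2 remaining paths, and how a length ratio could be formed for each of them.

```python
from itertools import combinations
from math import dist


def remaining_paths(sos):
    """Yield (removed_indices, kept_points) for every way of removing
    k points from the series, with 0 < k < N."""
    n = len(sos)
    for k in range(1, n):
        for removed in combinations(range(n), k):
            removed_set = set(removed)
            kept = [p for i, p in enumerate(sos) if i not in removed_set]
            yield removed, kept


def path_length(points):
    """Length of the polyline that connects the points in order
    (Euclidean distance is an assumption of this sketch)."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical SOS with N = 4 objects; the second one is far off the line.
sos = [(0.0, 0.0), (1.0, 5.0), (2.0, 0.0), (3.0, 0.0)]

# Placeholder only: the abstract does not define the expected path.
expected_length = 3.0

paths = list(remaining_paths(sos))
ratios = {removed: path_length(kept) / expected_length
          for removed, kept in paths}

print(len(paths))      # number of remaining paths, 2**4 - 2
print(ratios[(1,)])    # ratio after removing only the off-line object
```

How the ratios are turned into a decision about which objects are outliers depends on definitions given in the paper itself, which this sketch does not attempt to reproduce.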

2.
The general problem of outlier detection and the five recursive outlier detection procedures considered in the study are defined. The methods used to compute powers, probabilities of detecting ≥1 outliers, and probabilities of declaring >1 observations, including at least one inlier, as outliers are presented, and the results are discussed. Results show that no single procedure is most powerful across the cases in which the actual number of outliers present in the sample is exactly estimated, underestimated, and overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur only on one side of the sample.

3.

The Mallows-type estimator, one of the most reasonable bounded influence estimators, often downweights leverage points regardless of the magnitude of the corresponding residual, and this could imply a loss of efficiency. In this article, we consider whether the efficiency of this bounded influence estimator could be improved by taking into account both the robust x-distance and the residual size. We develop a new robust procedure based on the ideas of the Mallows-type estimator and the general robust recipe, in which data are cleaned by pulling outliers towards their fitted values. Our basic idea is to formulate the robust estimation as an allocation problem, where the objective function is a Huber-type "loss" function, but the pulling resource is restricted. Using a mathematical programming technique, the pulling resource is optimally allocated to influential points (x_i, y_i) with respect to residual size and given weights, w(x_i). Three previously published approaches are compared to our proposal via simulated experiments. When the data are contaminated by regression outliers and "good" leverage points, the proposed robust estimator is a reasonable bounded influence estimator with respect to both efficiency and norm of bias. In addition, the proposed approach offers the potential to establish constraints for the regression parameters and may also provide insight regarding outlier detection.

4.
Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that result from a shifted mean of the error process under the regression model. A sample influence function of autocorrelation-based diagnostic technique also exists for the detection of outliers that result from shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing additive and innovative outliers as well as variance shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on the effects of model misspecification on outlier detection are also provided.

5.
This article studies the outlier detection problem in the mixed regressive-spatial autoregressive model. The formulae for testing outliers and their approximate distributions are derived under the mean-shift model and the variance-weight model, respectively. Simulation studies are conducted to examine the power and size of the test, as well as the detection of outliers when a simulated dataset contains several outliers. A real dataset is analyzed to illustrate the proposed method, and modified models based on the mean-shift and variance-weight models, in which detected outliers are taken into account, are suggested to deal with the outliers and confirm the conclusions.

6.
The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially when there are multiple outliers in high-dimensional regression, where the high dimensionality of the data might amplify the chance of one or more observations being outlying. Noting that the detection of outliers is not only necessary but also important in high-dimensional regression analysis, we propose in this paper a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and the least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well for detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the application of the proposed method by empirical analysis of real-life protein and gene expression data.

7.
Although the poor performance of the mean as a location estimate when outliers are present in the data is well known, there has been no clear consensus as to whether robust estimation or outlier detection is the appropriate corrective procedure. In this paper, the estimation accuracy of the sample mean and 27 robust estimation and outlier detection techniques are compared by computer simulation. Both symmetric and asymmetric contamination are considered. It is shown that the proper class of estimates depends on the degree of contamination, whether the contamination is symmetric or asymmetric, and the sample size. Several data sets considered previously by Rocke et al. (1982) are also examined.
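The sensitivity of the mean that motivates this comparison can be seen on a toy sample. The following is a minimal sketch under stated assumptions: the data are made up, `trimmed_mean` is a hypothetical helper written for this illustration, and the median and a 10% trimmed mean merely stand in for the 27 techniques of the paper, which are not reproduced here.

```python
from statistics import mean, median


def trimmed_mean(values, proportion):
    """Mean after dropping floor(proportion * n) of the smallest and
    the same number of the largest observations."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    return mean(ordered[cut:len(ordered) - cut])


# Made-up sample centred on 10, then the same sample with one value
# replaced by a gross outlier (asymmetric contamination).
clean = [9.8, 9.9, 10.0, 10.0, 10.1, 10.2, 9.9, 10.1, 10.0, 10.0]
contaminated = clean[:-1] + [50.0]

for name, estimator in [
    ("mean", mean),
    ("median", median),
    ("10% trimmed mean", lambda v: trimmed_mean(v, 0.10)),
]:
    print(name, estimator(clean), estimator(contaminated))
```

On this sample the single outlier moves the mean from about 10 to about 14, while the median and the trimmed mean stay close to 10. Which class of estimate is preferable in general is exactly what the simulation study addresses.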

8.
This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in estimating both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlier units at any level. Detection of outlier units related to random parts is also studied. Analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.

9.
In this paper, we revisit the alternative outlier model of Thompson [A note on restricted maximum likelihood estimation with an alternative outlier model, J. Roy. Stat. Soc. Ser. B 47 (1985), pp. 53–55] for detecting outliers in the linear model. Gumedze et al. [A variance shift model for detection of outliers in the linear mixed model, Comput. Statist. Data Anal. 54 (2010), pp. 2128–2144] called this model the variance shift outlier model (VSOM). The basic idea behind the VSOM is to detect observations with inflated variance and isolate them for further investigation. The VSOM is appealing because it downweights an outlier in the analysis, with the weighting determined automatically as part of the estimation procedure. We set up the VSOM as a linear mixed model and then use the likelihood ratio test (LRT) statistic as an objective measure for determining whether the weighting is required, i.e. whether the observation is an outlier. We also derive one-step updates of the variance parameter estimates based on observed, expected and average information matrices to obtain one-step LRT statistics, which usually require less computation. Both the fully iterated and one-step LRTs are functions of the squared standard residuals from the null model and can therefore be computed directly without the need to fit the VSOM. We investigate the properties of the likelihood ratio tests and compare them. An extension of the model to detect a group of outliers is also given. We illustrate the proposed methodology using simulated datasets and a real dataset.

10.
Let F(x) and F(x+θ) be log dose-response curves for a standard preparation and a test preparation, respectively, in a parallel quantal bioassay designed to test the relative potency of a drug, toxicant, or some other substance, and suppose the form of F is unknown. Several estimators of the shift parameter θ, or relative potency, are compared, including some generalized and trimmed Spearman-Kärber estimators and a nonparametric maximum likelihood estimator. Both point and interval estimation are discussed. Some recommendations concerning the choice of estimator are offered.
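As a point of reference for the estimators compared above, the classical (untrimmed) Spearman-Kärber estimate of the mean of a tolerance distribution can be sketched in a few lines. This is a minimal illustration, not the generalized or trimmed versions studied in the paper: the function name `spearman_karber` and the data are hypothetical, and the sketch assumes that the observed response proportions are non-decreasing, start at 0 and end at 1. Because the test curve is F(x+θ), its mean lies θ below that of the standard curve, so the shift is estimated as the standard estimate minus the test estimate.

```python
def spearman_karber(log_doses, proportions):
    """Classical Spearman-Karber estimate of the mean log tolerance.

    Assumes increasing log doses and non-decreasing response
    proportions that start at 0 and end at 1.
    """
    if len(log_doses) != len(proportions) or len(log_doses) < 2:
        raise ValueError("need matching sequences of length >= 2")
    if proportions[0] != 0.0 or proportions[-1] != 1.0:
        raise ValueError("proportions must start at 0 and end at 1")
    pairs = list(zip(log_doses, proportions))
    # Each dose interval contributes its midpoint, weighted by the
    # increase in response proportion across the interval.
    return sum((p1 - p0) * (x0 + x1) / 2
               for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]))


# Made-up data: the test preparation responds one log-dose unit earlier.
mu_standard = spearman_karber([0.0, 1.0, 2.0], [0.0, 0.5, 1.0])
mu_test = spearman_karber([-1.0, 0.0, 1.0], [0.0, 0.5, 1.0])

theta_hat = mu_standard - mu_test  # estimated shift parameter
print(mu_standard, mu_test, theta_hat)
```

Real assay data rarely satisfy the 0-to-1 assumption exactly, which is one motivation for the trimmed variants mentioned in the abstract.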

11.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

12.
Test procedures for outlier detection problems in the Gumbel distribution are rarely available. Hence, a test statistic is proposed here for the detection of a pair of upper and lower outliers from a Gumbel distribution with known scale parameter. The critical values of the statistic are obtained, and some examples are given to highlight the use of the statistic. The advantage of the proposed statistic is that the scale parameter, though assumed to be known, is not explicitly involved in the determination of the critical values.

13.
There is currently much discussion about lasso-type regularized regression, which is a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling owing to its sparsity, it suffers from outliers because it uses penalized least-squares methods. To overcome this issue, we propose a robust lasso-type estimation procedure that uses robust criteria as the loss function and imposes an L1-type penalty called the elastic net. We also propose using efficient bootstrap information criteria for choosing the optimal regularization parameters and a constant in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.

14.
Nirpeksh Kumar, Statistics, 2013, 47(1): 184-190
An approach for testing multiple upper outliers with slippage alternative in an exponential sample, irrespective of origin, is discussed. The outlier detection procedure is based on a ratio of two estimates, obtained by the maximization of the two log-likelihood functions. One is the complete data log-likelihood and the other is its conditional expectation, given the regular observations. The exact null distribution of the test statistic is derived and no new table for critical values is required. A simulation study is also carried out to compare the performance of the test with the earlier work.

15.
This work studies outlier detection and robust estimation with data that are naturally distributed into groups and which follow approximately a linear regression model with fixed group effects. For this, several methods are considered. First, the robust fitting method of Peña and Yohai [A fast procedure for outlier diagnostics in large regression problems. J Am Stat Assoc. 1999;94:434–445], called principal sensitivity components (PSC) method, is adapted to the grouped data structure and the mentioned model. The robust methods RDL1 of Hubert and Rousseeuw [Robust regression with both continuous and binary regressors. J Stat Plan Inference. 1997;57:153–163] and M-S of Maronna and Yohai [Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 2000;89:197–214] are also considered. These three methods are compared in terms of their effectiveness in outlier detection and their robustness through simulations, considering several contamination scenarios and growing contamination levels. Results indicate that the adapted PSC procedure is able to detect a high percentage of true outliers and a small number of false outliers. It is appropriate when the contamination is in the error term or in the covariates, detecting also possibly masked high leverage points. Moreover, in simulations the final robust regression estimator preserved good efficiency under Normality while keeping good robustness properties.

16.
Similarity in bioassays means that the test preparation behaves as a dilution of the standard preparation with respect to their biological effect. Thus, similarity must be investigated to confirm this biological property. Historically, this was typically conducted with traditional hypothesis testing, but this has received substantial criticism. Failing to reject similarity does not imply that the two preparations are similar. Also, rejecting similarity when bioassay variability is small might simply demonstrate a nonrelevant deviation in similarity. To remedy these concerns, equivalence testing has been proposed as an alternative to traditional hypothesis testing, and it has found its way into the official guidelines. However, similarity has been discussed mainly in terms of the parameters in the dose-response curves of the standard and test preparations, but the consequences of nonsimilarity on the relative bioactivity have never been investigated. This article provides a general equivalence approach to evaluate similarity that is directly related to bioequivalence on the relative bioactivity of the standard and test preparations. Bioequivalence on the relative bioactivity can only be guaranteed for positive (only nonblanks) and finite dose intervals. The approach is demonstrated on four case studies in which we also show how to calculate a sample size and how to investigate the power of equivalence on similarity.

17.
In this paper we present a "model-free" method of outlier detection for Gaussian time series that uses the autocorrelation structure of the time series. We also present a graphic diagnostic method to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contains either one type of outlier or both additive and innovation outliers, and that it has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.

18.
This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to simultaneously achieve outlier detection and robust variable selection. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods have better finite-sample performance than the existing methods when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.

19.
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate the outliers when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is utilized to construct a novel outlier detection measure based on distance correlation, and then an outlier detection procedure is proposed. The proposed method enjoys several advantages. First, the outlier detection measure can be simply calculated, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, which does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well for detecting outliers in high-dimensional regression models and performs better than some other competing methods.

20.
This paper deals with a formal identification of outliers in regression based on tests of hypotheses. The hypothesis is not the standard one but is based on performance criteria that relate to the coefficient estimation and predictive capabilities of the model. The criteria include the trace of the mean square error matrix of the coefficients and the integrated mean square error of prediction. Both the mean shift outlier model and the variance inflation model are discussed.
