Similar literature
20 similar articles found.
1.
Outlier detection is fundamental to statistical modelling. When there are multiple outliers, many traditional approaches are stepwise detection procedures, which can be computationally expensive and ignore stochastic error in the outlier detection process. Outlier detection can instead be performed by a heteroskedasticity test. In this article, a rapid outlier detection method via a multiple heteroskedasticity test based on penalized likelihood approaches is proposed to handle these problems. The proposed method detects heteroskedasticity across all the data in a single step and estimates the coefficients simultaneously. It is distinguished from other approaches in that it uses a weighted least squares formulation coupled with nonconvex sparsity-inducing penalization. Furthermore, the proposed approach does not need to construct test statistics or calculate their distributions. A new algorithm is proposed for optimizing penalized likelihood functions. Favourable theoretical properties of the proposed approach are obtained. Our simulation studies and real data analysis show that the newly proposed methods compare favourably with other traditional outlier detection techniques.

2.
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate the outliers when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is used to construct a novel outlier detection measure based on distance correlation, and an outlier detection procedure is then proposed. The proposed method enjoys several advantages. First, the outlier detection measure is simple to calculate, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression and does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well in detecting outliers in high-dimensional regression models and performs better than some competing methods.

3.
The detection of outliers is an interesting and important aspect of data analysis, as it can affect inference. Various methods are available in the literature for detecting outliers in multivariate data [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994] using the Mahalanobis distance measure. An attempt is made to propose an alternative method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615–644]. The proposed method is computationally efficient, with a high breakdown value and low computation time. Further, two important properties, namely success rates (SR) and false detection rates (FDR), are studied and compared with those of some well-known outlier detection methods through a simulation study. The Comedian method has high SR and low FDR for all combinations of parameters. On removing or down-weighting the detected outliers, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.
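As a rough illustration of the building block named in this abstract, the following Python sketch computes the median absolute deviation (MAD) and the comedian, COM(X, Y) = med((X - med X)(Y - med Y)), along with the correlation median obtained by scaling the comedian by both MADs. The function names and the toy data are assumptions made for illustration; the sketch does not reproduce the paper's full detection procedure.

```python
from statistics import median


def mad(x):
    """Median absolute deviation about the median (unscaled)."""
    m = median(x)
    return median([abs(v - m) for v in x])


def comedian(x, y):
    """Comedian: median of the products of deviations from the medians."""
    mx, my = median(x), median(y)
    return median([(a - mx) * (b - my) for a, b in zip(x, y)])


def correlation_median(x, y):
    """Comedian scaled by both MADs, a robust analogue of correlation."""
    return comedian(x, y) / (mad(x) * mad(y))


# Hypothetical toy data: the last value of x is a gross outlier.
x = [1, 2, 3, 4, 100]
y = [2 * v for v in x]  # exactly linear in x

# For this odd-length sample, the comedian of x with itself equals MAD(x)^2.
print(comedian(x, x), mad(x) ** 2)
print(correlation_median(x, y))
```

Because every quantity is a median, the single extreme value in `x` does not dominate the result the way it would dominate a sample covariance.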

4.
Nonparametric charts are useful in statistical process control when there is little or no knowledge about the underlying process distribution. Most existing approaches in the literature on Phase I monitoring assume that outliers have the same distribution as the in-control sample and differ only in location or scale parameters, so they may not be effective under distributional changes. This article develops a new procedure based on the integration of the classical Anderson–Darling goodness-of-fit test and the stepwise isolation method. Our proposed procedure is efficient in detecting potential shifts in location, scale, or shape, and thus it offers robust protection against variation in various underlying distributions. The finite sample performance of our method is evaluated through simulations and is compared with that of available outlier detection methods for Phase I monitoring.

5.
In this paper, we propose a method for outlier detection and removal in electromyographic gait-related patterns (EMG-GRPs). The goal was to detect and remove EMG-GRPs that reduce the quality of gait data while preserving natural biological variations in EMG-GRPs. The proposed procedure consists of general statistical tests and is simple to use. The Friedman test with multiple comparisons was used to find particular EMG-GRPs that are extremely different from the others. Next, outlying observations were identified for each suspected stride waveform by applying the generalized extreme studentized deviate test. To complete the analysis, we applied different outlier criteria. The results suggest that an EMG-GRP is an outlier if it differs from at least 50% of the other stride waveforms and contains at least 20% of the outlying observations. The EMG signal remains a realistic representation of muscle activity and demonstrates step-by-step variability once the outliers, as defined here, are removed.
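The two-part criterion reported in this abstract can be written as a small decision rule. The Python sketch below is one possible reading, not the authors' implementation: it assumes the counts come from earlier steps (the multiple-comparison test and the generalized ESD test, neither of which is implemented here), and it interprets the 20% condition as the share of a waveform's samples flagged as outlying. All names and default thresholds are illustrative.

```python
def is_outlier_pattern(n_differing, n_other, n_outlying_points, n_points,
                       min_differing=0.5, min_outlying=0.2):
    """Return True when both thresholds from the abstract are met.

    n_differing:       number of other stride waveforms this one differs from
    n_other:           total number of other stride waveforms
    n_outlying_points: samples of this waveform flagged as outlying (assumed input)
    n_points:          total samples in this waveform
    """
    differs_enough = n_differing / n_other >= min_differing
    outlying_enough = n_outlying_points / n_points >= min_outlying
    return differs_enough and outlying_enough


# Hypothetical counts for three stride waveforms.
print(is_outlier_pattern(6, 10, 25, 100))  # both conditions hold
print(is_outlier_pattern(4, 10, 25, 100))  # differs from too few waveforms
print(is_outlier_pattern(6, 10, 10, 100))  # too few outlying samples
```

Requiring both conditions keeps waveforms that merely vary from stride to stride, which matches the stated goal of preserving natural biological variation.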

6.
This paper proposes a new heavy-tailed, alternative slash-type distribution on a bounded interval, obtained by relating a slash random variable to the standard logistic function, to model real data sets with skewness and high kurtosis that include outlying observations. Some basic statistical properties of the newly defined distribution are studied. We derive the maximum likelihood, least-squares, and weighted least-squares estimators of its parameters. We assess the performance of these estimators through a simulation study. Moreover, an application to real data demonstrates that the proposed distribution can provide a better fit than well-known bounded distributions in the literature when a skewed data set with high kurtosis contains outlying observations.

7.
Regression analysis is one of the methods most widely used in prediction problems. Although there are many methods for parameter estimation in regression analysis, the ordinary least squares (OLS) technique is the most commonly used among them. However, this technique is highly sensitive to outlying observations. Therefore, robust techniques are suggested in the literature when a data set includes outliers. Moreover, in a prediction problem, using techniques that reduce the influence of outliers and using a median rather than a mean of the errors as the target function will be more successful in modeling these kinds of data. In this study, a new parameter estimation method based on particle swarm optimization was proposed; it uses the median of the absolute ratios obtained by dividing the difference between the observed and predicted values by the observed value. The performance of the proposed method was evaluated in a simulation study by comparing it with OLS and some other robust methods in the literature.
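To make the target function described in this abstract concrete, the Python sketch below computes the median of the absolute relative errors, |y - ŷ| / |y|, and contrasts it with the mean squared error on toy data containing one gross outlier. It covers the objective only; the particle swarm optimization step is not implemented, and the function names and data are assumptions made for illustration. Observed values are assumed to be nonzero.

```python
from statistics import mean, median


def median_absolute_relative_error(y, y_hat):
    """Median of |y_i - y_hat_i| / |y_i|; assumes every y_i is nonzero."""
    return median([abs(a - p) / abs(a) for a, p in zip(y, y_hat)])


def mean_squared_error(y, y_hat):
    """Mean of squared errors, shown only for comparison."""
    return mean([(a - p) ** 2 for a, p in zip(y, y_hat)])


# Hypothetical data: the last observation is an outlier the model misses.
y = [10, 20, 30, 40, 1000]
y_hat = [11, 18, 30, 44, 50]

print(median_absolute_relative_error(y, y_hat))
print(mean_squared_error(y, y_hat))
```

The median-based criterion is driven by the typical relative error of the well-fitted points, whereas the squared-error mean is dominated by the single outlying observation.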

8.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a newly proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, this study compares not only imputation methods but also outlier detection methods. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as on the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more effectively.

9.
The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data may increase the chance of one or more observations being outlying. Noting that the detection of outliers is both necessary and important in high-dimensional regression analysis, we propose in this paper a feasible outlier detection approach for sparse high-dimensional linear regression models. First, we search for a clean subset using the sure independence screening method and the least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple-outlier detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well in detecting multiple outliers in sparse high-dimensional linear regression models. We further illustrate the application of the proposed method through an empirical analysis of real-life protein and gene expression data.

10.
Whether an extreme observation is an outlier or not depends strongly on the corresponding tail behavior of the underlying distribution. We develop an automatic, data-driven method rooted in the mathematical theory of extremes to identify observations that deviate from the intermediate and central characteristics. The proposed algorithm extends a method previously proposed in the literature for the specific case of heavy-tailed Pareto-type distributions to all max-domains of attraction. We propose some applications, such as a tail-adjusted boxplot that yields a more accurate representation of possible outliers, and the identification of outliers in a multivariate context through an analysis of associated random variables such as local outlier factors. Several examples and simulation results illustrate the finite sample behavior of the algorithm and its applications.

11.
Potency bioassays are used to measure biological activity. Consequently, potency is considered a critical quality attribute in manufacturing. Relative potency is measured by comparing the concentration-response curve of a manufactured test batch with that of a reference standard. If the curve shapes are deemed similar, the test batch is said to exhibit constant relative potency with the reference standard, a critical requirement for calibrating the potency of the final drug product. Outliers in bioassay potency data may result in the false acceptance/rejection of a bad/good sample and, if accepted, may yield a biased relative potency estimate. To avoid these issues, the USP<1032> recommends screening bioassay data for outliers before performing a relative potency analysis. In a recently published work, the effects of one or more outliers, outlier size, and outlier type on similarity testing and estimation of relative potency were thoroughly examined, confirming the USP<1032> outlier guidance. As a follow-up, several outlier detection methods, including those proposed by the USP<1010>, are evaluated and compared in this work through computer simulation. Two novel outlier detection methods are also proposed. The effects of outlier removal on similarity testing and estimation of relative potency were evaluated, resulting in recommendations for best practice.

12.
This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to achieve outlier detection and robust variable selection simultaneously. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods perform better in finite samples than existing methods when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.

13.
Fox (1972), Box and Tiao (1975), and Abraham and Box (1979) have proposed methods for detecting outliers in time series whose ARMA form is known (or identified). We show that the existence of a single aberrant observation, innovation, or intervention causes an ARMA model to be misidentified using unadjusted autocorrelation (acf) and partial autocorrelation estimates. The magnitude, location, and type of outlier, and in some cases the ARMA model's parameters, affect the identification outcome. We use variance inflation, signal-to-noise ratios, and acf critical values to determine an ARMA model's susceptibility to misidentification. Numerical and simulation examples suggest how to use the outlier detection methods iteratively in practice.
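The sensitivity of unadjusted autocorrelation estimates to a single aberrant observation can be seen with a small Python sketch. It implements the usual sample autocorrelation at a given lag and applies it to a toy series before and after one value is replaced by an outlier. The series and function name are assumptions for illustration; this is not the variance-inflation analysis used in the paper.

```python
def sample_acf(x, lag):
    """Sample autocorrelation at the given lag (biased, mean-corrected form)."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Hypothetical toy series with strong positive lag-1 dependence.
clean = [1, 2, 3, 4, 5, 6, 7]

# Same series with a single additive outlier in the middle.
contaminated = clean.copy()
contaminated[3] = 100

print(sample_acf(clean, 1))
print(sample_acf(contaminated, 1))
```

The outlier inflates the variance in the denominator and contributes large negative cross-products, so the lag-1 estimate drops sharply, which is the kind of distortion that can lead to a misidentified model order.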

14.
Despite the popularity of high-dimension, low-sample-size data analysis, not enough attention has been paid to the sample integrity issue, in particular the possibility of outliers in the data. A new outlier detection procedure for data with much larger dimensionality than the sample size is presented. The proposed method is motivated by asymptotic properties of high-dimensional distance measures. Empirical studies suggest that high-dimensional outlier detection is more likely to suffer from a swamping effect than from a masking effect, and thus yields more false positives than false negatives. We compare the proposed approaches with existing methods using simulated data from various population settings. A real data example is presented, with consideration of the implications of the outliers found.

15.
When there is an outlier in the data set, the efficiency of traditional methods decreases. To solve this problem, Kadilar et al. (2007) adapted the Huber-M method, which is only one of several robust regression methods, to ratio-type estimators and reduced the effect of the outlier problem. In this study, new ratio-type estimators are proposed by considering the Tukey-M, Hampel M, Huber MM, LTS, LMS and LAD robust methods, building on Kadilar et al. (2007). Theoretically, we obtain the mean square error (MSE) for these estimators. We compare the MSE values of the proposed estimators with those of estimators based on the Huber-M and OLS methods. From these comparisons, we observe that our proposed estimators give more efficient results than both the Huber M approach proposed by Kadilar et al. (2007) and the OLS approach. Also, under all conditions, all of the proposed estimators except the one based on the LAD method are more efficient than the robust estimators proposed by Kadilar et al. (2007). These theoretical results are supported by a numerical example and a simulation based on data that include an outlier.

16.
In this article, we propose an outlier detection approach for a multiple regression model using the properties of a difference-based variance estimator. This type of difference-based variance estimator was originally used to estimate the error variance in a nonparametric regression model without estimating the nonparametric function. This article is the first to employ a difference-based error variance estimator to study the outlier detection problem in a multiple regression model. Our approach uses a leave-one-out type method based on the difference-based error variance. Existing outlier detection approaches that use the leave-one-out idea are highly affected by other outliers, while ours is not, because our approach does not use the regression coefficient estimator. We compared our approach with several existing methods in a simulation study, which suggests that our approach outperforms them. The advantages of our approach are demonstrated using a real data application. Our approach can be extended to the nonparametric regression model for outlier detection.

17.
This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in estimating both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlier units at any level. Detection of outlier units related to random parts is also studied. Analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.

18.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

19.

Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or attribute a specific case weight to the entire case. In practice, particularly in high-dimensional data, the outlier will most likely not be outlying along all of its variables, but just along a subset of them. If so, the scientific question of why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier’s outlyingness. It thereby helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably by the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well both on simulated data and on real-life examples.

20.
In recent years, methods for dealing with autocorrelated data in the statistical process control environment have been proposed. A primary method is based on modeling the process data and applying control charts to the residuals. However, the residual charts do not have the same properties as the traditional charts. In the literature, there has been no systematic study of the detection capability of the residual chart for stationary processes. The article develops a measure of the detection capability of the residual chart for general stationary processes. Conditions under which the residual chart reduces or increases the detection capability are given. The relationships between the detection capability and the average run length of the residual chart are also established.
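The residual-chart idea summarized in this abstract can be sketched in Python for the simplest stationary case, an AR(1) process. The sketch assumes the in-control mean `mu`, the autoregressive coefficient `phi`, and the residual standard deviation `sigma` are already known or have been estimated beforehand; these names, the 3-sigma limit, and the toy numbers are illustrative assumptions, not taken from the article.

```python
def ar1_residuals(x, phi, mu):
    """One-step-ahead residuals e_t = (x_t - mu) - phi * (x_{t-1} - mu)."""
    return [(x[t] - mu) - phi * (x[t - 1] - mu) for t in range(1, len(x))]


def out_of_control(residuals, sigma, k=3.0):
    """Indices of residuals falling outside the +/- k * sigma control limits."""
    return [i for i, e in enumerate(residuals) if abs(e) > k * sigma]


# Hypothetical series that follows x_t = 0.5 * x_{t-1} exactly after the first step.
x = [0, 1, 0.5, 0.25]
print(ar1_residuals(x, phi=0.5, mu=0))

# Hypothetical residuals with one value beyond the 3-sigma limits.
print(out_of_control([0.5, -4.0, 1.0], sigma=1.0))
```

With positive autocorrelation, only part of a sustained mean shift carries through to the residuals after the first shifted observation, which is one reason the detection capability of a residual chart can differ from that of a traditional chart.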
