Similar literature
 20 similar documents found (search time: 31 ms)
1.
Recently, several new robust multivariate estimators of location and scatter have been proposed that provide new and improved methods for detecting multivariate outliers. But for small sample sizes, there are no results on how these new multivariate outlier detection techniques compare in terms of p_n, their outside rate per observation (the expected proportion of points declared outliers) under normality. And there are no results comparing their ability to detect truly unusual points based on the model that generated the data. Moreover, there are no results comparing these methods to two fairly new techniques that do not rely on some robust covariance matrix. It is found that for an approach based on the orthogonal Gnanadesikan–Kettenring estimator, p_n can be very unsatisfactory with small sample sizes, but a simple modification gives much more satisfactory results. Similar problems were found when using the median ball algorithm, but a modification proved to be unsatisfactory. The translated-biweights (TBS) estimator generally performs well with a sample size of n≥20 and when dealing with p-variate data where p≤5. But with p=8 it can be unsatisfactory, even with n=200. A projection method as well as the minimum generalized variance method generally perform best, but for p≤5, conditions where the TBS method is preferable are described. In terms of detecting truly unusual points, the methods can differ substantially depending on where the outliers happen to be, the number of outliers present, and the correlations among the variables.
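
The outside rate per observation p_n mentioned above can be estimated by simulation. The sketch below is only an illustration under stated assumptions: it uses the classical (non-robust) Mahalanobis rule with the sample mean and covariance, not any of the robust estimators compared in the abstract, and it is restricted to bivariate standard normal data so that the 0.975 chi-square cutoff has a closed form. The function name `outside_rate` and its parameters are hypothetical.

```python
import numpy as np


def outside_rate(n, n_rep=2000, seed=0):
    """Monte Carlo estimate of p_n for bivariate standard normal samples.

    Assumption: a point is declared an outlier when its squared classical
    Mahalanobis distance exceeds the 0.975 quantile of chi-square with 2 df.
    """
    rng = np.random.default_rng(seed)
    # For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - q).
    cutoff = -2.0 * np.log(0.025)
    flagged = 0
    for _ in range(n_rep):
        x = rng.standard_normal((n, 2))
        centered = x - x.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
        # Squared Mahalanobis distance of every observation.
        d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
        flagged += np.count_nonzero(d2 > cutoff)
    # Proportion of all simulated points that were declared outliers.
    return flagged / (n_rep * n)
```

Calling `outside_rate(10)` and `outside_rate(200)` side by side gives a rough feel for how the rate of this particular rule depends on the sample size.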

2.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

3.
In the past decade, different robust estimators have been proposed by several researchers to improve the ability to detect non-random patterns such as trend, process mean shift, and outliers in multivariate control charts. However, the use of the sample mean vector and the mean square successive difference matrix in the T² control chart is sensitive in detecting process mean shift or trend but less sensitive in detecting outliers. On the other hand, the minimum volume ellipsoid (MVE) estimators in the T² control chart are sensitive in detecting multiple outliers but less sensitive in detecting trend or process mean shift. Therefore, new robust estimators using both merits of the mean square successive difference matrix and the MVE estimators are developed to modify Hotelling's T² control chart. To compare the detection performance among various control charts, a simulation approach for establishing control limits and calculating signal probabilities is provided as well. Our simulation results show that a multivariate control chart using the new robust estimators can achieve a well-balanced sensitivity in detecting the above-mentioned non-random patterns. Finally, three numerical examples further demonstrate the usefulness of our new robust estimators.
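
As a rough illustration of the first chart mentioned above, the following sketch computes T² statistics from the sample mean vector and the mean square successive difference matrix, taken here in its usual form S_D = Σ (x_{i+1} − x_i)(x_{i+1} − x_i)ᵀ / (2(n − 1)). It does not implement the new robust estimators of the abstract, and the function name is hypothetical.

```python
import numpy as np


def t2_successive_difference(x):
    """T^2 statistics using the mean square successive difference matrix.

    x is an (n, p) array of individual observations in time order.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Differences between consecutive observations, shape (n - 1, p).
    diffs = np.diff(x, axis=0)
    # Successive-difference estimate of the covariance matrix.
    s_d = diffs.T @ diffs / (2.0 * (n - 1))
    centered = x - x.mean(axis=0)
    # Quadratic form (x_i - mean)^T S_D^{-1} (x_i - mean) for every i.
    return np.einsum("ij,jk,ik->i", centered, np.linalg.inv(s_d), centered)
```

Control limits are deliberately left out; the abstract obtains them by simulation.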

4.
The first step in statistical analysis is parameter estimation. In multivariate analysis, one of the parameters of interest is the mean vector. It is usually assumed that the data come from a multivariate normal distribution. In this situation, the maximum likelihood estimator (MLE), that is, the sample mean vector, is the best estimator. However, when outliers exist in the data, the sample mean vector gives poor estimates, so estimators that are robust to outliers should be used instead. The most popular robust multivariate estimator of the mean vector is the S-estimator, which has desirable properties. However, computing this estimator requires a robust estimate of the mean vector as a starting point. Usually the minimum volume ellipsoid (MVE) is used as the starting point in computing the S-estimator. For high-dimensional data, computing the MVE takes too much time. In some cases, this time is so large that existing computers cannot perform the computation. In addition to the computation time, the MVE method is not precise for high-dimensional data sets. In this paper, a robust starting point for the S-estimator based on robust clustering is proposed, which can be used for estimating the mean vector of high-dimensional data. The performance of the proposed estimator in the presence of outliers is studied, and the results indicate that the proposed estimator performs precisely and much better than some of the existing robust estimators for high-dimensional data.

5.
The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with Metropolis-within-Gibbs scheme is used to effectively draw the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.

6.
The local influence method introduced by Cook is adapted to multivariate normal data for the purpose of detecting outliers. The method allows simultaneous perturbations on all observations, so that it can identify multiple outliers. An illustrative example is given to show the effectiveness of the method for the identification of influential observations.

7.
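Traditional factor analysis is sensitive to outliers, so its results can diverge from reality. To address this, this paper uses the FAST-MCD method to improve traditional factor analysis, constructing a robust factor analysis algorithm that overcomes the influence of outliers, and evaluates the method through simulation and empirical analysis. Both the simulation and the empirical results show that, before and after factor rotation, traditional and robust factor analysis give essentially the same results when the data contain no outliers. When the data contain outliers, the results of traditional factor analysis change considerably, whereas those of robust factor analysis remain essentially unchanged. This indicates that, compared with traditional factor analysis, robust factor analysis effectively resists the influence of outliers and has good resistance to interference and high robustness.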

8.
Response surfaces express the behavior of responses and can be used for both single-response and multi-response problems. A common approach to estimating a response surface from experimental results is the ordinary least squares (OLS) method. Since OLS is very sensitive to outliers, some robust approaches have been discussed in the literature. Although many methods are available in the literature for multiple response optimization, there are few studies on model building, especially on robust models. Assuming correlated responses, this paper proposes a robust coefficient estimation method for the multi-response problem based on M-estimators. To illustrate the performance of the proposed procedure, a contaminated experimental design is used, based on a numerical example available in the literature with some modifications. Both the classical multivariate least squares method and the proposed robust multivariate approach are used to estimate the regression coefficients of multi-response surfaces based on this example. Moreover, a comparison of the proposed robust multi-response surface (RMRS) approach with separate robust estimation of single responses shows that the proposed approach is more efficient.
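
To make the idea of M-estimation concrete, here is a minimal sketch of Huber M-estimation for a single response, fitted by iteratively reweighted least squares. It is a generic textbook scheme, not the multi-response method proposed in the abstract; the function name, the tuning constant `c=1.345`, and the MAD-based scale are assumptions made for illustration.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via IRLS (single response)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the ordinary least squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold, c / |u| outside it.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        # Weighted least squares step with the current weights.
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Fitting each response separately with such a routine corresponds to the "separate robust estimation of single responses" that the abstract uses as its comparison baseline.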

9.
In this article, we propose a new test of discordancy based on spacing theory in circular data. The test should provide a good alternative to existing tests of discordancy for detecting single or well-separated multiple outliers. On top of that, the new method can be generalized to identify a patch of outliers in data. The percentage points are calculated and the performance is examined. We first investigate the performance of the test for detecting a single outlier and show that the new test performs well compared to other known tests. We then show that the generalized test works well in detecting a patch of outliers in the data. As an illustration, a practical example based on an eye dataset obtained from a glaucoma clinic at the University of Malaya Medical Center, Malaysia, is presented.

10.
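The time-series autoregressive (AR) model is easily affected by outliers during model building, so its results can diverge from reality. To address this, the FQn statistic is used to improve the traditional autocorrelation function, constructing a robust estimation algorithm for the AR model that overcomes the influence of outliers, and the method is evaluated through simulation and empirical analysis. Both the simulation and the empirical analysis show that when the time series contains no outliers, the traditional and robust estimation methods give essentially the same results. When the data contain outliers, the results of the traditional estimation method change considerably, whereas those of the robust estimation method remain essentially unchanged. This indicates that, compared with the traditional estimation method, the robust estimation method effectively resists the influence of outliers and has good resistance to interference and high robustness.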

11.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

12.
We introduce a multivariate heteroscedastic measurement error model for replications under scale mixtures of normal distributions. The model can provide a robust analysis and can be viewed as a generalization of multiple linear regression in both model structure and distributional assumption. An efficient method based on Markov Chain Monte Carlo is developed for parameter estimation. The deviance information criterion and the conditional predictive ordinates are used as model selection criteria. Simulation studies show robust inference behaviours of the model against both misspecification of distributions and outliers. We work out an illustrative example with a real data set on measurements of plant root decomposition.

13.
Outlier detection has been used extensively in data analysis to detect anomalous observations in data. It has important applications in fraud detection and robust analysis, among others. In this paper, we propose a method for detecting multiple outliers in the linear functional relationship model for circular variables. Using the residual values of the Caires and Wyatt model, we apply a hierarchical clustering approach. With the use of a tree diagram, we illustrate the detection of outliers graphically. A Monte Carlo simulation study is done to verify the accuracy of the proposed method. Low probabilities of masking and swamping effects indicate the validity of the proposed approach. Illustrations on two sets of real data are also given to show its practical applicability.

14.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.

15.
The stalactite plot for the detection of multivariate outliers   (Total citations: 1; self-citations: 0; citations by others: 1)
Detection of multiple outliers in multivariate data using Mahalanobis distances requires robust estimates of the means and covariance of the data. We obtain this by sequential construction of an outlier free subset of the data, starting from a small random subset. The stalactite plot provides a cogent summary of suspected outliers as the subset size increases. The dependence on subset size can be virtually removed by a simulation-based normalization. Combined with probability plots and resampling procedures, the stalactite plot, particularly in its normalized form, leads to identification of multivariate outliers, even in the presence of appreciable masking.
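
The sequential subset construction described above might look roughly like the following sketch, which records, for each subset size m, the indices whose squared Mahalanobis distance exceeds a threshold (the raw material for a stalactite plot). This is a simplified reading of the abstract rather than the authors' exact procedure: the growth rule (keep the m + 1 closest observations), the caller-supplied `cutoff`, and the function name are assumptions, and the simulation-based normalization is not included.

```python
import numpy as np


def forward_search_flags(x, cutoff, seed=0):
    """Return {subset size m: indices with squared distance above cutoff}."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    rng = np.random.default_rng(seed)
    # Small random starting subset of p + 1 observations.
    subset = rng.choice(n, size=p + 1, replace=False)
    flags = {}
    for m in range(p + 1, n + 1):
        # Mean and covariance estimated from the current subset only.
        center = x[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(x[subset], rowvar=False))
        centered = x - center
        # pinv guards against a nearly singular covariance in early steps.
        d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.pinv(cov), centered)
        flags[m] = np.flatnonzero(d2 > cutoff)
        if m < n:
            # Grow the subset: keep the m + 1 observations closest to the fit.
            subset = np.argsort(d2)[: m + 1]
    return flags
```

Plotting the flagged indices against m, one row per subset size, gives a picture in the spirit of the stalactite plot.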

16.
Some recent contributions to robust data analysis and multiple outlier detection are discussed. Two methods of analysis producing robust estimates and sets of weights which may be inspected for outliers are described and compared. Some examples of their application are given to support the recommendation that both ordinary least squares and a robust method of analysis should be part of routine data analysis.

17.
A cluster methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is an agglomeration of procedures that begins with clustering the n observations, proceeds through a test of the ‘no-outlier hypothesis’ (TONH), and ends with a weighted least-squares regression estimation. The cluster phase partitions the n observations into a main cluster of size h and a minor cluster of size n−h. A robust distance emerges from the main cluster, upon which the test of the ‘no-outlier hypothesis’ is conducted. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. Until convergence, a re-weighted least-squares (RLS) regression estimate is updated with weights based on the normalized residuals. The proposed procedure blends an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, into the re-weighted regression estimation phase. Hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping. The performance of CBRR is further examined through a simulation experiment. The results obtained from the data set illustrations and the Monte Carlo study show that the CBRR is effective in detecting multivariate outliers where other methods are susceptible to masking and swamping. The CBRR does not require enormous computation and is substantially less susceptible to masking and swamping.

18.
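Wang Binhui (王斌会), 《统计研究》 (Statistical Research), 2007, 24(8): 72-76
Traditional multivariate statistical methods, such as principal component analysis and factor analysis, share a common feature: they compute the sample mean vector and covariance matrix and derive other statistics from these two. When the sample data contain no outliers, these methods give excellent results. When the sample data contain outliers, however, the results are easily affected by them, because the traditional mean vector and covariance matrix are not robust statistics. This paper studies the algorithm of the currently popular FAST-MCD method, constructs a robust mean vector and a robust covariance matrix, applies them to principal component analysis, and proposes improvements to address the method's shortcomings. The simulation and empirical results show that the improved method and the new robust estimators do resist outliers well and greatly reduce their influence on the computed results.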
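
The core building block of FAST-MCD is the concentration step (C-step): fit a mean and covariance to a subset of h observations, then replace the subset with the h observations closest to that fit in Mahalanobis distance, and repeat. The sketch below iterates C-steps from one caller-supplied starting subset. It is a simplified illustration, not the full FAST-MCD algorithm (which uses many starting subsets and further refinements) and not the improved method of the paper; the function name and parameters are hypothetical.

```python
import numpy as np


def c_steps(x, h, start, max_iter=100):
    """Iterate concentration steps from a starting subset of indices.

    Returns the mean, covariance, and indices of the final h-subset.
    """
    x = np.asarray(x, dtype=float)
    subset = np.sort(np.asarray(start))
    for _ in range(max_iter):
        center = x[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(x[subset], rowvar=False))
        centered = x - center
        # Squared Mahalanobis distances of all points to the current fit.
        d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)
        # Keep the h observations with the smallest distances.
        new_subset = np.sort(np.argsort(d2)[:h])
        if np.array_equal(new_subset, subset):
            break  # the subset no longer changes
        subset = new_subset
    center = x[subset].mean(axis=0)
    cov = np.atleast_2d(np.cov(x[subset], rowvar=False))
    return center, cov, subset
```

The eigenvectors of the returned covariance matrix could then stand in for those of the classical covariance matrix in a principal component analysis, which is the kind of substitution the abstract describes.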

19.
We propose a new regression-based filter for extracting signals online from multivariate high frequency time series. It separates relevant signals of several variables from noise and (multivariate) outliers.

Unlike parallel univariate filters, the new procedure takes into account the local covariance structure between the single time series components. It is based on high-breakdown estimates, which makes it robust against (patches of) outliers in one or several of the components as well as against outliers with respect to the multivariate covariance structure. Moreover, the trade-off problem between bias and variance for the optimal choice of the window width is approached by choosing the size of the window adaptively, depending on the current data situation.

Furthermore, we present an advanced algorithm of our filtering procedure that includes the replacement of missing observations in real time. Thus, the new procedure can be applied in online-monitoring practice. Applications to physiological time series from intensive care show the practical effect of the proposed filtering technique.

20.
In geostatistics, detecting atypical observations is of special interest due to the changes they can cause in environmental and geological patterns. Several methods for detecting them have already been suggested for the univariate spatial case. However, the problem is more complicated when various variables are observed simultaneously and the spatial correlation among them must be taken into account. The aim of this paper is to detect outliers and influential observations in multivariate spatial linear models. For this purpose, we derive and explore two different methods. First, a multivariate version of the forward search algorithm is given, where locations with outliers are detected in the last steps of the procedure. Next, we derive influence measures to assess the impact of the observations on the multivariate spatial linear model. The procedures are easy to compute and to interpret by means of graphical representations. Finally, an example and a Monte Carlo study illustrate the performance of these methods for identification of outliers in multivariate spatial linear models.
