Similar literature
 20 similar documents found (search time: 31 ms)
1.
This paper proposes a new robust Bayes factor for comparing two linear models. The factor is based on a pseudo‐model for outliers and is more robust to outliers than the Bayes factor based on the variance‐inflation model for outliers. If an observation is considered an outlier for both models, this new robust Bayes factor equals the Bayes factor calculated after removing the outlier. If an observation is considered an outlier for one model but not the other, then this new robust Bayes factor equals the Bayes factor calculated without the observation, but a penalty is applied to the model considering the observation as an outlier. For moderate outliers where the variance‐inflation model is suitable, the two Bayes factors are similar. The new Bayes factor uses a single robustness parameter to describe a priori belief in the likelihood of outliers. Real and synthetic data illustrate the properties of the new robust Bayes factor and highlight the inferior properties of Bayes factors based on the variance‐inflation model for outliers.

2.
In extending univariate outlier detection methods to higher dimensions, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons.
While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.
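
As a rough illustration of the threshold-type rule discussed in this abstract, the following Python sketch flags bivariate points whose squared Mahalanobis distance exceeds a chi-square cutoff. It is not the authors' procedure: it assumes two-dimensional data, uses the classical (non-robust) sample mean and covariance, and the function names are hypothetical. The closed-form cutoff is valid only for 2 degrees of freedom.

```python
import math


def mean_and_cov_2d(points):
    """Classical sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis_2d(point, center, cov):
    """Squared Mahalanobis distance, using the explicit 2x2 matrix inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers_2d(points, level=0.975):
    """Flag points whose squared distance exceeds the chi-square(2) quantile."""
    center, cov = mean_and_cov_2d(points)
    # For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - level).
    cutoff = -2.0 * math.log(1.0 - level)
    return [squared_mahalanobis_2d(p, center, cov) > cutoff for p in points]
```

Because the mean and covariance here are themselves pulled toward outliers, several outliers can hide one another. That is the masking effect the abstract studies, and it is why robust estimates of location and scatter are normally substituted in practice.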

3.
The idea of searching for orthogonal projections, from a multidimensional space into a linear subspace, as an aid to detecting non-linear structure has been named exploratory projection pursuit. Most approaches are tied to the idea of searching for interesting projections. Typically, an interesting projection is one where the distribution of the projected data differs from the normal distribution. In this paper we define two projection indices that are aimed specifically at finding projections that best show grouped structure in the plane, if this exists in the multi-dimensional space. These involve a numerical optimization problem which is tackled in two stages, the projection and the pursuit; the first is based on a procedure to generate pseudo-random rotation matrices in the sense of the grand tour by D. Asimov (1985), and the second is a local numerical optimization procedure. One artificial and one real example illustrate the performance of the suggested indices.

4.
When a data set contains an outlier, the efficiency of traditional methods decreases. To address this problem, Kadilar et al. (2007) adapted the Huber-M method, which is only one of several robust regression methods, to ratio-type estimators and reduced the effect of the outlier. In this study, new ratio-type estimators are proposed, building on Kadilar et al. (2007), by considering the Tukey-M, Hampel-M, Huber-MM, LTS, LMS and LAD robust methods. We obtain the mean square error (MSE) of these estimators theoretically and compare the MSE values of the proposed estimators with those of the estimators based on the Huber-M and OLS methods. These comparisons show that the proposed estimators are more efficient than both the Huber-M approach proposed by Kadilar et al. (2007) and the OLS approach. Also, under all conditions, all of the other proposed estimators except the LAD method are more efficient than the robust estimators proposed by Kadilar et al. (2007). These theoretical results are supported by a numerical example and a simulation based on data that include an outlier.

5.
Existing projection designs (e.g. maximum projection designs) attempt to achieve good space-filling properties in all projections. However, when using a Gaussian process (GP), a model-based design criterion such as the entropy criterion is more appropriate. We employ the entropy criterion averaged over a set of projections, called the expected entropy criterion (EEC), to generate projection designs. We show that maximum EEC designs are invariant to monotonic transformations of the response, i.e. they are optimal for a wide class of stochastic process models. We also demonstrate that transformation of each column of a Latin hypercube design (LHD) based on a monotonic function can substantially improve the EEC. Two types of input transformations are considered: a quantile function of a symmetric Beta distribution chosen to optimize the EEC, and a nonparametric transformation corresponding to the quantile function of a symmetric density chosen to optimize the EEC. Numerical studies show that the proposed transformations of the LHD are efficient and effective for building robust maximum EEC designs. These designs give projections with markedly higher entropies and lower maximum prediction variances (MPVs) at the cost of small increases in average prediction variances (APVs) compared to state-of-the-art space-filling designs over wide ranges of covariance parameter values.

6.
In high-dimensional data, one often seeks a few interesting low-dimensional projections which reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes will not work well when there are a small number of observations relative to the number of variables, known as a large p (dimension) small n (sample size) problem. This paper discusses the relationship between the sample size and dimensionality on classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.

7.
Regression analysis is one of the methods widely used in prediction problems. Although many methods are available for parameter estimation in regression analysis, the ordinary least squares (OLS) technique is the most commonly used. However, this technique is highly sensitive to outlying observations. Therefore, robust techniques are suggested in the literature when a data set includes outliers. Moreover, in a prediction problem, techniques that reduce the influence of outliers and that use a median rather than a mean of the errors as the target function will be more successful in modeling such data. In this study, a new parameter estimation method is proposed that is based on particle swarm optimization and uses the median of the absolute relative errors, obtained by dividing the difference between the observed and predicted values by the observed value. The performance of the proposed method was evaluated in a simulation study by comparing it with OLS and some other robust methods from the literature.
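
The target function described in this abstract, the median of the absolute relative errors, can be sketched in Python as below. This is only an illustrative reading of the abstract: the function name is hypothetical, observed values are assumed to be nonzero, and the particle swarm optimization step that would minimize this function over the regression coefficients is not shown.

```python
from statistics import median


def median_absolute_relative_error(observed, predicted):
    """Median of |(y - y_hat) / y| over all observations.

    Assumes every observed value y is nonzero.
    """
    return median(abs((y - y_hat) / y) for y, y_hat in zip(observed, predicted))


# Toy data: y = 2x except for one gross outlier in the last observation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 100.0]

# Candidate line that follows the bulk of the data.
bulk_fit = [2.0 * xi for xi in x]
# Candidate line that chases the outlier instead.
outlier_fit = [20.0 * xi for xi in x]

bulk_score = median_absolute_relative_error(y, bulk_fit)
outlier_score = median_absolute_relative_error(y, outlier_fit)
```

On this toy data the line through the bulk of the points scores lower than the line that passes through the outlier, which is the behavior a median-based target is meant to produce; a mean-based target would be dominated by the single large error.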

8.
The zero-inflated Poisson distribution has been used in the modeling of count data in different contexts. This model tends to be influenced by outliers because of the excessive occurrence of zeroes; thus outlier identification and robust parameter estimation are important for such a distribution. Some outlier identification methods are studied in this paper, and their applications and results are also presented with an example. To eliminate the effect of outliers, two robust parameter estimates are proposed based on the trimmed mean and the Winsorized mean. Simulation results show the robustness of our proposed parameter estimates.
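
The two building blocks named in this abstract, the trimmed mean and the Winsorized mean, can be sketched in Python as follows. This shows only the generic location estimates; how the paper maps them to the parameters of the zero-inflated Poisson model is not stated in the abstract and is not attempted here. The function names and the symmetric trimming proportion (assumed to be below 0.5) are illustrative choices.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the k smallest and k largest values,
    where k = int(proportion * n). Assumes proportion < 0.5."""
    data = sorted(values)
    k = int(proportion * len(data))
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Mean after replacing the k smallest and k largest values
    with their nearest retained neighbours. Assumes proportion < 0.5."""
    data = sorted(values)
    n = len(data)
    k = int(proportion * n)
    if k > 0:
        data[:k] = [data[k]] * k
        data[n - k:] = [data[n - k - 1]] * k
    return sum(data) / n


# Count data with many zeroes and one extreme count.
counts = [0, 0, 0, 1, 1, 2, 2, 3, 3, 40]
plain = sum(counts) / len(counts)
trimmed = trimmed_mean(counts, 0.1)
winsorized = winsorized_mean(counts, 0.1)
```

With one value removed or clipped at each end, both estimates ignore the extreme count of 40 that inflates the ordinary mean.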

9.
Multivariate outlier detection requires computation of robust distances to be compared with appropriate cut-off points. In this paper we propose a new calibration method for obtaining reliable cut-off points of distances derived from the MCD estimator of scatter. These cut-off points are based on a more accurate estimate of the extreme tail of the distribution of robust distances. We show that our procedure gives reliable tests of outlyingness in almost all situations of practical interest, provided that the sample size is not much smaller than 50. Therefore, it is a considerable improvement over all the available MCD procedures, which are unable to provide good control over the size of multiple outlier tests for the data structures considered in this paper.

10.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a newly proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more optimally.

11.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

12.
This work studies outlier detection and robust estimation with data that are naturally distributed into groups and which follow approximately a linear regression model with fixed group effects. For this, several methods are considered. First, the robust fitting method of Peña and Yohai [A fast procedure for outlier diagnostics in large regression problems. J Am Stat Assoc. 1999;94:434–445], called the principal sensitivity components (PSC) method, is adapted to the grouped data structure and the mentioned model. The robust methods RDL1 of Hubert and Rousseeuw [Robust regression with both continuous and binary regressors. J Stat Plan Inference. 1997;57:153–163] and M-S of Maronna and Yohai [Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 2000;89:197–214] are also considered. These three methods are compared in terms of their effectiveness in outlier detection and their robustness through simulations, considering several contamination scenarios and growing contamination levels. Results indicate that the adapted PSC procedure is able to detect a high percentage of true outliers and a small number of false outliers. It is appropriate when the contamination is in the error term or in the covariates, detecting also possibly masked high leverage points. Moreover, in simulations the final robust regression estimator preserved good efficiency under Normality while keeping good robustness properties.

13.
In multiple linear regression analysis, the ridge regression estimator and the Liu estimator are often used to address multicollinearity. Besides multicollinearity, outliers are also a problem in multiple linear regression analysis. We propose new biased estimators based on the least trimmed squares (LTS) ridge estimator and the LTS Liu estimator for the case in which both outliers and multicollinearity are present. For this purpose, a simulation study is conducted in order to see the difference between the robust ridge estimator and the robust Liu estimator in terms of their effectiveness, measured by the mean square error. In our simulations, the behavior of the new biased estimators is examined for three types of outliers: X-space outlier, Y-space outlier, and X- and Y-space outlier. The results for a number of different illustrative cases are presented. This paper also provides the results for the robust ridge regression and robust Liu estimators based on a real-life data set combining the problem of multicollinearity and outliers.

14.
A nonparametric discriminant analysis procedure that is robust to deviations from the usual assumptions is proposed. The procedure uses the projection pursuit methodology where the projection index is the two-group transvariation probability. We use allocation based on the centrality of the new point measured using a smooth version of point-group transvariation. It is shown that the new procedure provides lower misclassification error rates than competing methods for data from skewed heavy-tailed and skewed distributions as well as unequal training data sizes.

15.
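Cook's distance is commonly used to diagnose outliers in regression models, but because the sample variance in the formula is sensitive to outliers, the formula lacks robustness and its diagnostic performance is unsatisfactory. To address this problem, the article takes the median absolute deviation as a robust estimator of the sample standard deviation, obtains from it a robust estimator of the sample variance, and then constructs a robust Cook's distance formula. Drawing on the theory of outlier diagnosis in regression models based on the traditional Cook's distance, it applies the robust Cook's distance to outlier diagnosis in time series, extending the range of application of the traditional Cook's distance. ARMA(1,1) series simulated with sample sizes of 50, 100, and 200 and contamination rates of 0, 1%, 5%, and 10%, together with a financial time series, are analyzed as examples. The results show that: (1) without contamination, the diagnostic accuracy of both the robust and the conventional Cook's distance methods is 100%, and neither produces a "misdiagnosis"; (2) as the sample size and the contamination rate increase together, the diagnostic accuracy of the conventional Cook's distance falls sharply, and once the contamination rate reaches 5% or more it has essentially no diagnostic power, whereas the robust Cook's distance method still retains high diagnostic power. The robust Cook's distance method can be applied not only to outlier diagnosis in time series but also to outlier diagnosis in regression analysis.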
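
The idea of swapping the sample variance in Cook's distance for a MAD-based estimate can be sketched in Python as follows. The abstract does not give the article's exact formula, so this is an assumed form: simple linear regression with p = 2 coefficients, the textbook Cook's distance e_i^2 h_i / (p s^2 (1 - h_i)^2), and the robust scale 1.4826 * MAD of the residuals in place of s. The function names are hypothetical, and the time-series (ARMA) setting of the article is not reproduced.

```python
from statistics import median


def fit_line(x, y):
    """Least-squares line; returns residuals and leverages h_i."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    leverages = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return residuals, leverages


def cook_distances(x, y, robust=False):
    """Cook's distances with either the classical residual variance
    or an assumed MAD-based robust variance estimate."""
    residuals, leverages = fit_line(x, y)
    p = 2  # intercept and slope
    if robust:
        centre = median(residuals)
        mad = median(abs(e - centre) for e in residuals)
        variance = (1.4826 * mad) ** 2  # consistent for normal errors
    else:
        variance = sum(e * e for e in residuals) / (len(x) - p)
    return [
        e * e * h / (p * variance * (1.0 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


# Toy data: y = 2x + 1 with small alternating noise and one outlier at x = 6.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[5] += 10.0

classical = cook_distances(x, y)
robust = cook_distances(x, y, robust=True)
```

In this toy case the outlier inflates the classical variance estimate and therefore shrinks its own Cook's distance, whereas the MAD-based variance is barely affected, so the robust distance at the contaminated point is considerably larger.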

16.
Based on the projection depth weighted mean and scatter estimation of the joint distribution of (x, y), we introduce a robust estimator of the regression coefficients for the multivariate linear model. The new estimator possesses desirable properties including affine invariance, Fisher consistency, and asymptotic normality. Also, we study the robustness of the estimator in terms of breakdown point and influence function. Extensive simulation studies are performed to investigate the finite sample behavior of robustness and efficiency. The methodology is illustrated with a real data example.

17.
A novel projection pursuit method based on projecting the data onto itself is proposed. Using a number of real datasets, it is shown how to obtain interesting one- and two-dimensional projections using only O(n) evaluations of a one-dimensional projection index.

18.
Usual fitting methods for the nested error linear regression model are known to be very sensitive to the effect of even a single outlier. Robust approaches for the unbalanced nested error model with proven robustness and efficiency properties, such as M-estimators, are typically obtained through iterative algorithms. These algorithms are often computationally intensive and require robust estimates of the same parameters to start the algorithms, but so far no robust starting values have been proposed for this model. This paper proposes computationally fast robust estimators for the variance components under an unbalanced nested error model, based on a simple robustification of the fitting-of-constants method or Henderson method III. These estimators can be used as starting values for other iterative methods. Our simulations show that they are highly robust to various types of contamination of different magnitude.

19.
Statistics, 2012, 46(6): 1357-1385
The early stages of many real-life experiments involve a large number of factors, among which only a few are active. Unfortunately, the optimal full-dimensional designs for those early stages may have bad low-dimensional projections, and the experimenters do not know which factors will turn out to be important before conducting the experiment. Therefore, designs with good projections are desirable for factor screening. In this regard, significant questions arise: Do the optimal full-dimensional designs have good projections onto low dimensions? How can experimenters measure the goodness of a full-dimensional design by focusing on all of its projections? Are there linkages between the optimality of a full-dimensional design and the optimality of its projections? Through theoretical justifications, this paper tries to answer these questions by investigating the construction of optimal (average) projection designs for screening either nominal or quantitative factors. The main results show that: based on the aberration and orthogonality criteria, the full-dimensional design is optimal if and only if it is an optimal projection design; the full-dimensional design is optimal via the aberration and orthogonality criteria if and only if it is a uniform projection design; there is no guarantee that a uniform full-dimensional design is an optimal projection design via any criterion; the projection design is optimal via the aberration, orthogonality and uniformity criteria if it is optimal via any one of them; and the saturated orthogonal designs have the same average projection performance.

20.
One method for achieving robustness in linear models is to assume that there is a mixture of standard and outlier observations, with a different error variance for each class. For generalised linear models (GLMs) the mixture model approach is more difficult, as the error variance for many distributions has a fixed relationship to the mean. The model is extended to GLMs by redefining the classes: the standard class is a standard GLM, and the outlier class is an overdispersed GLM obtained by including a random effect term in the linear predictor. The advantages of this method are that it can be extended to any model with a linear predictor and that outlier observations can be easily identified. Using simulation, the model is compared with an M-estimator and found to have improved bias and coverage. The method is demonstrated on three examples.
