Similar literature
Found 20 similar documents (search time: 156 ms)
1.
In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons. 
While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.
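The threshold-type detection rule based on Mahalanobis distance that this abstract mentions can be sketched as follows. This is only an illustration under stated assumptions, not the procedure studied in the paper: it uses the classical (non-robust) sample mean and covariance, it is restricted to two dimensions so that no linear algebra library is needed, and the function names and the level `alpha=0.025` are hypothetical choices. For two degrees of freedom the upper-`alpha` chi-square quantile has the closed form -2 ln(alpha), which is used as the cutoff.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def flag_outliers(points, alpha=0.025):
    """Flag points whose squared distance exceeds the chi-square(2) cutoff."""
    # Assumed cutoff: upper-alpha quantile of chi-square with 2 d.f.
    cutoff = -2.0 * math.log(alpha)
    return [d2 > cutoff for d2 in mahalanobis_sq_2d(points)]


# Hypothetical data: a 5x4 grid of regular points plus one distant point.
data = [(i % 5, i // 5) for i in range(20)] + [(50, 50)]
flags = flag_outliers(data)
```

Because the sample mean and covariance are themselves distorted by outliers, a rule of this classical form can be prone to masking when several outliers are present, which is the kind of robustness question the abstract addresses.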

2.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a new proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more appropriately.

3.
The influence function introduced by Hampel (1968, 1973, 1974) is a tool that can be used for outlier detection. Campbell (1978) obtained the influence function for the Mahalanobis distance between two populations, which can be used for detecting outliers in discriminant analysis. In this paper, influence functions for a variety of parametric functions in multivariate analysis are obtained. These include the generalized variance, the matrix of regression coefficients, the noncentrality matrix Σ⁻¹δ in multivariate analysis of variance and its eigenvalues, the matrix L, which is a generalization of 1 − R², canonical correlations, principal components, and parameters that correspond to Pillai’s statistic (1955), Hotelling’s (1951) generalized T₀² and Wilks’ Λ (1932), all of which can be used for outlier detection in multivariate analysis. Devlin, Gnanadesikan and Kettenring (1975) obtained the influence function for the population correlation coefficient in the bivariate case. It is shown in this paper that influence functions for parameters corresponding to r², R², and Mahalanobis D² can be obtained as particular cases.

4.
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate the outliers when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is utilized to construct a novel outlier detection measure based on distance correlation, and then an outlier detection procedure is proposed. The proposed method enjoys several advantages. First, the outlier detection measure can be simply calculated, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, which does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well for detecting outliers in high-dimensional regression models and performs better than some other competing methods.

5.
Rong Zhu, Xinyu Zhang. Statistics, 2018, 52(1): 205-227
The theories and applications of model averaging have been developed comprehensively in the past two decades. In this paper, we consider model averaging for multivariate multiple regression models. To make full use of the correlation among the dependent variables, we propose a model averaging method based on the Mahalanobis distance, which is related to the correlation of the dependent variables. We prove the asymptotic optimality of the resulting Mahalanobis Mallows model averaging (MMMA) estimators under certain assumptions. In the simulation study, we show that the proposed MMMA estimators compare favourably with model averaging estimators based on AIC and BIC weights and the Mallows model averaging estimators from the single dependent variable regression models. We further apply our method to real data on urbanization rate and the proportion of non-agricultural population in ethnic minority areas of China.

6.
Asymptotic linearity plays a key role in estimation and testing in the presence of nuisance parameters. This property is established, in the very general context of a multivariate general linear model with elliptical VARMA errors, for the serial and nonserial multivariate rank statistics considered in Hallin and Paindaveine (Ann. Statist. 30 (2002a) 1103; Bernoulli 8 (2002b) 787; Ann. Statist. 32 (2004), to appear) and Oja and Paindaveine (J. Statist. Plann. Inference (2004), to appear). These statistics, which are multivariate versions of classical signed rank statistics, involve (i) multivariate signs based either on (pseudo-)Mahalanobis residuals, or on a modified version (absolute interdirections) of Randles's interdirections, and (ii) a concept of ranks based either on (pseudo-)Mahalanobis distances or on lift-interdirections.

7.
Birnbaum–Saunders (BS) models are receiving considerable attention in the literature. Multivariate regression models are a useful tool in multivariate analysis because they take into account the correlation between variables. Diagnostic analysis is an important aspect of statistical modeling. In this paper, we formulate multivariate generalized BS regression models and carry out a diagnostic analysis for these models. We consider the Mahalanobis distance as a global influence measure to detect multivariate outliers and use it for evaluating the adequacy of the distributional assumption. We also consider the local influence approach and study how a perturbation may affect the estimation of model parameters. We implement the obtained results in the R software and illustrate them with real-world multivariate data to show their potential applications.

8.
We propose to use the term standard distance for the quantity in univariate analysis and show that it can be easily generalized to the multivariate situation, where it coincides with the square root of the Mahalanobis distance between two samples.

9.
This paper treats the problem of estimating the Mahalanobis distance for the purpose of detecting outliers in high-dimensional data. Three ridge-type estimators are proposed and risk functions for deciding an appropriate value of the ridge coefficient are developed. It is argued that one of the ridge estimators has particularly tractable properties, which is demonstrated through outlier analysis of real and simulated data.

10.
Mardia's multivariate kurtosis and the generalized distance have desirable properties as multivariate outlier tests. However, extensive critical values have not been published heretofore. A published approximation formula for critical values of the kurtosis is shown to inadequately control the type I error rate, with observed error rates often differing from their intended values by a factor of two or more. Critical values derived from simulations for both tests for up to 25 dimensions and 500 observations are presented. The power curves of both tests are discussed. The generalized distance is the more powerful test when exactly one outlier is present and the contaminant is substantially mean-shifted. However, as the number of outliers increases, the kurtosis becomes the more powerful test. The two tests are compared with respect to power and vulnerability to masking. Recommendations for the use of these tests and interpretation of results are given.
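As a rough illustration of the two statistics compared in this abstract, the sketch below computes Mardia's multivariate kurtosis (the average of the squared values of the squared Mahalanobis distances) together with the largest squared distance. It is a minimal two-dimensional version, assuming the covariance is computed with divisor n; the function name is hypothetical, and no critical values are included because the abstract does not reproduce them. Under bivariate normality the kurtosis is expected to be near p(p + 2) = 8.

```python
def mardia_kurtosis_and_max_distance(points):
    """Return (kurtosis, largest squared Mahalanobis distance) for 2-D data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Covariance with divisor n, the convention assumed in this sketch.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy ** 2
    # Squared Mahalanobis distance of every point from the sample mean.
    d2 = [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]
    kurtosis = sum(v * v for v in d2) / n
    return kurtosis, max(d2)


# Hypothetical data: the four corners of a square centred at the origin.
b2, d2_max = mardia_kurtosis_and_max_distance(
    [(1, 1), (1, -1), (-1, 1), (-1, -1)]
)
```

Both statistics are unchanged by nonsingular affine transformations of the data, which is one reason they are convenient as outlier tests.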

11.
The stalactite plot for the detection of multivariate outliers (total citations: 1; self-citations: 0; citations by others: 1)
Detection of multiple outliers in multivariate data using Mahalanobis distances requires robust estimates of the means and covariance of the data. We obtain this by sequential construction of an outlier free subset of the data, starting from a small random subset. The stalactite plot provides a cogent summary of suspected outliers as the subset size increases. The dependence on subset size can be virtually removed by a simulation-based normalization. Combined with probability plots and resampling procedures, the stalactite plot, particularly in its normalized form, leads to identification of multivariate outliers, even in the presence of appreciable masking.

12.
I consider the problem of estimating the Mahalanobis distance between multivariate normal populations when the population covariance matrix satisfies a graphical model. In addition to providing a clear understanding of the dependencies in a multivariate data set, the use of graphical models can reduce the variability of the estimated distances and improve inferences. I derive the asymptotic distribution of the estimated Mahalanobis distance under a general covariance model, which includes graphical models as a special case. Two examples are discussed.

13.
The detection of outliers is an interesting and important aspect of data analysis, as outliers can affect inference. Various methods are available in the literature for the detection of outliers in multivariate data [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994] using the Mahalanobis distance measure. An attempt is made to propose an alternative method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615–644]. The proposed method is computationally efficient, with a high breakdown value and low computation time. Further, important properties, namely success rates (SR) and false detection rates (FDR), are studied and compared with some of the well-known outlier detection methods through a simulation study. The Comedian method has high SR and low FDR for all combinations of parameters. On removal or down-weighting of the detected outliers, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.

14.
Cylindrical data are bivariate data formed by combining a circular and a linear variable. However, up to now no work has been done on the detection of outliers in cylindrical data. We introduce a definition of an outlier for cylindrical data and present a new test of discordancy to detect outliers in this type of data, based on the k-nearest neighbor distance. Cut-off points of the new test statistic based on the Johnson-Wehrly distribution are calculated, and its performance is examined using simulation. A practical example is presented using wind speed and wind direction data obtained from the Malaysian Meteorological Department.

15.
Multivariate control charts are powerful and simple visual tools for monitoring the quality of a process. This multivariate monitoring is carried out by considering simultaneously several correlated quality characteristics and by determining whether these characteristics are in control or out of control. In this paper, we propose a robust methodology using multivariate quality control charts for subgroups based on generalized Birnbaum–Saunders distributions and an adapted Hotelling statistic. This methodology is constructed for Phases I and II of control charts. We estimate the corresponding parameters with the maximum likelihood method and use parametric bootstrapping to obtain the distribution of the adapted Hotelling statistic. In addition, we consider the Mahalanobis distance to detect multivariate outliers and use it to assess the adequacy of the distributional assumption. A Monte Carlo simulation study is conducted to evaluate the proposed methodology and to compare it with a standard methodology. This study reports the good performance of our methodology. An illustration with real-world air quality data of Santiago, Chile, is provided. This illustration shows that the methodology is useful for giving early warning of episodes of extreme air pollution, thus helping to prevent adverse effects on human health.

16.
The Liu estimator has been developed as an alternative to the ordinary least squares estimator in the presence of collinearity among the regressors in linear regression models. We present DFFITS and different versions of the Cook distance, analogous to those given for ordinary linear regression models, to measure the influence of each individual observation on the Liu estimates. We suggest a version of the Cook distance based on a one-step approximation. The mean shift outlier model for the Liu regression has also been investigated. Moreover, using the Sherman-Morrison-Woodbury theorem, we find approximate versions of the DFFITS and the Cook distance. The proposed diagnostics are evaluated on two data sets and yield promising results.
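For orientation, the Liu estimator itself is commonly written as (X'X + I)⁻¹(X'y + d·b), where b is the ordinary least squares estimate and d is a biasing parameter between 0 and 1. The sketch below implements that formula for two regressors without an intercept, so that a 2x2 solve suffices. It does not implement the DFFITS or Cook distance diagnostics of the paper, and the function names and the toy data are hypothetical.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ z = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def liu_estimator(rows, y, d):
    """Liu estimate for two regressors; rows is a list of [x1, x2]."""
    # Cross-product matrix X'X and vector X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(2)]
    beta_ols = solve_2x2(xtx, xty)
    # Add the identity to X'X and shift X'y by d times the OLS estimate.
    shifted = [[xtx[i][j] + (1.0 if i == j else 0.0) for j in range(2)]
               for i in range(2)]
    rhs = [xty[i] + d * beta_ols[i] for i in range(2)]
    return solve_2x2(shifted, rhs)


# Hypothetical toy data with an exact fit y = 1*x1 + 2*x2.
rows = [[1, 0], [0, 1], [1, 1]]
y = [1, 2, 3]
beta_half = liu_estimator(rows, y, d=0.5)
```

Setting d = 1 recovers the ordinary least squares estimate, while smaller values of d shrink the coefficients.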

17.
This note presents an approximation to multivariate regression models which is obtained from a first-order series expansion of the multivariate link function. The proposed approach yields a variable-addition approximation of regression models that enables a multivariate generalization of the well-known goodness-of-link specification test, available for univariate generalized linear models. Application of this general methodology is illustrated with models of multinomial discrete choice and multivariate fractional data, in which context it is shown to lead to well-established approximation and testing procedures.

18.
In the linear regression model with elliptical errors, a shrinkage ridge estimator is proposed. In this regard, the restricted ridge regression estimator under a sub-space restriction is improved by incorporating a general function that admits a Taylor series expansion. An approximate quadratic risk function of the proposed shrinkage ridge estimator is evaluated in the elliptical regression model. A Monte Carlo simulation study and an analysis based on a real data example are used to assess performance. It is evident from the numerical results that the shrinkage ridge estimator performs better than both unrestricted and restricted estimators in the multivariate t-regression model, for some specific cases.

19.
In this article, we present a framework for estimating a patterned covariance of interest in multivariate linear models. The main idea is to estimate a patterned covariance by minimizing a trace distance function between the outer product of residuals and its expected value. The proposed framework provides explicit estimators, called outer product least-squares estimators, for parameters in the patterned covariance of the multivariate linear model with or without restrictions on regression coefficients. The outer product least-squares estimators enjoy the desired properties in finite and large samples, including unbiasedness, invariance, consistency and asymptotic normality. We further apply the framework to three special situations where the patterned covariances are the uniform correlation, a generalized uniform correlation and a general q-dependence structure, respectively. Simulation studies for the three special cases illustrate that the proposed method is a competitive alternative to the maximum likelihood method in finite samples.

20.
In this paper, we obtain an adjusted version of the likelihood ratio (LR) test for errors-in-variables multivariate linear regression models. The error terms are allowed to follow a multivariate distribution in the class of the elliptical distributions, which has the multivariate normal distribution as a special case. We derive a modified LR statistic that follows a chi-squared distribution with a high degree of accuracy. Our results generalize those in Melo and Ferrari (Advances in Statistical Analysis, 2010, 94, pp. 75–87) by allowing the parameter of interest to be vector-valued in the multivariate errors-in-variables model. We report a simulation study which shows that the proposed test displays superior finite sample behavior relative to the standard LR test.
