Similar Documents
20 similar documents found.
1.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a newly proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, the study compares not only imputation methods but also outlier detection methods. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be chosen more appropriately.
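The masking effect described above can be seen in a toy case. The sketch below is not one of the nine MI methods studied; as a simplified stand-in it uses deterministic conditional-mean (regression) imputation under a fitted bivariate normal, which is roughly the centre of the draws an MI method would produce. The data are simulated, not clinical, and the outlier location and the χ²₂ 97.5% cutoff are illustrative assumptions.

```python
import numpy as np

# Simulated stand-in for a correlated pair of lab measurements (not real data).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=200)

mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)

def mahalanobis_sq(x):
    """Squared Mahalanobis distance of x from the sample mean and covariance."""
    d = x - mu
    return float(d @ np.linalg.solve(S, d))

# An outlier that breaks the positive correlation: moderate in each margin,
# extreme jointly.
outlier = np.array([2.0, -2.0])
d_full = mahalanobis_sq(outlier)

# Second component treated as missing: impute its conditional mean given the
# first (regression imputation under the fitted bivariate normal).
imputed = outlier.copy()
imputed[1] = mu[1] + S[0, 1] / S[0, 0] * (outlier[0] - mu[0])
d_imputed = mahalanobis_sq(imputed)

# 97.5% quantile of chi-square(2); that law is exponential with mean 2.
cutoff = -2.0 * np.log(0.025)
print(f"D2 complete={d_full:.1f}, D2 imputed={d_imputed:.1f}, cutoff={cutoff:.2f}")
```

With the second component imputed this way, the conditional part of the distance vanishes and only the marginal term for the first component survives, so a clearly flagged point falls below the cutoff.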

2.
The influence function introduced by Hampel (1968, 1973, 1974) is a tool that can be used for outlier detection. Campbell (1978) derived the influence function for the Mahalanobis distance between two populations, which can be used for detecting outliers in discriminant analysis. Radhakrishnan and Kshirsagar (1981) obtained influence functions for a variety of parametric functions in multivariate analysis. Radhakrishnan (1983) obtained influence functions for parameters corresponding to the "residual" Wilks's Λ and its "direction" and "collinearity" factors in discriminant analysis when a single discriminant function is adequate for discriminating among several groups. In this paper, influence functions are obtained for parameters that correspond to the "residual" Wilks's Λ and its "direction" and "coplanarity" factors, which are used to test the goodness of fit of s (s > 1) assigned discriminant functions for discriminating among several groups. These influence functions can be used for outlier detection in multivariate data when a single discriminant function is not adequate.

3.
The unique minimum variance unbiased estimator is obtained for analytic functions of the mean of a multivariate normal distribution with either an unknown covariance matrix or a covariance matrix of the form σ²V, where σ² is unknown.
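As a simple special case (not the paper's general construction), take g(μ) = μ′μ with the covariance matrix Σ unknown. Since E[x̄′x̄] = μ′μ + tr(Σ)/n and E[tr S] = tr Σ, the statistic x̄′x̄ − tr(S)/n is unbiased, and because it is a function of the complete sufficient statistic (x̄, S), the Lehmann–Scheffé theorem makes it the UMVUE. The Monte Carlo check below uses an illustrative mean vector and Σ = I.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, 0.0])      # illustrative true mean, so mu'mu = 5
n, p, reps = 10, 3, 20000

# reps independent samples of size n from N_p(mu, I); Sigma is treated as unknown.
X = rng.standard_normal((reps, n, p)) + mu
xbar = X.mean(axis=1)                            # (reps, p) sample means
trace_S = X.var(axis=1, ddof=1).sum(axis=1)      # tr(S) for each sample

naive = np.sum(xbar ** 2, axis=1)    # xbar'xbar, biased upward by tr(Sigma)/n
umvue = naive - trace_S / n          # unbiased and a function of (xbar, S)

print(naive.mean(), umvue.mean())
```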

4.
Most multivariate statistical techniques rely on the assumption of multivariate normality. The effects of nonnormality on multivariate tests are assumed to be negligible when variance–covariance matrices and sample sizes are equal; therefore, in practice, investigators usually do not attempt to assess multivariate normality. In this simulation study, the effects of skewed and leptokurtic multivariate data on the Type I error and power of Hotelling's T² were examined by manipulating distribution, sample size, and variance–covariance matrix. The empirical Type I error rate and power of Hotelling's T² were calculated before and after applying the generalized Box–Cox transformation. The findings demonstrate that even when variance–covariance matrices and sample sizes are equal, small to moderate changes in power can still be observed.
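A small version of this kind of simulation can be sketched as follows, assuming the two-sample form of Hotelling's T² with pooled covariance and its usual F conversion. The sample sizes, dimension, and the three mean-zero generating distributions are illustrative choices, not the study's design, and the Box–Cox step is omitted.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(A, B):
    """Two-sample Hotelling T^2 test of equal mean vectors; returns (T2, p-value)."""
    n1, n2 = len(A), len(B)
    p = A.shape[1]
    d = A.mean(axis=0) - B.mean(axis=0)
    # Pooled covariance, assuming a common covariance matrix in both groups.
    Sp = ((n1 - 1) * np.cov(A, rowvar=False)
          + (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
    t2 = n1 * n2 / (n1 + n2) * float(d @ np.linalg.solve(Sp, d))
    # Under normality: (n1 + n2 - p - 1) / (p (n1 + n2 - 2)) * T^2 ~ F(p, n1 + n2 - p - 1).
    df2 = n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, float(stats.f.sf(f_stat, p, df2))

def empirical_type1(sampler, n, p, reps=2000, alpha=0.05, seed=0):
    """Rejection rate when both groups come from the same mean-zero distribution."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        _, p_value = hotelling_two_sample(sampler(rng, n, p), sampler(rng, n, p))
        rejections += p_value < alpha
    return rejections / reps

# Illustrative mean-zero samplers with identity (or scaled identity) covariance.
samplers = {
    "normal": lambda rng, n, p: rng.standard_normal((n, p)),
    "skewed (centred exponential)": lambda rng, n, p: rng.exponential(1.0, (n, p)) - 1.0,
    "leptokurtic (t, 5 df)": lambda rng, n, p: rng.standard_t(5, (n, p)),
}
rates = {name: empirical_type1(s, n=15, p=3) for name, s in samplers.items()}
print(rates)
```

Power can be estimated the same way by adding a mean shift to one of the two groups.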

5.
In extending univariate outlier detection methods to higher dimensions, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. We also study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold-type outlier detection, the Mahalanobis distance and projection procedures are found to be distinctly superior in performance, each with a very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive, given that its contours are not constrained to be elliptical and its computational burden is relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons.
While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robustly setting an outlyingness threshold.
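Two of the four outlyingness measures can be sketched briefly. The projection outlyingness O(x) = sup over unit u of |u′x − med(u′X)| / MAD(u′X) is approximated here by a maximum over random directions (an assumption that makes it only approximately affine invariant), and projection depth is taken as 1/(1 + O). Halfspace and Mahalanobis spatial depth, the threshold rule, and the MBP are not implemented; the data and planted outliers are hypothetical.

```python
import numpy as np

def projection_outlyingness(X, n_dirs=1000, seed=0):
    """Approximate projection outlyingness of each row of X.

    The supremum over all unit directions is replaced by a maximum over
    n_dirs random unit directions.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj = X @ U.T                                   # (n, n_dirs) projections
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    return np.max(np.abs(proj - med) / mad, axis=1)

def mahalanobis_outlyingness(X):
    """Squared Mahalanobis distance from the (non-robust) sample mean and covariance."""
    d = X - X.mean(axis=0)
    return np.einsum("ij,ij->i", d, np.linalg.solve(np.cov(X, rowvar=False), d.T).T)

rng = np.random.default_rng(42)
X = np.vstack([rng.standard_normal((100, 2)),
               [[6.0, 6.0], [-6.0, 5.0], [7.0, -6.0]]])   # rows 100-102 planted
depth = 1.0 / (1.0 + projection_outlyingness(X))          # small depth = outlying
print(np.argsort(depth)[:3])
```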

6.
A note on the Cook's distance
A modification of the classical Cook's distance is proposed, providing a generalized Mahalanobis distance in the context of multivariate elliptical linear regression models. We establish the exact distribution of a pivotal-type statistic based on this generalized Mahalanobis distance, which provides critical points for identifying outlying data points. Based on the equivalence between the modified Cook's distance and what is called the mean-shift multivariate outlier elliptical model, twelve new modifications of the Cook's distance are proposed. We also describe explicitly how the Cook's distance and the likelihood displacement relate to the modified Cook's distance. We illustrate the procedure with some examples in the context of multiple and multivariate linear regression.
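For reference, the classical starting point (single-response OLS with normal errors, not the elliptical multivariate modification proposed in the paper) is Dᵢ = eᵢ²hᵢᵢ / (p s² (1 − hᵢᵢ)²). The sketch computes it from the leverages and checks it against the deletion definition (β̂ − β̂₍ᵢ₎)′X′X(β̂ − β̂₍ᵢ₎)/(p s²); the data and planted case are made up.

```python
import numpy as np

def cooks_distance(X, y):
    """Classical Cook's distance for OLS: D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2)."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)                 # thin QR; X assumed full column rank
    h = np.sum(Q ** 2, axis=1)             # leverages = diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - p)
    return e ** 2 * h / (p * s2 * (1.0 - h) ** 2)

def cooks_distance_by_deletion(X, y):
    """Same quantity from its definition: refit without case i each time."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - p)
    XtX = X.T @ X
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = beta - beta_i
        out[i] = diff @ XtX @ diff / (p * s2)
    return out

rng = np.random.default_rng(7)
x = np.append(rng.uniform(0, 5, 30), 10.0)     # last case: high leverage
y = 1.0 + 2.0 * x + rng.normal(0, 1, 31)
y[-1] -= 15.0                                   # ... and a gross error
X = np.column_stack([np.ones_like(x), x])
D = cooks_distance(X, y)
print(D.argmax(), D.max())
```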

7.
Hotelling's T² statistic has many applications in multivariate analysis. In particular, it can be used to measure the influence that a particular observation vector has on parameter estimation. For example, in the bivariate case, there is a direct relationship between the ellipse generated using a T² statistic for individual observations and the hyperbolae generated using Hampel's influence function for the corresponding correlation coefficient. In this paper, we jointly use the components of an orthogonal decomposition of the T² statistic and some influence functions to identify outliers or influential observations. Since the conditional components of the T² statistic are related to possible changes in the correlation between a variable and a group of other variables, we consider the theoretical influence functions of the correlation and multiple correlation coefficients. Finite-sample versions of these influence functions are used to obtain estimated influence function values.
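In the bivariate case the orthogonal decomposition (commonly called the MYT decomposition) splits an individual observation's T² into an unconditional term for x₁ and a conditional term for x₂ given x₁: T² = T₁² + T₂.₁². The sketch checks this identity numerically on simulated data; the influence-function part of the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=50)
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
x = np.array([1.5, -1.0])                        # observation to decompose

d = x - xbar
t2 = float(d @ np.linalg.solve(S, d))            # T^2 for a single observation

# Unconditional term for x1 ...
t1_sq = d[0] ** 2 / S[0, 0]
# ... and conditional term for x2 given x1 (residual from regressing x2 on x1).
b = S[0, 1] / S[0, 0]
x2_pred = xbar[1] + b * d[0]
s2_cond = S[1, 1] - S[0, 1] ** 2 / S[0, 0]       # residual variance of x2 given x1
t21_sq = (x[1] - x2_pred) ** 2 / s2_cond

print(t2, t1_sq + t21_sq)
```

A large conditional term with a small unconditional one points to a broken correlation rather than an extreme value of x₁, which is the link to correlation influence functions that the abstract exploits.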

8.
Birnbaum–Saunders (BS) models are receiving considerable attention in the literature. Multivariate regression models are a useful tool in multivariate analysis because they take into account the correlation between variables. Diagnostic analysis is an important aspect of statistical modeling. In this paper, we formulate multivariate generalized BS regression models and carry out a diagnostic analysis for these models. We consider the Mahalanobis distance as a global influence measure to detect multivariate outliers and use it to evaluate the adequacy of the distributional assumption. We also consider the local influence approach and study how a perturbation may affect the estimation of model parameters. We implement the results in the R software and illustrate them with real-world multivariate data to show their potential applications.

9.
Chapter Notes     
Tests for redundancy of variables in linear two-group discriminant analysis are well known and frequently used. We give a survey of similar tests, including the one-sample T² as a special case, for the situation in which only the mean vector (but no covariance matrix) is available for one sample. We then show that a relation between linear regression and discriminant functions found by Fisher (1936) can be generalized to this situation. Relating regression and discriminant analysis to a multivariate linear model sheds more light on the relationship between them. Practical and didactic advantages of the regression approach to T² tests and discriminant analysis are outlined.
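The best-known form of the regression/discriminant relation, presumably the one generalized here, is that regressing a two-valued group indicator on the variables (with an intercept) gives slope coefficients proportional to Fisher's discriminant direction S_w⁻¹(x̄₁ − x̄₂). The sketch checks that proportionality on simulated groups; the group sizes and means are illustrative, and the one-sample-mean-only setting of the paper is not covered.

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3), size=40)
X2 = rng.multivariate_normal([1.0, 0.5, -0.5], np.eye(3), size=60)
X = np.vstack([X1, X2])
y = np.r_[np.ones(len(X1)), np.zeros(len(X2))]    # group indicator

# OLS of the indicator on the variables, with an intercept.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
slopes = coef[1:]

# Fisher's linear discriminant direction with the pooled within-group covariance.
d = X1.mean(axis=0) - X2.mean(axis=0)
Sw = ((len(X1) - 1) * np.cov(X1, rowvar=False)
      + (len(X2) - 1) * np.cov(X2, rowvar=False)) / (len(X) - 2)
lda = np.linalg.solve(Sw, d)

ratio = slopes / lda       # constant across coordinates if the two are proportional
print(ratio)
```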

10.
The detection of outliers is an interesting and important aspect of data analysis, as outliers can affect the inference. Various methods for detecting outliers in multivariate data using the Mahalanobis distance are available in the literature [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994]. We propose an alternative method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615–644]. The proposed method is computationally efficient, with a high breakdown value and low computation time. Its success rate (SR) and false detection rate (FDR) are studied through a simulation study and compared with those of some well-known outlier detection methods. The comedian method has a high SR and a low FDR for all combinations of parameters. By removing or downweighting the detected outliers, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.
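The comedian of two variables is COM(X, Y) = med((X − med X)(Y − med Y)), a median-based analogue of the covariance. The sketch below builds the comedian matrix and uses it in place of the covariance in a Mahalanobis-type distance from the coordinatewise median. It only ranks points: the cutoff used in the paper is not reproduced, and because the comedian matrix is not guaranteed to be positive definite the sketch simply raises an error in that case. Data and planted outliers are hypothetical.

```python
import numpy as np

def comedian_matrix(X):
    """Matrix of pairwise comedians COM(Xj, Xk) = med((Xj - med Xj)(Xk - med Xk))."""
    centred = X - np.median(X, axis=0)
    p = X.shape[1]
    C = np.empty((p, p))
    for j in range(p):
        for k in range(p):
            C[j, k] = np.median(centred[:, j] * centred[:, k])
    return C

def comedian_distances(X):
    """Squared distances from the coordinatewise median, scaled by COM^{-1}."""
    C = comedian_matrix(X)
    np.linalg.cholesky(C)              # raises LinAlgError if C is not positive definite
    centred = X - np.median(X, axis=0)
    return np.einsum("ij,ij->i", centred, np.linalg.solve(C, centred.T).T)

rng = np.random.default_rng(8)
inliers = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=100)
outliers = np.array([[5.0, -5.0], [6.0, -4.0], [-5.0, 5.0], [-4.0, 6.0], [5.5, -5.5]])
X = np.vstack([inliers, outliers])     # rows 100-104 are planted outliers
dist = comedian_distances(X)
print(np.argsort(dist)[-5:])
```

Every quantity is built from medians, so the planted points barely move the centre or the scale matrix, which is the source of the method's high breakdown value.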

11.
The present paper deals with sensitivity analysis in maximum likelihood factor analysis. To investigate the influence of a small change in the data, we derive the theoretical influence functions I(x; LLᵀ) and I(x; Δ) for the common variance matrix T = LLᵀ and the unique variance matrix Δ, respectively. Numerical examples are shown to illustrate our procedure.

12.
In this paper we consider the problem of testing the means of k multivariate normal populations with additional data from an unknown subset of the k populations. The purpose of this research is to offer test procedures that use all the available data for the multivariate analysis of variance problem, because the additional data may contain valuable information about the parameters of the k populations. The standard procedure uses only the data from identified populations. We provide a test based on Hotelling's generalized T² statistic that uses all available data. The power of this test is computed using Betz's approximation of Hotelling's generalized T² statistic by an F-distribution. The power of this test is also compared with that of the standard test procedure.
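For the standard one-way MANOVA layout with identified populations only, Hotelling's generalized statistic can be written T₀² = ν_e tr(HE⁻¹), with H the between-groups and E the within-groups SSCP matrix and ν_e = N − k. The sketch computes it and checks that for k = 2 it coincides with the ordinary two-sample T². Betz's F approximation and the handling of the unclassified additional data are not implemented; the group data are simulated.

```python
import numpy as np

def generalized_t0_sq(groups):
    """Hotelling's generalized T0^2 = nu_e * tr(H E^{-1}) for a one-way MANOVA."""
    allx = np.vstack(groups)
    grand = allx.mean(axis=0)
    p = allx.shape[1]
    H = np.zeros((p, p))   # between-groups SSCP matrix
    E = np.zeros((p, p))   # within-groups (error) SSCP matrix
    for g in groups:
        dm = g.mean(axis=0) - grand
        H += len(g) * np.outer(dm, dm)
        c = g - g.mean(axis=0)
        E += c.T @ c
    nu_e = len(allx) - len(groups)               # error degrees of freedom
    return nu_e * float(np.trace(np.linalg.solve(E, H)))   # tr(E^{-1} H) = tr(H E^{-1})

def two_sample_t2(A, B):
    """Ordinary two-sample Hotelling T^2 with pooled covariance, for comparison."""
    n1, n2 = len(A), len(B)
    d = A.mean(axis=0) - B.mean(axis=0)
    Sp = ((n1 - 1) * np.cov(A, rowvar=False)
          + (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
    return n1 * n2 / (n1 + n2) * float(d @ np.linalg.solve(Sp, d))

rng = np.random.default_rng(4)
groups = [rng.standard_normal((12, 2)) + shift for shift in ([0, 0], [1, 0], [0, 1])]
print(generalized_t0_sq(groups))
```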

13.
The signal issued by a control chart prompts process professionals to investigate the special cause. Change point methods simplify the search for and identification of the special cause. In this study, a multivariate joint change point estimation procedure based on maximum likelihood estimation is proposed for monitoring location and dispersion simultaneously. After a signal is generated by Hotelling's T² and/or generalized variance control charts used simultaneously, the procedure estimates the time of the change. The performance of the proposed method under several structural changes in the mean vector and covariance matrix is discussed.
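A joint location/dispersion change point MLE can be sketched under one common set of assumptions: the in-control parameters μ₀ and Σ₀ are known, while the post-change mean and covariance are unknown and replaced by their MLEs for each candidate τ. The estimate maximizes the resulting profile log-likelihood. This is an illustration of the general idea, not necessarily the paper's exact procedure; the simulated shift is hypothetical.

```python
import numpy as np

def change_point_mle(X, mu0, sigma0, min_post=5):
    """MLE of the last in-control time tau for a joint mean/covariance change.

    Observations 1..tau are assumed N(mu0, sigma0) with known parameters;
    observations tau+1..n are N(mu1, sigma1) with mu1, sigma1 estimated.
    min_post must exceed the dimension so the post-change covariance is nonsingular.
    """
    n, p = X.shape
    _, logdet0 = np.linalg.slogdet(sigma0)
    c = X - mu0
    q = np.einsum("ij,ij->i", c, np.linalg.solve(sigma0, c.T).T)   # in-control quadratic forms
    best_tau, best_ll = 0, -np.inf
    for tau in range(0, n - min_post + 1):
        S1 = np.cov(X[tau:], rowvar=False, bias=True)   # MLE covariance after tau
        sign, logdet1 = np.linalg.slogdet(S1)
        if sign <= 0:
            continue
        # Profile log-likelihood, dropping the constant -(n p / 2) log(2 pi).
        ll = (-0.5 * q[:tau].sum() - 0.5 * tau * logdet0
              - 0.5 * (n - tau) * (logdet1 + p))
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau

rng = np.random.default_rng(11)
p, tau_true = 2, 60
in_control = rng.multivariate_normal(np.zeros(p), np.eye(p), size=tau_true)
shifted = rng.multivariate_normal([2.0, 2.0], 2.0 * np.eye(p), size=40)
X = np.vstack([in_control, shifted])
tau_hat = change_point_mle(X, np.zeros(p), np.eye(p))
print(tau_hat)
```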

14.
This article proposes a multivariate synthetic control chart for skewed populations based on the weighted standard deviation method. The proposed chart incorporates the weighted standard deviation method into the standard multivariate synthetic control chart, which consists of the Hotelling's T² chart and the conforming run length chart. The weighted standard deviation method adjusts the variance–covariance matrix of the quality characteristics and approximates the probability density function using several multivariate normal distributions. The proposed chart reduces to the standard multivariate synthetic chart when the underlying distribution is symmetric. In general, the simulation results show that the proposed chart performs better than the existing multivariate charts for skewed populations and the standard T² chart, in terms of false alarm rates as well as moderate and large mean shift detection rates, across various degrees of skewness.

15.
A series expansion is obtained for the confluent hypergeometric function of the second kind when the argument is a 2 × 2 positive definite matrix. Applications are made to the distributions of Hotelling's generalized T₀² statistic and the smallest latent root of the covariance matrix.

16.
Given a $p \times n$ matrix $X \sim N(\beta Y, \Sigma \otimes I)$ with $\beta$ and $\Sigma$ unknown, the noncentral multivariate beta density of the matrix $L = (YY')^{-1/2} Y X' (XX')^{-1} X Y' (YY')^{-1/2}$ is sought. Khatri (1964) found this density when $\beta$ has rank one. The present paper derives the noncentral density of $L$ and the density of the roots matrix of $L$ for full-rank $\beta$. The density of $L$ in the dual case is also obtained. The derivations are based on a generalized Sverdrup's lemma (Kabe, 1965), and the relationship between the primal and dual densities of $L$ is explicitly established.

17.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.
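Simulations of this kind need draws from a multivariate skew-normal. One standard construction is the conditioning method: let (U₀, U) be jointly normal with U₀ ~ N(0, 1), U ~ N_d(0, Ω̄) and Cov(U₀, U) = δ, and set Z = U if U₀ > 0, otherwise −U; then E[Z] = √(2/π) δ. The parameters below are illustrative assumptions, and the outlier scenarios and the four outlyingness functions are not reproduced.

```python
import numpy as np

def rmsn(n, omega_bar, delta, rng):
    """Draw n vectors from a multivariate skew-normal via the conditioning method.

    (U0, U) is jointly normal with U0 ~ N(0, 1), U ~ N_d(0, omega_bar) and
    Cov(U0, U) = delta; Z = U if U0 > 0, else -U. The (d+1)x(d+1) joint
    covariance must be positive definite.
    """
    d = len(delta)
    big = np.empty((d + 1, d + 1))
    big[0, 0] = 1.0
    big[0, 1:] = big[1:, 0] = delta
    big[1:, 1:] = omega_bar
    draws = rng.multivariate_normal(np.zeros(d + 1), big, size=n)
    u0, u = draws[:, 0], draws[:, 1:]
    return np.where(u0[:, None] > 0, u, -u)

rng = np.random.default_rng(6)
omega_bar = np.array([[1.0, 0.5], [0.5, 1.0]])
delta = np.array([0.8, 0.6])
Z = rmsn(20000, omega_bar, delta, rng)
print(Z.mean(axis=0), np.sqrt(2 / np.pi) * delta)
```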

18.
Empirical influence functions for the Mahalanobis distance and for misclassification rates are presented for discriminant analysis with two multivariate normal populations, following Campbell (1978). Conclusions about the effects of outliers drawn from the empirical influence function are contrasted with exact calculations for four simple cases. These cases demonstrate that the higher-order terms discarded in deriving the empirical influence function can be important in practical problems.
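As a point of comparison with the analytical empirical influence function, a deletion-based (jackknife-style) sample influence keeps all higher-order terms: for each case, (N − 1)(D² − D²₍ᵢ₎), where D² is the two-group squared Mahalanobis distance with pooled covariance. This is a swapped-in technique, not Campbell's formula; the data and planted outlier are simulated.

```python
import numpy as np

def d2_two_group(X1, X2):
    """Sample squared Mahalanobis distance between two group means (pooled covariance)."""
    n1, n2 = len(X1), len(X2)
    d = X1.mean(axis=0) - X2.mean(axis=0)
    Sp = ((n1 - 1) * np.cov(X1, rowvar=False)
          + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return float(d @ np.linalg.solve(Sp, d))

def deletion_influence(X1, X2):
    """(N - 1) * (D^2 - D^2 with case i deleted), for every case in each group."""
    N = len(X1) + len(X2)
    D2 = d2_two_group(X1, X2)
    infl1 = np.array([(N - 1) * (D2 - d2_two_group(np.delete(X1, i, axis=0), X2))
                      for i in range(len(X1))])
    infl2 = np.array([(N - 1) * (D2 - d2_two_group(X1, np.delete(X2, i, axis=0)))
                      for i in range(len(X2))])
    return infl1, infl2

rng = np.random.default_rng(5)
X1 = rng.standard_normal((30, 2))
X1[0] = [8.0, -8.0]                      # planted outlier in group 1
X2 = rng.standard_normal((30, 2)) + [2.0, 0.0]
infl1, infl2 = deletion_influence(X1, X2)
print(infl1[0], np.abs(infl1[1:]).max(), np.abs(infl2).max())
```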

19.
The authors consider a robust linear discriminant function based on high breakdown location and covariance matrix estimators. They derive influence functions for the estimators of the parameters of the discriminant function and for the associated classification error. The most B‐robust estimator is determined within the class of multivariate S‐estimators. This estimator, which minimizes the maximal influence that an outlier can have on the classification error, is also the most B‐robust location S‐estimator. A comparison of the most B‐robust estimator with the more familiar biweight S‐estimator is made.

20.
For multivariate regression with a symmetric disturbance distribution, the error in the least absolute residuals estimator is approximately multivariate normally distributed with mean zero and variance matrix λ²(X′X)⁻¹, where X is the T × K matrix of observations on the K explanatory variables, and λ²/T is the variance of the median of a sample of size T from the disturbance distribution. The approximate sampling theory is validated by extensive Monte Carlo studies, and some directions of possible refinement emerge.
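The least absolute residuals estimator can be computed as a linear program: minimize Σ(u⁺ + u⁻) subject to Xb + u⁺ − u⁻ = y with u⁺, u⁻ ≥ 0. The sketch below solves it with SciPy's `linprog` and forms the approximate standard errors λ√diag((X′X)⁻¹). It assumes Laplace(0, 1) disturbances, for which the density at the median is f(0) = 1/2, so λ = 1/(2f(0)) = 1 (asymptotically, the sample median has variance 1/(4f(0)²T) = λ²/T). The design and coefficients are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Least absolute residuals estimate via linear programming.

    minimise sum(u_plus + u_minus) subject to X b + u_plus - u_minus = y,
    with b free and u_plus, u_minus >= 0.
    """
    T, K = X.shape
    c = np.concatenate([np.zeros(K), np.ones(2 * T)])
    A_eq = np.hstack([X, np.eye(T), -np.eye(T)])
    bounds = [(None, None)] * K + [(0, None)] * (2 * T)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:K]

rng = np.random.default_rng(2)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(-1, 1, T)])
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.laplace(0.0, 1.0, T)     # symmetric disturbances, median 0

beta_hat = lad_fit(X, y)
# Laplace(0, 1): f(0) = 1/2, so lambda = 1 / (2 f(0)) = 1.
lam = 1.0
se = lam * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
print(beta_hat, se)
```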

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417