Similar literature
20 similar articles found
1.
The parametric and nonparametric methods for estimating the error rates in linear discriminant analysis are examined in both normal and nonnormal situations. A Monte Carlo experiment was carried out under the assumption that two population distributions were characterized by a mixture of two multivariate normal distributions. The bootstrap bias-corrected apparent error rate compares favourably to other available estimators for nonnormal populations with small Mahalanobis distance. The methods for error estimation are also applied to a practical problem in medical diagnosis.

2.
I consider the problem of estimating the Mahalanobis distance between multivariate normal populations when the population covariance matrix satisfies a graphical model. In addition to providing a clear understanding of the dependencies in a multivariate data set, the use of graphical models can reduce the variability of the estimated distances and improve inferences. I derive the asymptotic distribution of the estimated Mahalanobis distance under a general covariance model, which includes graphical models as a special case. Two examples are discussed.

3.
The empirical influence functions for the Mahalanobis distance and for misclassification rates are presented for discriminant analysis with two multivariate normal populations, following Campbell (1978). Conclusions about the effects of outliers from the empirical influence function are contrasted with exact calculations for four simple cases. These cases demonstrate that the higher-order terms discarded in deriving the empirical influence function can be important in practical problems.

4.
In a wide variety of biomedical and clinical research studies, sample statistics from diagnostic marker measurements are presented as a means of distinguishing between two populations, such as with and without disease. Intuitively, a larger difference between the mean values of a marker for the two populations, and a smaller spread of values within each population, should lead to more reliable classification rules based on this marker. We formalize this intuitive notion by deriving practical, new, closed-form expressions for the sensitivity and specificity of three different discriminant tests defined in terms of the sample means and standard deviations of diagnostic marker measurements. The three discriminant tests evaluated are based, respectively, on the Euclidean distance and the Mahalanobis distance between means, and a likelihood ratio analysis. Expressions for the effects of measurement error are also presented. Our final expressions assume that the diagnostic markers follow independent normal distributions for the two populations, although it will be clear that other known distributions may be similarly analyzed. We then discuss applications drawn from the medical literature, although the formalism is clearly not restricted to that application.

5.
In this paper the problem of selecting the best of several normal populations in terms of Mahalanobis distance (MD) when population variance-covariance matrices are equal and unknown is discussed. The selection rule enunciated is shown to approximately satisfy the usual requirement of a minimum guaranteed probability of correct selection. Methods of computing tables required for application of the rule are discussed.

6.
In this paper, we generalize the notion of classification of an observation (sample) into one of the given n populations to the case where some or all of the populations into which the new observation is to be classified may be new but related in a simple way to the given n populations. The discussion is in the framework of the given set of observations obeying the usual multivariate general linear hypothesis model. The set of populations into which the new observation may be classified could be linear manifolds of the parameter space or their closed subsets or closed convex subsets or a combination of them or simply t subsets of the parameter space each of which has a finite number of elements. In the last case a likelihood ratio procedure can be obtained easily. Classification procedures given here are based on Mahalanobis distance. A Bonferroni lower-bound estimate of the probability of correctly classifying an observation is given for the case when the covariance matrix is known or is estimated from a large sample. A numerical example relating to the classification procedures suggested here is given.

7.
The influence function introduced by Hampel (1968, 1973, 1974) is a tool that can be used for outlier detection. Campbell (1978) obtained the influence function for the Mahalanobis distance between two populations, which can be used for detecting outliers in discriminant analysis. In this paper, influence functions for a variety of parametric functions in multivariate analysis are obtained. These include influence functions for the generalized variance; the matrix of regression coefficients; the noncentrality matrix Σ⁻¹δ in multivariate analysis of variance and its eigenvalues; the matrix L, which is a generalization of 1 − R²; canonical correlations; principal components; and the parameters that correspond to Pillai's statistic (1955), Hotelling's (1951) generalized T₀² and Wilks' Λ (1932), all of which can be used for outlier detection in multivariate analysis. Devlin, Gnanadesikan and Kettenring (1975) obtained the influence function for the population correlation coefficient in the bivariate case. It is shown in this paper that influence functions for parameters corresponding to r², R², and Mahalanobis D² can be obtained as particular cases.

8.
The necessary and sufficient conditions for the inadmissibility of ridge regression are discussed under two different criteria, namely, average loss and Pitman nearness. Although the two criteria are very different, the same conclusions are obtained. The loss functions considered in this article are the likelihood loss function and the Mahalanobis loss function. The two loss functions are motivated from the point of view of classification of two normal populations. Under the Mahalanobis loss it is demonstrated that ridge regression is always inadmissible as long as the errors are assumed to be symmetrically distributed about the origin.

9.
A note on the Cook's distance   Cited by: 1 (self-citations: 0, citations by others: 1)
A modification of the classical Cook's distance is proposed, providing us with a generalized Mahalanobis distance in the context of multivariate elliptical linear regression models. We establish the exact distribution of a pivotal-type statistic based on this generalized Mahalanobis distance, which provides critical points for the identification of outlier data points. Based on the equivalence between the modified Cook's distance and what is called the mean-shift multivariate outlier elliptical model, twelve new modifications are proposed for the Cook's distance. We also describe the explicit relationship between the Cook's distance and the likelihood displacement with the modified Cook's distance. We illustrate the procedure with some examples, in the context of multiple and multivariate linear regression.

10.
In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons. 
While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.

11.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a new proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more optimally.

12.
Classical univariate measures of asymmetry such as Pearson’s (mean-median)/σ or (mean-mode)/σ often measure the standardized distance between two separate location parameters and have been widely used in assessing univariate normality. Similarly, measures of univariate kurtosis are often just ratios of two scale measures. The classical standardized fourth moment and the ratio of the mean deviation to the standard deviation serve as examples. In this paper we consider tests of multinormality which are based on the Mahalanobis distance between two multivariate location vector estimates or on the (matrix) distance between two scatter matrix estimates, respectively. Asymptotic theory is developed to provide approximate null distributions as well as to consider asymptotic efficiencies. Limiting Pitman efficiencies for contiguous sequences of contaminated normal distributions are calculated and the efficiencies are compared to those of the classical tests by Mardia. Simulations are used to compare finite sample efficiencies. The theory is also illustrated by an example.

13.
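In clustering problems where the variables are correlated, the traditional remedies are mainly the Mahalanobis distance, principal component clustering and similar methods, but these are difficult to apply in practice or to interpret. We therefore propose a class of model-based clustering methods that first model the correlation structure among the variables (as auxiliary information) and then carry out the cluster analysis. The main advantages of this approach are as follows. It has a wider range of application: it handles not only (linear) correlation but also the clustering of data generated by other complex structures among the variables. The importance of each variable can be read from the regression coefficients of the model. It is more robust and easier to apply than the Mahalanobis distance, easier to interpret than principal component clustering, and algorithmically simpler and more efficient.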

14.
The traditional mixture model assumes that a dataset is composed of several populations of Gaussian distributions. In real life, however, data often do not fit the restrictions of normality very well. It is likely that data from a single population exhibiting either asymmetrical or heavy-tail behavior could be erroneously modeled as two populations, resulting in suboptimal decisions. To avoid these pitfalls, we generalize the mixture model using adaptive kernel density estimators. Because kernel density estimators enforce no functional form, we can adapt to non-normal asymmetric, kurtotic, and tail characteristics in each population independently. This, in effect, robustifies mixture modeling. We adapt two computational algorithms, genetic algorithm with regularized Mahalanobis distance and genetic expectation maximization algorithm, to optimize the kernel mixture model (KMM) and use results from robust estimation theory in order to data-adaptively regularize both. Finally, we likewise extend the information criterion ICOMP to score the KMM. We use these tools to simultaneously select the best mixture model and classify all observations without making any subjective decisions. The performance of the KMM is demonstrated on two medical datasets; in both cases, we recover the clinically determined group structure and substantially improve patient classification rates over the Gaussian mixture model.

15.
In this paper, sequential procedures for the surveillance of the covariance matrices of multivariate nonlinear time series are introduced. Two different types of control charts are proposed. The first type is based on the exponential smoothing of each component of a local measure for the covariances. The control statistic is equal to the Mahalanobis distance between this quantity and its in-control mean. In our second approach, the Mahalanobis distance is first determined and after that it is exponentially smoothed. We discuss three examples of local measures.

Several properties of the proposed schemes are discussed assuming the target process to be generated by a multivariate GARCH(1, 1) model. The generalization to the family of spherical distributions allows the modelling of frequently observed fat tails in financial data. Some results of an extensive Monte Carlo simulation study are provided in order to judge the performance of the presented control schemes. As a performance measure we use the average run length. An empirical example illustrates the importance of the fast detection of the changes in the covariance structure of the returns of financial assets.
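
The two chart types described in this abstract differ only in the order of smoothing and distance computation. The following is a minimal sketch of that ordering, not the authors' implementation. All names (`quad_form`, `smooth_then_distance`, `distance_then_smooth`, `lam`, `start`) are hypothetical, the "local measure" is taken as an arbitrary sequence of vectors, and the precision (inverse covariance) matrix of the raw measure is reused for the smoothed vector as a simplifying assumption.

```python
def quad_form(diff, precision):
    """Return diff' * precision * diff for plain Python lists."""
    n = len(diff)
    return sum(diff[i] * precision[i][j] * diff[j]
               for i in range(n) for j in range(n))


def smooth_then_distance(measures, mean, precision, lam):
    """First scheme: smooth each component exponentially, then take the
    squared Mahalanobis distance of the smoothed vector from the mean.
    Assumption: the precision matrix of the raw measure is reused here."""
    z = list(mean)  # start the smoother at the in-control mean
    stats = []
    for x in measures:
        z = [(1.0 - lam) * zi + lam * xi for zi, xi in zip(z, x)]
        diff = [zi - mi for zi, mi in zip(z, mean)]
        stats.append(quad_form(diff, precision))
    return stats


def distance_then_smooth(measures, mean, precision, lam, start=0.0):
    """Second scheme: compute the squared Mahalanobis distance of each
    raw vector first, then smooth the resulting scalar series.
    The starting value `start` is a hypothetical choice."""
    s = start
    stats = []
    for x in measures:
        diff = [xi - mi for xi, mi in zip(x, mean)]
        s = (1.0 - lam) * s + lam * quad_form(diff, precision)
        stats.append(s)
    return stats
```

With smoothing weight `lam = 1` neither scheme smooths at all, so both reduce to the plain squared distance of each observation; for `lam < 1` they generally give different control statistics.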

16.
We propose optimal procedures to achieve the goal of partitioning k multivariate normal populations into two disjoint subsets with respect to a given standard vector. Definition of good or bad multivariate normal populations is given according to their Mahalanobis distances to a known standard vector as being small or large. Partitioning k multivariate normal populations is reduced to partitioning k non-central Chi-square or non-central F distributions with respect to the corresponding non-centrality parameters depending on whether the covariance matrices are known or unknown. The minimum required sample size for each population is determined to ensure that the probability of correct decision attains a certain level. An example is given to illustrate our procedures.

17.
Rong Zhu, Xinyu Zhang. Statistics, 2018, 52(1): 205-227
The theories and applications of model averaging have been developed comprehensively in the past two decades. In this paper, we consider model averaging for multivariate multiple regression models. In order to make use of the correlation information of the dependent variables sufficiently, we propose a model averaging method based on Mahalanobis distance which is related to the correlation of the dependent variables. We prove the asymptotic optimality of the resulting Mahalanobis Mallows model averaging (MMMA) estimators under certain assumptions. In the simulation study, we show that the proposed MMMA estimators compare favourably with model averaging estimators based on AIC and BIC weights and the Mallows model averaging estimators from the single dependent variable regression models. We further apply our method to the real data on urbanization rate and the proportion of non-agricultural population in ethnic minority areas of China.

18.
In this paper a dissimilarity index between statistical populations is proposed without the hypothesis of a specific statistical model. We assume that the studied populations differ on some relevant features which are measured through convenient parameters of interest. We also assume that adequate estimators for these parameters are available. To measure the differences between populations with respect to the parameters of interest, we construct an index inspired by some properties of the information metric, which are also presented. Additionally, we consider several examples and compare the obtained dissimilarity index with some other distances, like Mahalanobis or Siegel distances.

19.
In the paper we derive new types of multivariate exponentially weighted moving average (EWMA) control charts which are based on the Euclidean distance and on the distance defined by using the inverse of the diagonal matrix consisting of the variances. The design of the proposed control schemes does not involve the computation of the inverse covariance matrix and, thus, it can be used in the high-dimensional setting. The distributional properties of the control statistics are obtained and are used in the determination of the new control procedures. Within an extensive simulation study, the new approaches are compared with the multivariate EWMA control charts which are based on the Mahalanobis distance.
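
The three distances contrasted in this abstract can be put side by side in a short sketch. It is an illustration under stated assumptions, not the paper's control-chart design: the function names are hypothetical, the inputs are plain Python lists, and the Mahalanobis version takes the precision (inverse covariance) matrix as a ready-made argument precisely because obtaining it is the step the other two distances avoid.

```python
def euclidean_sq(x, mean):
    """Squared Euclidean distance: needs no covariance information."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))


def diagonal_sq(x, mean, variances):
    """Squared distance weighted by the inverse of the diagonal matrix
    of variances: only one division per component is needed."""
    return sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, variances))


def mahalanobis_sq(x, mean, precision):
    """Squared Mahalanobis distance: requires the full inverse
    covariance matrix, passed in here as `precision`."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    n = len(diff)
    return sum(diff[i] * precision[i][j] * diff[j]
               for i in range(n) for j in range(n))
```

When the covariance matrix is itself diagonal, the diagonal and Mahalanobis distances coincide; they differ as soon as the components are correlated.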

20.
Vine copulas are a flexible class of dependence models consisting of bivariate building blocks and have proven to be particularly useful in high dimensions. Classical model distance measures require multivariate integration and thus suffer from the curse of dimensionality. In this paper, we provide numerically tractable methods to measure the distance between two vine copulas even in high dimensions. For this purpose, we consecutively develop three new distance measures based on the Kullback–Leibler distance, using the result that it can be expressed as the sum over expectations of KL distances between univariate conditional densities, which can be easily obtained for vine copulas. To reduce numerical calculations, we approximate these expectations on adequately designed grids, outperforming Monte Carlo integration with respect to computational time. For the sake of interpretability, we provide a baseline calibration for the proposed distance measures. We further develop similar substitutes for the Jeffreys distance, a symmetrized version of the Kullback–Leibler distance.
In numerous examples and applications, we illustrate the strengths and weaknesses of the developed distance measures.
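
The relation between the Kullback–Leibler distance and its symmetrized Jeffreys version can be shown on a toy case. The sketch below works on discrete probability vectors with strictly positive entries, not on vine copulas, and it assumes the Jeffreys distance is the sum of the two directed KL distances (the usual convention; the abstract does not spell it out). The function names are hypothetical.

```python
import math


def kl_distance(p, q):
    """Kullback-Leibler distance KL(p || q) for two discrete
    distributions with strictly positive probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def jeffreys_distance(p, q):
    """Symmetrized version, assumed here to be KL(p || q) + KL(q || p)."""
    return kl_distance(p, q) + kl_distance(q, p)
```

The KL distance depends on the order of its arguments, whereas the Jeffreys distance gives the same value either way, which is what makes it usable as a symmetric measure between two models.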
