
1.

How non-normality affects the quadratic discriminant function

William R. Clarke Peter A. Lachenbruch Barbara Broffitt 《Communications in Statistics - Theory and Methods》2013,42(13):1285-1301

The quadratic discriminant function is commonly used for the two group classification problem when the covariance matrices in the two populations are substantially unequal. This procedure is optimal when both populations are multivariate normal with known means and covariance matrices. This study examined the robustness of the QDF to non-normality. Sampling experiments were conducted to estimate expected actual error rates for the QDF when sampling from a variety of non-normal distributions. Results indicated that the QDF was robust to non-normality except when the distributions were highly skewed, in which case relatively large deviations from optimal were observed. In all cases studied the average probabilities of misclassification were relatively stable while the individual population error rates exhibited considerable variability.
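As a rough illustration of the rule this abstract refers to, the sketch below classifies an observation by comparing quadratic scores (log normal densities up to a shared constant) for populations with known parameters. The function names, the equal-prior assumption, and the numeric parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def qdf_score(x, mean, cov):
    """Log-density of N(mean, cov) at x, dropping the constant shared by all groups."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)          # log-determinant of the covariance matrix
    maha = diff @ np.linalg.solve(cov, diff)    # squared Mahalanobis distance
    return -0.5 * (logdet + maha)

def qdf_classify(x, params):
    """Return the index of the population with the largest score (equal priors assumed)."""
    scores = [qdf_score(x, mean, cov) for mean, cov in params]
    return int(np.argmax(scores))

# Hypothetical known parameters: two populations with unequal covariance matrices.
params = [
    (np.array([0.0, 0.0]), np.eye(2)),
    (np.array([2.0, 2.0]), 4.0 * np.eye(2)),
]
```

Because the covariance matrices differ, the boundary between the two regions is quadratic rather than linear; a point far from both means is assigned to the population with the larger spread.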

2.

Distances between normal populations when covariance matrices are unequal

Pil S. Park Anant M. Kshirsagar 《Communications in Statistics - Theory and Methods》2013,42(12):3549-3556

The definition of distance between two populations with equal covariance matrices is extended to two or more populations with unequal covariance matrices. Rao’s U test for the conditional contribution of a subset of variables to the distance is also extended to this situation, even when sample sizes are not necessarily the same.

3.

The Euclidean distance classifier: an alternative to the linear discriminant function

Virgil R. Marco Dean M. Young Danny W. Turner 《Communications in Statistics - Simulation and Computation》2013,42(2):485-505

The sample linear discriminant function (LDF) is known to perform poorly when the number of features p is large relative to the size of the training samples. A simple and rarely applied alternative to the sample LDF is the sample Euclidean distance classifier (EDC). Raudys and Pikelis (1980) have compared the sample LDF with three other discriminant functions, including the sample EDC, when classifying individuals from two spherical normal populations. They have concluded that the sample EDC outperforms the sample LDF when p is large relative to the training sample size. This paper derives conditions for which the two classifiers are equivalent when all parameters are known and employs a Monte Carlo simulation to compare the sample EDC with the sample LDF not only for the spherical normal case but also for several nonspherical parameter configurations. For many practical situations, the sample EDC performs as well as or better than the sample LDF, even for nonspherical covariance configurations.
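The contrast between the two classifiers can be sketched as follows. This is a minimal illustration assuming two groups and equal priors; the function names and the simulated training data are hypothetical and do not reproduce the paper's simulation design.

```python
import numpy as np

def edc_classify(x, mean1, mean2):
    """Assign x to the group whose sample mean is nearer in Euclidean distance."""
    d1 = np.sum((x - mean1) ** 2)
    d2 = np.sum((x - mean2) ** 2)
    return 1 if d1 <= d2 else 2

def ldf_classify(x, mean1, mean2, pooled_cov):
    """Sample LDF with equal priors; additionally needs the pooled covariance matrix."""
    w = np.linalg.solve(pooled_cov, mean1 - mean2)
    midpoint = 0.5 * (mean1 + mean2)
    return 1 if w @ (x - midpoint) >= 0 else 2

# Hypothetical small training samples with p = 5 features per observation.
rng = np.random.default_rng(0)
train1 = rng.normal(loc=0.0, size=(10, 5))
train2 = rng.normal(loc=1.0, size=(10, 5))
m1, m2 = train1.mean(axis=0), train2.mean(axis=0)
pooled = ((train1 - m1).T @ (train1 - m1) + (train2 - m2).T @ (train2 - m2)) / (
    len(train1) + len(train2) - 2
)
```

The EDC only estimates the two mean vectors, while the LDF also estimates p(p + 1)/2 covariance parameters, which is the usual explanation for the EDC's advantage when p is large relative to the training sample size. With an identity covariance matrix the two rules coincide.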

4.

A note on the linear discriminant function when group means are equal

Gregory T. Schwemer M. Ray Mickey 《Communications in Statistics - Simulation and Computation》2013,42(6):633-638

The performance of the sample linear discriminant function with known, proportional, covariance matrices and equal but unknown mean vectors is considered. Unconditional misclassification rates are obtained from the Student-t distribution. These results can be used as an aid in verifying simulation programs incorporating the linear discriminant function when Gaussian densities with unequal covariance matrices are used.

5.

Partially pooled covariance matrix estimation in discriminant analysis

Tom Greene William S. Rayens 《Communications in Statistics - Theory and Methods》2013,42(10):3679-3702

The Linear Discriminant Rule (LD) is theoretically justified for use in classification when the population within-groups covariance matrices are equal, something rarely known in practice. As an alternative, the Quadratic Discriminant Rule (QD) avoids assuming equal covariance matrices, but requires the estimation of a large number of parameters. Hence, the performance of QD may be poor if the training set sizes are small or moderate. In fact, simulation studies have shown that in the two-groups case LD often outperforms QD for small training sets even when the within-groups covariance matrices differ substantially. The present article shows this to be true when there are more than two groups, as well. Thus, it would seem reasonable and useful to develop a data-based method of classification that, in effect, represents a compromise between QD and LD. In this article we develop such a method based on an empirical Bayes formulation in which the within-groups covariance matrices are assumed to be outcomes of a common prior distribution whose parameters are estimated from the data. Two classification rules are developed under this framework and, through the use of extensive simulations, are compared to existing methods when the number of groups is moderate.

6.

Sequential discrimination

Stephen C. Hora 《Communications in Statistics - Theory and Methods》2013,42(9):905-916

An analysis of the 1-stage classification decision with two candidate populations is provided in this paper. When the successive posterior probabilities follow a first-order Markov process, it is shown that the optimal classification rules are greatly simplified. A detailed analysis and example are provided for the important case of multivariate normality with equal covariance matrices.

7.

The rank transformation as a method of discrimination with some examples

W.J. Conover Ronald L. Iman 《Communications in Statistics - Theory and Methods》2013,42(5):465-487

The procedure of statistical discrimination is simple in theory but not so simple in practice. An observation x₀, possibly multivariate, is to be classified into one of several populations π₁,…,π_k which have, respectively, the density functions f₁(x),…,f_k(x). The decision procedure is to evaluate each density function at x₀ to see which function gives the largest value f_i(x₀), and then to declare that x₀ belongs to the population corresponding to the largest value. If these densities can be assumed to be normal with equal covariance matrices, then the decision procedure is known as Fisher’s linear discriminant function (LDF) method. In the case of unequal covariance matrices the procedure is called the quadratic discriminant function (QDF) method. If the densities cannot be assumed to be normal, then the LDF and QDF might not perform well. Several different procedures have appeared in the literature which offer discriminant procedures for nonnormal data. However, these procedures are generally difficult to use and are not readily available as canned statistical programs.

Another approach to discriminant analysis is to use some sort of mathematical transformation on the samples so that their distribution function is approximately normal, and then use the convenient LDF and QDF methods. One transformation that applies to all distributions equally well is the rank transformation. The result of this transformation is that a very simple and easy to use procedure is made available. This procedure is quite robust, as is evidenced by comparisons of the rank transform results with several published simulation studies.
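A minimal sketch of the idea, under several assumptions: each variable is ranked within the pooled training sample, and an ordinary two-group LDF is then computed on the ranks. The function names, the lognormal test data, and the arbitrary tie-breaking are illustrative choices; how a new observation is ranked relative to the training data is not shown here and is not specified in the abstract.

```python
import numpy as np

def rank_transform(X):
    """Replace each column by its ranks 1..n within the sample (ties broken by position)."""
    return np.argsort(np.argsort(X, axis=0), axis=0) + 1.0

def ldf_direction(R1, R2):
    """Fisher LDF coefficients and cut-off for two groups, using the pooled covariance."""
    m1, m2 = R1.mean(axis=0), R2.mean(axis=0)
    S = ((R1 - m1).T @ (R1 - m1) + (R2 - m2).T @ (R2 - m2)) / (len(R1) + len(R2) - 2)
    w = np.linalg.solve(S, m1 - m2)
    return w, w @ (m1 + m2) / 2

# Hypothetical skewed (lognormal) training samples for two groups.
rng = np.random.default_rng(1)
X1 = rng.lognormal(mean=0.0, size=(30, 2))
X2 = rng.lognormal(mean=1.0, size=(30, 2))

R = rank_transform(np.vstack([X1, X2]))  # rank within the pooled sample
R1, R2 = R[:30], R[30:]
w, cutoff = ldf_direction(R1, R2)
pred_group1 = R @ w >= cutoff            # True means "assign to group 1"
```

Since ranks depend only on the ordering of the data, the resulting rule is unaffected by any monotone rescaling of the individual variables, which is one reason to expect it to tolerate skewed inputs.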

8.

Regularized covariance matrix estimation under the common principal components model

P. T. Pepler D. W. Uys D. G. Nel 《Communications in Statistics - Simulation and Computation》2018,47(3):631-643

The common principal components (CPC) model provides a way to model the population covariance matrices of several groups by assuming a common eigenvector structure. When appropriate, this model can provide covariance matrix estimators of which the elements have smaller standard errors than when using either the pooled covariance matrix or the per group unbiased sample covariance matrix estimators. In this article, a regularized CPC estimator under the assumption of a common (or partially common) eigenvector structure in the populations is proposed. After estimation of the common eigenvectors using the Flury–Gautschi (or other) algorithm, the off-diagonal elements of the nearly diagonalized covariance matrices are shrunk towards zero and multiplied with the orthogonal common eigenvector matrix to obtain the regularized CPC covariance matrix estimates. The optimal shrinkage intensity per group can be estimated using cross-validation. The efficiency of these estimators compared to the pooled and unbiased estimators is investigated in a Monte Carlo simulation study, and the regularized CPC estimator is applied to a real dataset to demonstrate the utility of the method.
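The shrinkage step described in this abstract can be sketched roughly as below. The function name and the example matrices are hypothetical, and the common eigenvector matrix B is obtained here from the eigenvectors of the averaged covariance matrix purely as a stand-in; it is not the Flury–Gautschi algorithm, and the cross-validated choice of the shrinkage intensity is not shown.

```python
import numpy as np

def shrink_in_cpc_basis(S, B, intensity):
    """Shrink the off-diagonal part of B'SB towards zero, then rotate back with B.

    intensity = 0 returns S unchanged; intensity = 1 forces S to have eigenvectors B.
    """
    D = B.T @ S @ B                      # nearly diagonal if B is close to S's eigenvectors
    off_diagonal = D - np.diag(np.diag(D))
    D_shrunk = D - intensity * off_diagonal
    return B @ D_shrunk @ B.T

# Hypothetical sample covariance matrices for two groups.
S1 = np.array([[4.0, 1.2], [1.2, 2.0]])
S2 = np.array([[3.0, 0.8], [0.8, 1.5]])

# Stand-in for the estimated common eigenvectors (an assumption, see the note above).
_, B = np.linalg.eigh((S1 + S2) / 2)

S1_reg = shrink_in_cpc_basis(S1, B, intensity=0.5)
S2_reg = shrink_in_cpc_basis(S2, B, intensity=0.5)
```

Intermediate intensities interpolate between the per-group sample covariance matrix and an estimate that fully obeys the common eigenvector structure.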

9.

A hierarchical eigenmodel for pooled covariance estimation

Peter D. Hoff 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(5):971-992

Summary. Although the covariance matrices corresponding to different populations are unlikely to be exactly equal they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, whereas the correlation between other pairs may be consistently negative. In such cases much of the similarity across covariance matrices can be described by similarities in their principal axes, which are the axes that are defined by the eigenvectors of the covariance matrices. Estimating the degree of across-population eigenvector heterogeneity can be helpful for a variety of estimation tasks. For example, eigenvector matrices can be pooled to form a central set of principal axes and, to the extent that the axes are similar, covariance estimates for populations having small sample sizes can be stabilized by shrinking their principal axes towards the across-population centre. To this end, the paper develops a hierarchical model and estimation procedure for pooling principal axes across several populations. The model for the across-group heterogeneity is based on a matrix-valued antipodally symmetric Bingham distribution that can flexibly describe notions of 'centre' and 'spread' for a population of orthogonal matrices.

10.

Bootstrapping analogs of the one way MANOVA test

Hasthika S. Rupasinghe Arachchige Don 《Communications in Statistics - Theory and Methods》2013,42(22):5546-5558

Abstract

Analogs of the classical one way MANOVA model have recently been suggested that do not assume that population covariance matrices are equal or that the error vector distribution is known. These tests are based on the sample mean and sample covariance matrix corresponding to each of the p populations. We show how to extend these tests using other measures of location such as the trimmed mean or coordinatewise median. These new bootstrap tests can have some outlier resistance, and can perform better than the tests based on the sample mean if the error vector distribution is heavy tailed.

11.

Performance evaluation of likelihood-ratio tests for assessing similarity of the covariance matrices of two multivariate normal populations

D. Najarzadeh M. Ganjali 《Communications in Statistics - Theory and Methods》2013,42(5):1435-1452

Abstract

In analyzing two multivariate normal data sets, the assumption of equal covariance matrices is usually adopted by default for subsequent inferences. If this equality does not hold, later inferences will be more complex and usually approximate. If one detects some identical components between two decomposed unequal covariance matrices and uses this extra information, one expects that subsequent inferences can be performed more accurately. For this purpose, in this article we consider some statistical tests about the equality of components of decomposed covariance matrices of two multivariate normal populations. Our emphasis is on the spectral decomposition of these matrices. Hypotheses about the equalities of sizes, shapes, and sets of directions as components of these two covariance matrices are tested by the likelihood ratio test (LRT). Some simulation studies are carried out to investigate the accuracy and power of the LRT. Finally, analyses of two real data sets are illustrated.

12.

Partitioning k multivariate normal populations according to equivalence with respect to a standard vector

Weixing Cai Pinyuen Chen 《Journal of statistical planning and inference》2009

We propose optimal procedures to achieve the goal of partitioning k multivariate normal populations into two disjoint subsets with respect to a given standard vector. Definition of good or bad multivariate normal populations is given according to their Mahalanobis distances to a known standard vector as being small or large. Partitioning k multivariate normal populations is reduced to partitioning k non-central Chi-square or non-central F distributions with respect to the corresponding non-centrality parameters depending on whether the covariance matrices are known or unknown. The minimum required sample size for each population is determined to ensure that the probability of correct decision attains a certain level. An example is given to illustrate our procedures.

13.

The linear and Euclidean discriminant functions: a comparison via asymptotic expansions and simulation study

J. P. Koolaard C. R. O. Lawoko 《Communications in Statistics - Theory and Methods》2013,42(12):2989-3011

This article considers the problem of statistical classification involving multivariate normal populations and compares the performance of the linear discriminant function (LDF) and the Euclidean distance function (EDF). Although the LDF is quite popular and robust, it has been established (Marco, Young and Turner, 1989) that under certain non-trivial conditions, the EDF is "equivalent" to the LDF, in terms of equal probabilities of misclassification (error rates). Thus it follows that under those conditions the sample EDF could perform better than the sample LDF, since the sample EDF involves estimation of fewer parameters. Simulation results, also from the above paper, seemed to support this hypothesis. This article compares the two sample discriminant functions through asymptotic expansions of error rates, and identifies situations when the sample EDF should perform better than the sample LDF. Results from simulation experiments are also reported and discussed.

14.

Permutational tests for correlation matrices

W. J. Krzanowski 《Statistics and Computing》1993,3(1):37-44

Permutational tests are proposed for the hypotheses that two population correlation matrices have common eigenvectors, and that two population correlation matrices are equal. The only assumption made in these tests is that the distributional form is the same in the two populations; they should be useful as a prelude either to tests of mean differences in grouped standardised data or to principal component investigation of such data. The performance of the permutational tests is subjected to Monte Carlo investigation, and a comparison is made with the performance of the likelihood-ratio test for equality of covariance matrices applied to standardised data. Bootstrapping is considered as an alternative to permutation, but no particular advantages are found for it. The various tests are applied to several data sets.

15.

Parametric and permutation testing for multivariate monotonic alternatives (total citations: 1, self-citations: 0, citations by others: 1)

Abouzar Bazyari Fortunato Pesarin 《Statistics and Computing》2013,23(5):639-652

We are first interested in testing the homogeneity of k mean vectors against two-sided restricted alternatives separately in multivariate normal distributions. This problem is a multivariate extension of Bartholomew (in Biometrika 46:328–335, 1959b) and an extension of Sasabuchi et al. (in Biometrika 70:465–472, 1983) and Kulatunga and Sasabuchi (in Mem. Fac. Sci., Kyushu Univ. Ser. A: Mathematica 38:151–161, 1984) to two-sided ordered hypotheses. We examine the problem of testing under two separate cases. One case is that covariance matrices are known, the other one is that covariance matrices are unknown but common. For the general case that covariance matrices are known the test statistic is obtained using the likelihood ratio method. When the known covariance matrices are common and diagonal, the null distribution of the test statistic is derived and its critical values are computed at different significance levels. A Monte Carlo study is also presented to estimate the power of the test. A test statistic is proposed for the case when the common covariance matrices are unknown. Since it is difficult to compute the exact p-value for this problem of testing with the classical method when the covariance matrices are completely unknown, we first present a reformulation of the test statistic based on the orthogonal projections on the closed convex cones and then determine the upper bounds for its p-values. Also we provide a general nonparametric solution based on the permutation approach and nonparametric combination of dependent tests.

16.

CLASSIFICATION AND PROPORTION ESTIMATION

J. Bélanger D. Gagnon 《Australian & New Zealand Journal of Statistics》1993,35(1):19-28

Assume that a number of individuals are to be classified into one of two populations and that, at the same time, the proportion of members of each population needs to be estimated. The allocated proportions given by the Bayes classification rule are not consistent estimates of the true proportions, so a different classification rule is proposed; this rule yields consistent estimates with only a small increase in the probability of misclassification. As an illustration, the case of two normal distributions with equal covariance matrices is dealt with in detail.

17.

Distribution of multivariate quadratic forms under certain covariance structures

Robert J. Pavur 《Revue canadienne de statistique》1987,15(2):169-176

Necessary and sufficient conditions are given for the covariance structure of all the observations in a multivariate factorial experiment under which certain multivariate quadratic forms are independent and distributed as a constant times a Wishart. It is also shown that exact multivariate test statistics can be formed for certain covariance structures of the observations when the assumption of equal covariance matrices for each normal population is relaxed. A characterization is given for the dependency structure between random vectors in which the sample mean and sample covariance matrix have certain properties.

18.

Effects of the generalized Box–Cox transformation on Type I error rate and power of Hotelling's T²

《Journal of Statistical Computation and Simulation》2012,82(3):199-206

Most multivariate statistical techniques rely on the assumption of multivariate normality. The effects of nonnormality on multivariate tests are assumed to be negligible when variance–covariance matrices and sample sizes are equal. Therefore, in practice, investigators usually do not attempt to assess multivariate normality. In this simulation study, the effects of skewed and leptokurtic multivariate data on the Type I error and power of Hotelling's T² were examined by manipulating distribution, sample size, and variance–covariance matrix. The empirical Type I error rate and power of Hotelling's T² were calculated before and after the application of generalized Box–Cox transformation. The findings demonstrated that even when variance–covariance matrices and sample sizes are equal, small to moderate changes in power still can be observed.

19.

Linear Discriminant Analysis of Multivariate Spatial–Temporal Regressions

Jūratė Šaltytė-Benth Kęstutis Dučinskas 《Scandinavian Journal of Statistics》2005,32(2):281-294

Abstract. We consider classification of the realization of a multivariate spatial–temporal Gaussian random field into one of two populations with different regression mean models and factorized covariance matrices. Unknown means and common feature vector covariance matrix are estimated from training samples with observations correlated in space and time, assuming spatial–temporal correlations to be known. We present the first-order asymptotic expansion of the expected error rate associated with a linear plug-in discriminant function. Our results are applied to ecological data collected from the Lithuanian Economic Zone in the Baltic Sea.

20.

Linear discrimination for three known normal populations

Mark J. Schervish 《Journal of statistical planning and inference》1984,10(2):167-175

A random vector is assumed to have one of three known multivariate normal distributions with equal covariance matrices. It is desired to separate the three distributions by means of a single linear discriminant function. Such a function can lead to a classification rule. The function whose classification rule minimizes the average of the three probabilities of misclassification is found. Also the function is found whose rule minimizes the maximum of the three probabilities of misclassification.