Similar Articles
20 similar articles found.
1.
This article considers the problem of statistical classification involving multivariate normal populations and compares the performance of the linear discriminant function (LDF) and the Euclidean distance function (EDF). Although the LDF is quite popular and robust, it has been established (Marco, Young and Turner, 1989) that under certain non-trivial conditions the EDF is "equivalent" to the LDF, in terms of equal probabilities of misclassification (error rates). Thus it follows that under those conditions the sample EDF could perform better than the sample LDF, since the sample EDF involves estimation of fewer parameters. Simulation results, also from the above paper, seemed to support this hypothesis. This article compares the two sample discriminant functions through asymptotic expansions of error rates, and identifies situations when the sample EDF should perform better than the sample LDF. Results from simulation experiments are also reported and discussed.
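
As an illustration of the comparison described above, the following is a minimal simulation sketch (not the study design of the article) contrasting the sample LDF with the sample EDF on data drawn from two normal populations with a common covariance matrix; the means, covariance and sample sizes are arbitrary illustrative choices.

```python
# Minimal sketch: sample LDF vs. sample EDF error rates under a homoscedastic
# normal model. All population parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def error_rates(n_train=30, n_test=5000, d=2):
    mu1, mu2 = np.zeros(d), np.full(d, 1.5)          # assumed population means
    cov = np.eye(d)                                   # common covariance matrix
    x1 = rng.multivariate_normal(mu1, cov, n_train)   # training sample from pi_1
    x2 = rng.multivariate_normal(mu2, cov, n_train)   # training sample from pi_2
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # pooled covariance estimate, used only by the sample LDF
    s_pooled = ((n_train - 1) * np.cov(x1.T) +
                (n_train - 1) * np.cov(x2.T)) / (2 * n_train - 2)
    w = np.linalg.solve(s_pooled, m1 - m2)            # LDF coefficient vector
    c = 0.5 * (m1 + m2) @ w                           # LDF cutoff

    t1 = rng.multivariate_normal(mu1, cov, n_test)    # test sample from pi_1
    t2 = rng.multivariate_normal(mu2, cov, n_test)    # test sample from pi_2

    def to_pi1_ldf(x):                                # True -> classify into pi_1
        return x @ w > c

    def to_pi1_edf(x):                                # nearer sample mean wins
        return (np.linalg.norm(x - m1, axis=1) <
                np.linalg.norm(x - m2, axis=1))

    ldf_err = 0.5 * (np.mean(~to_pi1_ldf(t1)) + np.mean(to_pi1_ldf(t2)))
    edf_err = 0.5 * (np.mean(~to_pi1_edf(t1)) + np.mean(to_pi1_edf(t2)))
    return ldf_err, edf_err

print(error_rates())
```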

2.
The linear discriminant function (LDF) is known to be optimal in the sense of achieving an optimal error rate when sampling from multivariate normal populations with equal covariance matrices. Use of the LDF in nonnormal situations is known to lead to some strange results. This paper will focus on an evaluation of misclassification probabilities when the power transformation could have been used to achieve at least approximate normality and equal covariance matrices in the sampled populations for the distribution of the observed random variables. Attention is restricted to the two-population case with bivariate distributions.
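
The following is a rough sketch of the idea of transforming toward normality before applying the LDF; it uses a Box-Cox power transformation estimated from the pooled sample and compares apparent error rates with and without the transformation. The lognormal populations and all parameters are illustrative assumptions, not the paper's setting.

```python
# Sketch: power-transform skewed bivariate data, then apply the ordinary LDF.
import numpy as np
from scipy.stats import boxcox
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
n = 50
X = np.vstack([rng.lognormal(0.0, 0.5, (n, 2)),      # skewed sample from pi_1
               rng.lognormal(0.6, 0.5, (n, 2))])      # shifted sample from pi_2
y = np.repeat([0, 1], n)

# one power-transformation parameter per variable, estimated from the pooled data
Xt = np.column_stack([boxcox(X[:, j])[0] for j in range(X.shape[1])])

for label, data in [("raw", X), ("Box-Cox transformed", Xt)]:
    err = 1 - LinearDiscriminantAnalysis().fit(data, y).score(data, y)
    print(label, "apparent LDF error rate:", round(err, 3))
```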

3.
The purpose of this paper is to investigate the performance of the LDF (linear discriminant function) and QDF (quadratic discriminant function) for classifying observations from the three types of univariate and multivariate non-normal distributions on the basis of the misclassification rate. The theoretical and the empirical results are described for univariate distributions, and the empirical results are presented for multivariate distributions. It is also shown that the sign of the skewness of each population and the kurtosis have essential effects on the performance of the two discriminant functions. The variations of the population-specific misclassification rates depend greatly on the sample size. For large-dimensional population distributions, if the sample sizes are sufficient, the QDF performs better than the LDF. We show criteria for a choice between the two discriminant functions as an application.

4.
This paper considers the problem where the linear discriminant rule is formed from training data that are only partially classified with respect to the two groups of origin. A further complication is that the data of unknown origin do not constitute an observed random sample from a mixture of the two underlying groups. Under the assumption of a homoscedastic normal model, the overall error rate of the sample linear discriminant rule formed by maximum likelihood from the partially classified training data is derived up to and including terms of the first order in the case of univariate feature data. This first-order expansion of the sample rule so formed is used to define its asymptotic efficiency relative to the rule formed from a completely classified random training set and also to the rule formed from a completely unclassified random set.

5.
The procedure of statistical discrimination is simple in theory but not so simple in practice. An observation X0, possibly multivariate, is to be classified into one of several populations π1, …, πk which have, respectively, the density functions f1(x), …, fk(x). The decision procedure is to evaluate each density function at X0 to see which function gives the largest value fi(X0), and then to declare that X0 belongs to the population corresponding to the largest value. If these densities can be assumed to be normal with equal covariance matrices, then the decision procedure is known as Fisher's linear discriminant function (LDF) method. In the case of unequal covariance matrices the procedure is called the quadratic discriminant function (QDF) method. If the densities cannot be assumed to be normal, then the LDF and QDF might not perform well. Several different procedures have appeared in the literature which offer discriminant procedures for nonnormal data. However, these procedures are generally difficult to use and are not readily available as canned statistical programs.

Another approach to discriminant analysis is to use some sort of mathematical transformation on the samples so that their distribution function is approximately normal, and then use the convenient LDF and QDF methods. One transformation that applies to all distributions equally well is the rank transformation. The result of this transformation is that a very simple and easy-to-use procedure is made available. This procedure is quite robust, as is evidenced by comparisons of the rank transform results with several published simulation studies.
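
A minimal sketch of the rank-transformation approach described above: each variable is replaced by its ranks over the pooled training sample, and the ordinary LDF and QDF are then applied to the ranks. The skewed data below and the use of apparent (resubstitution) error rates are illustrative simplifications, not the paper's simulation design.

```python
# Rank transformation followed by the usual LDF and QDF on the ranked data.
import numpy as np
from scipy.stats import rankdata
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(1)

n = 100
X = np.vstack([rng.lognormal(0.0, 1.0, (n, 3)),       # skewed sample from pi_1
               rng.lognormal(0.7, 1.0, (n, 3))])       # shifted sample from pi_2
y = np.repeat([0, 1], n)

# replace each column by its ranks over the pooled sample
R = np.apply_along_axis(rankdata, 0, X)

for name, clf in [("LDF", LinearDiscriminantAnalysis()),
                  ("QDF", QuadraticDiscriminantAnalysis())]:
    err = 1 - clf.fit(R, y).score(R, y)    # apparent (resubstitution) error rate
    print(name, "apparent error rate on ranks:", round(err, 3))
```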

6.
The theory of acceptance sampling by variables is well known when the underlying distribution is normal. When the normality assumption is not true, using the usual normal-case method can be quite misleading. In this paper we deal with the Laplace distribution, both when the standard deviation is known and when it is unknown. We establish a decision rule for accepting a lot of product containing a defective proportion p. We determine the density function of the decision-rule statistic for small and large sample sizes. We give some practical ways to choose the sample size and the acceptance constant to obtain a desired operating characteristic curve.

7.
We derive sample size formulas for the many-one test of Steel (1959) when the all-pairs power is preassigned. In this large-sample approach we replace, in a manner similar to Noether (1987), the unknown variances and also the unknown correlation coefficients in the power expressions by their known values under the null hypotheses. We then obtain least favorable configurations for one- and two-sided comparisons. The reliability of our formulas is examined in computer simulations for different alternatives with various distributions.

8.
A unit ω is to be classified into one of two correlated homoskedastic normal populations by the linear discriminant function known as the W classification statistic [T.W. Anderson, An asymptotic expansion of the distribution of the studentized classification statistic, Ann. Statist. 1 (1973), pp. 964–972; T.W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd edn, Wiley, New York, 1984; G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, John Wiley and Sons, New York, 1992]. The two populations studied here are two different states of the same population, such as two different states of a disease where the population is the population of diseased patients. When a sample unit is observed in both states (populations), the observations made on it (which form a pair) become correlated. A training sample is unbalanced when not all sample units are observed in both states. Paired and also unbalanced samples are natural in studies related to correlated populations. S. Bandyopadhyay and S. Bandyopadhyay [Choosing better training sample for classifying an individual into one of two correlated normal populations, Calcutta Statist. Assoc. Bull. 54(215–216) (2003), pp. 167–180] studied the effect of an unbalanced training sample structure on the performance of the W statistic in the univariate correlated normal set-up, in order to find the optimal sampling strategy for a better classification rate. In this study, the results are extended to the multivariate case, with a discussion of application in real scenarios.

9.
Using the techniques developed by Subrahmaniam and Ching’anda (1978), we study the robustness to nonnormality of the linear discriminant functions. It is seen that the LDF procedure is quite robust against the likelihood ratio rule. The latter yields in all cases much smaller overall error rates; however, the disparity between the error rates of the LDF and LR procedures is not large enough to warrant the recommendation to use the more complicated LR procedure.

10.
This article enlarges the covariance configurations on which classical linear discriminant analysis is based by considering the four models arising from the spectral decomposition when the eigenvalue and/or eigenvector matrices are allowed to vary or not between groups. As in the classical approach, the assessment of these configurations is accomplished via a test on the training set. The discrimination rule is then built upon the configuration provided by the test, considering or not the unlabeled data. Numerical experiments, on simulated and real data, have been performed to evaluate the gain of our proposal with respect to linear discriminant analysis.

11.
Classification procedures are examined in the case when the dimensionality exceeds the sample size. Two particular suggestions are (i) principal components analysis and (ii) two-step discriminant analysis. Comparisons are made in the two-sample and the several-sample cases. Extensions to the growth curve model are investigated using the two-step discriminant analysis.

12.
The plug-in Anderson's covariate classification statistic is constructed on the basis of an initially unclassified training sample by means of post-stratification. The asymptotic efficiency relative to the discriminant based on an initially classified training sample is evaluated for the case where a covariate is present. The effect of post-stratification is examined.

13.
In this paper, Anbar's (1983) approach for estimating a difference between two binomial proportions is discussed with respect to a hypothesis testing problem. Such an approach results in two possible testing strategies. While the results of the tests are expected to agree for a large sample size when two proportions are equal, the tests are shown to perform quite differently in terms of their probabilities of a Type I error for selected sample sizes. Moreover, the tests can lead to different conclusions, which is illustrated via a simple example; and the probability of such cases can be relatively large. In an attempt to improve the tests while preserving their relative simplicity feature, a modified test is proposed. The performance of this test and a conventional test based on normal approximation is assessed. It is shown that the modified Anbar's test better controls the probability of a Type I error for moderate sample sizes.

14.
We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high-dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach in order to efficiently optimize it when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods on a simulation study, and on three gene expression data sets. We also survey past methods for extending LDA to the high-dimensional setting, and explore their relationships with our proposal.
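
A small numerical illustration of the first difficulty noted above (this is not the proposed penalized LDA itself): when p exceeds n, the pooled within-class covariance estimate has rank at most n minus the number of classes, so it is singular and the usual LDA rule cannot invert it. The dimensions below are arbitrary.

```python
# Illustration only: singularity of the within-class covariance when p > n.
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 50                                  # p > n: high-dimensional setting
X = rng.standard_normal((n, p))                # toy data
y = np.repeat([0, 1], n // 2)                  # two classes

# pooled within-class covariance estimate
Sw = sum(np.cov(X[y == k].T) * (np.sum(y == k) - 1) for k in (0, 1)) / (n - 2)
print("rank of Sw:", np.linalg.matrix_rank(Sw), "out of", p)   # rank <= n - 2 < p
```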

15.
We investigate the sample size problem when a binomial parameter is to be estimated, but some degree of misclassification is possible. The problem is especially challenging when the degree to which misclassification occurs is not exactly known. Motivated by a Canadian survey of the prevalence of toxoplasmosis infection in pregnant women, we examine the situation where it is desired that a marginal posterior credible interval for the prevalence of width w has coverage 1−α, using a Bayesian sample size criterion. The degree to which the misclassification probabilities are known a priori can have a very large effect on sample size requirements, and in some cases achieving a coverage of 1−α is impossible, even with an infinite sample size. Therefore, investigators must carefully evaluate the degree to which misclassification can occur when estimating sample size requirements.
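
The point that no sample size may suffice can be illustrated with the standard misclassification relation q = p·Se + (1−p)(1−Sp), where p is the prevalence, q the probability of a positive test result, and Se, Sp the sensitivity and specificity. The short sketch below (not the paper's Bayesian criterion) shows that even with q known exactly, uncertainty about Se and Sp leaves a spread of plausible prevalences; all numbers are illustrative assumptions.

```python
# With misclassification, q = p*Se + (1-p)*(1-Sp); invert for p given (Se, Sp).
q = 0.30                                   # observed positive proportion (n -> infinity)
for se, sp in [(0.95, 0.95), (0.90, 0.98), (0.85, 0.92)]:
    p = (q + sp - 1) / (se + sp - 1)       # prevalence implied by this (Se, Sp) pair
    print(f"Se={se}, Sp={sp}: implied prevalence p={p:.3f}")
```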

16.
In high-dimensional data, one often seeks a few interesting low-dimensional projections which reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes will not work well when there are a small number of observations relative to the number of variables, known as a large p (dimension) small n (sample size) problem. This paper discusses the relationship between the sample size and dimensionality on classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.

17.
The quadratic discriminant function is commonly used for the two group classification problem when the covariance matrices in the two populations are substantially unequal. This procedure is optimal when both populations are multivariate normal with known means and covariance matrices. This study examined the robustness of the QDF to non-normality. Sampling experiments were conducted to estimate expected actual error rates for the QDF when sampling from a variety of non-normal distributions. Results indicated that the QDF was robust to non-normality except when the distributions were highly skewed, in which case relatively large deviations from optimal were observed. In all cases studied the average probabilities of misclassification were relatively stable while the individual population error rates exhibited considerable variability.

18.
A consistent test for differences in location between two bivariate populations is proposed. The test is similar to the Mann-Whitney test and depends on the exceedances of the slopes of the two samples, where the slope for each sample observation is computed by taking the ratio of the observed values. In terms of the slopes, the problem reduces to a univariate one. The power of the test has been compared with those of various existing tests by simulation. The proposed test statistic is compared with Mardia's (1967) test statistic, the Peters-Randles (1991) test statistic, Wilcoxon's rank sum test statistic and Hotelling's T2 test statistic using the Monte Carlo technique. It performs better than the other statistics compared for small differences in location between the two populations when the underlying population is population 7 (a light-tailed population) and the sample sizes are 15 and 18, respectively. When the underlying population is population 6 (a heavy-tailed population) and the sample sizes are 15 and 18, it performs better than the other statistics compared, except Wilcoxon's rank sum test statistic, for small differences in location between the two populations. It performs better than Mardia's (1967) test statistic for large differences in location between the two populations when the underlying population is a bivariate normal mixture with probability p=0.5, population 6, a Pearson type II population or a Pearson type VII population, for sample sizes 15 and 18. Under a bivariate normal population it performs as well as Mardia's (1967) test statistic for small differences in location between the two populations and sample sizes 15 and 18. For sample sizes 25 and 28, respectively, it performs better than Mardia's (1967) test statistic when the underlying population is population 6, a Pearson type II population or a Pearson type VII population.
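
The reduction to a univariate problem via slopes can be sketched as follows; for illustration only, the two samples are drawn from bivariate normal populations shifted in the second coordinate, and the resulting slopes are compared with an ordinary Mann-Whitney test rather than the paper's exceedance-based statistic.

```python
# Sketch of the slope reduction: each bivariate observation (x, y) is replaced by
# the slope y/x, giving two univariate samples that can then be compared.
# The populations and the use of mannwhitneyu are illustrative assumptions.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(3)
sample1 = rng.multivariate_normal([3.0, 3.0], np.eye(2), 15)   # sample from pi_1
sample2 = rng.multivariate_normal([3.0, 3.8], np.eye(2), 18)   # shifted sample from pi_2
slopes1 = sample1[:, 1] / sample1[:, 0]
slopes2 = sample2[:, 1] / sample2[:, 0]
print(mannwhitneyu(slopes1, slopes2))
```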

19.
Simultaneous estimation of the location parameter μ and scale parameter σ of a normal distribution, based on two selected sample quantiles out of a sufficiently large sample of size n, is considered. The optimal spacing which maximizes the asymptotic relative efficiency is proved to be symmetric.

20.
Kernel discriminant analysis translates the original classification problem into feature space and solves the problem with dimension and sample size interchanged. In high-dimension low sample size (HDLSS) settings, this reduces the 'dimension' to that of the sample size. For HDLSS two-class problems we modify Mika's kernel Fisher discriminant function which, in general, remains ill-posed even in a kernel setting; see Mika et al. (1999). We propose a kernel naive Bayes discriminant function and its smoothed version, using first- and second-degree polynomial kernels. For fixed sample size and increasing dimension, we present asymptotic expressions for the kernel discriminant functions, discriminant directions and for the error probability of our kernel discriminant functions. The theoretical calculations are complemented by simulations which show the convergence of the estimators to the population quantities as the dimension grows. We illustrate the performance of the new discriminant rules, which are easy to implement, on real HDLSS data. For such data, our results clearly demonstrate the superior performance of the new discriminant rules, and especially their smoothed versions, over Mika's kernel Fisher version, and typically also over the commonly used naive Bayes discriminant rule.
