Similar literature
20 similar documents found (search time: 281 ms)
1.
This article proposes a discriminant function and an algorithm for analyzing positively skewed data. The performance of the suggested algorithm, based on the suggested discriminant function (LNDF), has been compared with the conventional linear discriminant function (LDF) and quadratic discriminant function (QDF), as well as with the nonparametric support vector machine (SVM) and Random Forests (RFs) classifiers, using real and simulated datasets. A maximum reduction of approximately 81% in the error rates as compared to QDF was noted for ten-variate data. The overall results indicate better performance of the proposed discriminant function under certain circumstances.

2.
This article considers the problem of statistical classification involving multivariate normal populations and compares the performance of the linear discriminant function (LDF) and the Euclidean distance function (EDF). Although the LDF is quite popular and robust, it has been established (Marco, Young and Turner, 1989) that under certain non-trivial conditions, the EDF is "equivalent" to the LDF, in terms of equal probabilities of misclassification (error rates). Thus it follows that under those conditions the sample EDF could perform better than the sample LDF, since the sample EDF involves estimation of fewer parameters. Simulation results, also from the above paper, seemed to support this hypothesis. This article compares the two sample discriminant functions through asymptotic expansions of error rates, and identifies situations when the sample EDF should perform better than the sample LDF. Results from simulation experiments are also reported and discussed.

3.
The procedure of statistical discrimination is simple in theory but not so simple in practice. An observation x0, possibly multivariate, is to be classified into one of several populations π1, …, πk which have, respectively, the density functions f1(x), …, fk(x). The decision procedure is to evaluate each density function at x0 to see which function gives the largest value fi(x0), and then to declare that x0 belongs to the population corresponding to the largest value. If these densities can be assumed to be normal with equal covariance matrices, then the decision procedure is known as Fisher’s linear discriminant function (LDF) method. In the case of unequal covariance matrices the procedure is called the quadratic discriminant function (QDF) method. If the densities cannot be assumed to be normal, then the LDF and QDF might not perform well. Several different procedures have appeared in the literature which offer discriminant procedures for nonnormal data. However, these procedures are generally difficult to use and are not readily available as canned statistical programs.

Another approach to discriminant analysis is to use some sort of mathematical transformation on the samples so that their distribution function is approximately normal, and then use the convenient LDF and QDF methods. One transformation that applies to all distributions equally well is the rank transformation. The result of this transformation is that a very simple and easy to use procedure is made available. This procedure is quite robust, as is evidenced by comparisons of the rank transform results with several published simulation studies.
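The two ideas in this abstract, the largest-density decision rule and the rank transformation, can be illustrated with a minimal univariate sketch. It is not the authors' procedure: the function names `rank_in` and `classify`, the midrank convention for ties and for new observations, and the use of a pooled variance with equal priors are all assumptions made for illustration.

```python
import math
from statistics import mean


def rank_in(x, pooled):
    """Midrank of x relative to the pooled training sample.

    Assumed convention: (number of values below x) + half the ties + 0.5,
    which reproduces ordinary average ranks for training observations.
    """
    less = sum(v < x for v in pooled)
    equal = sum(v == x for v in pooled)
    return less + 0.5 * equal + 0.5


def classify(x0, samples):
    """Rank-transform the data, then pick the population whose normal
    density (common pooled variance, equal priors) is largest at x0."""
    pooled = [v for s in samples for v in s]
    ranked = [[rank_in(v, pooled) for v in s] for s in samples]
    r0 = rank_in(x0, pooled)

    means = [mean(r) for r in ranked]
    n, k = len(pooled), len(samples)
    # Pooled within-group variance of the ranks (equal-variance assumption).
    pooled_var = sum(
        (r - m) ** 2 for rs, m in zip(ranked, means) for r in rs
    ) / (n - k)

    def log_density(r, m):
        return -0.5 * math.log(2 * math.pi * pooled_var) - (r - m) ** 2 / (2 * pooled_var)

    scores = [log_density(r0, m) for m in means]
    # Index of the population with the largest density value.
    return max(range(k), key=scores.__getitem__)


# Two positively skewed training samples (made-up numbers).
group_a = [0.1, 0.3, 0.5, 0.8, 1.2, 9.0]
group_b = [2.0, 2.5, 3.1, 4.0, 6.5, 40.0]
print(classify(0.6, [group_a, group_b]))  # index of the chosen population
print(classify(5.0, [group_a, group_b]))
```

Because ranks are bounded by the sample size, the extreme values 9.0 and 40.0 have far less influence on the group means than they would on the raw scale, which is the intuition behind the robustness claim.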

4.
The quadratic discriminant function is commonly used for the two group classification problem when the covariance matrices in the two populations are substantially unequal. This procedure is optimal when both populations are multivariate normal with known means and covariance matrices. This study examined the robustness of the QDF to non-normality. Sampling experiments were conducted to estimate expected actual error rates for the QDF when sampling from a variety of non-normal distributions. Results indicated that the QDF was robust to non-normality except when the distributions were highly skewed, in which case relatively large deviations from optimal were observed. In all cases studied the average probabilities of misclassification were relatively stable while the individual population error rates exhibited considerable variability.

5.
Errors of misclassification and their probabilities are studied for classification problems associated with univariate inverse Gaussian distributions. The effects of applying the linear discriminant function (LDF), based on normality, to inverse Gaussian populations are assessed by comparing probabilities (optimum and conditional) based on the LDF with those based on the likelihood ratio rule (LR) for the inverse Gaussian. Both theoretical and empirical results are presented.

6.
The sample linear discriminant function (LDF) is known to perform poorly when the number of features p is large relative to the size of the training samples. A simple and rarely applied alternative to the sample LDF is the sample Euclidean distance classifier (EDC). Raudys and Pikelis (1980) have compared the sample LDF with three other discriminant functions, including the sample EDC, when classifying individuals from two spherical normal populations. They have concluded that the sample EDC outperforms the sample LDF when p is large relative to the training sample size. This paper derives conditions for which the two classifiers are equivalent when all parameters are known and employs a Monte Carlo simulation to compare the sample EDC with the sample LDF not only for the spherical normal case but also for several nonspherical parameter configurations. For many practical situations, the sample EDC performs as well as or better than the sample LDF, even for nonspherical covariance configurations.
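As a rough illustration of why the sample EDC is attractive when p is large, the sketch below assigns an observation to the population whose sample mean is nearest in Euclidean distance. Only the group means are estimated and no covariance matrix is needed. The function names `class_means` and `edc_classify` and the toy data are hypothetical, not taken from the paper.

```python
from statistics import mean


def class_means(groups):
    """Sample mean vector of each training group (one list of tuples per group)."""
    return [[mean(col) for col in zip(*g)] for g in groups]


def edc_classify(x, means):
    """Return the index of the group whose sample mean is closest to x."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(range(len(means)), key=lambda i: sqdist(x, means[i]))


# Toy bivariate training data (made-up numbers).
groups = [
    [(0, 0), (1, 0), (0, 1)],
    [(4, 4), (5, 4), (4, 5)],
]
means = class_means(groups)
print(edc_classify((1, 1), means))  # index of the nearest group mean
print(edc_classify((4, 3), means))
```

With p features and two groups, this rule estimates 2p mean components, whereas the sample LDF additionally estimates the p(p+1)/2 distinct entries of a pooled covariance matrix and must invert it.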

7.
The linear discriminant function (LDF) is known to be optimal in the sense of achieving an optimal error rate when sampling from multivariate normal populations with equal covariance matrices. Use of the LDF in nonnormal situations is known to lead to some strange results. This paper will focus on an evaluation of misclassification probabilities when the power transformation could have been used to achieve at least approximate normality and equal covariance matrices in the sampled populations for the distribution of the observed random variables. Attention is restricted to the two-population case with bivariate distributions.
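The abstract does not say which power transformation is meant. One widely used family is the Box-Cox transformation, and the sketch below assumes that form purely for illustration; the function name `box_cox` is hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x.

    Assumed form: (x**lam - 1) / lam for lam != 0, and log(x) for lam == 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


# Applying the transform to each coordinate of a bivariate observation.
observation = (4.0, 9.0)
transformed = tuple(box_cox(v, 0.5) for v in observation)
print(transformed)
```

In practice the exponent would be chosen per variable (for example by maximum likelihood) so that the transformed data look approximately normal before the LDF is applied.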

8.
A procedure is presented for finding maximum likelihood estimates of the parameters of a mixture of two random walk distributions in two cases, using classified and unclassified observations. Estimation of nonlinear discriminant functions based on small sample sizes is considered. Through simulation experiments, the performance of the corresponding estimated nonlinear discriminant functions is investigated. The total probabilities of misclassification and percentage biases are evaluated and discussed.

9.
The quadratic discriminant function (QDF) with known parameters has been represented in terms of a weighted sum of independent noncentral chi-square variables. To approximate the density function of the QDF as an m-dimensional exponential family, its moments of each order have been calculated. This is done using the recursive formula for the moments via Stein's identity in the exponential family. We validate the performance of our method using a simulation study and compare it with other methods in the literature on real data. The results reveal better estimation of misclassification probabilities and less computation time with our method.

10.
We consider the problem of the effect of sample designs on discriminant analysis. The selection of the learning sample is assumed to depend on the population values of auxiliary variables. Under a superpopulation model with a multivariate normal distribution, unbiasedness and consistency are examined for the conventional estimators (derived under the assumptions of simple random sampling), maximum likelihood estimators, probability-weighted estimators and conditionally unbiased estimators of parameters. Four corresponding sampled linear discriminant functions are examined. The rates of misclassification of these four discriminant functions and the effect of sample design on these four rates of misclassification are discussed. The performances of these four discriminant functions are assessed in a simulation study.

11.
Generalized discriminant analysis based on distances   Total citations: 14, self-citations: 1, citations by others: 13
This paper describes a method of generalized discriminant analysis based on a dissimilarity matrix to test for differences in a priori groups of multivariate observations. Use of classical multidimensional scaling produces a low‐dimensional representation of the data for which Euclidean distances approximate the original dissimilarities. The resulting scores are then analysed using discriminant analysis, giving tests based on the canonical correlations. The asymptotic distributions of these statistics under permutations of the observations are shown to be invariant to changes in the distributions of the original variables, unlike the distributions of the multi‐response permutation test statistics which have been considered by other workers for testing differences among groups. This canonical method is applied to multivariate fish assemblage data, with Monte Carlo simulations to make power comparisons and to compare theoretical results and empirical distributions. The paper proposes classification based on distances. Error rates are estimated using cross‐validation.

12.
Kernel discriminant analysis translates the original classification problem into feature space and solves the problem with dimension and sample size interchanged. In high‐dimension low sample size (HDLSS) settings, this reduces the ‘dimension’ to that of the sample size. For HDLSS two‐class problems we modify Mika's kernel Fisher discriminant function which, in general, remains ill‐posed even in a kernel setting; see Mika et al. (1999). We propose a kernel naive Bayes discriminant function and its smoothed version, using first‐ and second‐degree polynomial kernels. For fixed sample size and increasing dimension, we present asymptotic expressions for the kernel discriminant functions, discriminant directions and for the error probability of our kernel discriminant functions. The theoretical calculations are complemented by simulations which show the convergence of the estimators to the population quantities as the dimension grows. We illustrate the performance of the new discriminant rules, which are easy to implement, on real HDLSS data. For such data, our results clearly demonstrate the superior performance of the new discriminant rules, and especially their smoothed versions, over Mika's kernel Fisher version, and typically also over the commonly used naive Bayes discriminant rule.

13.
A number of tests have been proposed for assessing the location-scale assumption that is often invoked by practitioners. Existing approaches include Kolmogorov–Smirnov and Cramer–von Mises statistics that each involve measures of divergence between unknown joint distribution functions and products of marginal distributions. In practice, the unknown distribution functions embedded in these statistics are typically approximated using nonsmooth empirical distribution functions (EDFs). In a recent article, Li, Li, and Racine establish the benefits of smoothing the EDF for inference, though their theoretical results are limited to the case where the covariates are observed and the distributions unobserved, while in the current setting some covariates and their distributions are unobserved (i.e., the test relies on population error terms from a location-scale model) which necessarily involves a separate theoretical approach. We demonstrate how replacing the nonsmooth distributions of unobservables with their kernel-smoothed sample counterparts can lead to substantial power improvements, and extend existing approaches to the smooth multivariate and mixed continuous and discrete data setting in the presence of unobservables. Theoretical underpinnings are provided, Monte Carlo simulations are undertaken to assess finite-sample performance, and illustrative applications are provided.

14.
Li and Liu [New nonparametric tests of multivariate locations and scales. Statist Sci. 2004;19(4):686–696] introduced two tests for a difference in locations of two multivariate distributions based on the concept of data depth. Using the simplicial depth [Liu RY. On a notion of data depth based on random simplices. Ann Stat. 1990;18(1):405–414], they studied the performance of these tests for symmetric distributions, namely, the normal and the Cauchy, in a simulation study. However, to the best of our knowledge, the performance of these tests for skewed distributions has not been studied in the current literature. This paper is a contribution in that direction and examines the performance of these depth-based tests in an extensive simulation study involving ten distributions belonging to five well-known families of multivariate skewed distributions. The study includes a comparison of the performance of these tests for four popular affine-invariant depth functions. Conclusions and recommendations are offered.

15.
In classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of two groups. However, for a typical case‐control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t‐statistic for two‐group classification. Biometrics 2015, 71: 404–416) proposed the generalized t‐statistic to obtain a linear discriminant function, which allows for heterogeneity of the case group. Their procedure has an optimality property in the class of consideration. We perform a further study of the problem and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in that no further improvement is possible to construct the linear discriminant function more efficiently. We conduct simulation studies and real data examples to illustrate the finite sample performance and the gain that it produces in comparison with existing methods.

16.
In recent years, the notion of data depth has been used in nonparametric multivariate data analysis since it gives a natural ‘centre-outward’ ordering of multivariate data points with respect to the given data cloud. In the literature, various nonparametric tests have been developed for testing equality of location of two multivariate distributions based on data depth. Here, we define two nonparametric tests based on two different test statistics for testing equality of locations of two multivariate distributions. In the present work, we compare the performance of these tests with the tests developed by Li and Liu [New nonparametric tests of multivariate locations and scales using data depth. Statist Sci. 2004;19(4):686–696] for testing equality of locations of two multivariate distributions. Comparison in terms of power is done for multivariate symmetric and skewed distributions using simulation for three popular depth functions. Application of the tests to real life data is provided. Conclusions and recommendations are also provided.

17.
A method for inducing a desired rank correlation matrix on multivariate input vectors for simulation studies has recently been developed by Iman and Conover (1982). The primary intention of this procedure is to produce correlated input variables for use with computer models. Since this procedure is distribution free and allows the exact marginal distributions to remain intact, it can be used with any marginal distributions for which it is reasonable to think in terms of correlation. In this paper we present a series of rank correlation plots based on this procedure when the marginal distributions are normal, lognormal, uniform and loguniform. These plots provide a convenient tool both for aiding the modeler in determining the degree of dependence among input variables (rather than guessing) and for communicating with the modeler the effect of different correlation assumptions. In addition, this procedure can be used with sample multivariate data by sampling directly from the respective marginal empirical distribution functions.

18.
The property of identifiability is an important consideration in estimating the parameters in a mixture of distributions. Also, classification of a random variable based on a mixture can be meaningfully discussed only if the class of all finite mixtures is identifiable. The problem of identifiability of finite mixtures of Gompertz distributions is studied. A procedure is presented for finding maximum likelihood estimates of the parameters of a mixture of two Gompertz distributions, using classified and unclassified observations. Estimation of a nonlinear discriminant function based on small sample sizes is considered. Through simulation experiments, the performance of the corresponding estimated nonlinear discriminant function is investigated.

19.
Eigenvalues and functions of eigenvalues play an important role in the reduction of the dimensionality of data in multivariate analysis. However, even under the usual normal model context, the associated distributional theory is extremely complicated. In this paper, bootstrap algorithms for approximating the distributions of functions of certain eigenvalues are given, with applications to confidence interval construction for population parameters. Extensive Monte Carlo simulation results demonstrate the small sample performance of the bootstrap simultaneous confidence sets.

20.
Using the techniques developed by Subrahmaniam and Ching’anda (1978), we study the robustness to nonnormality of the linear discriminant functions. It is seen that the LDF procedure is quite robust against the likelihood ratio rule. The latter yields in all cases much smaller overall error rates; however, the disparity between the error rates of the LDF and LR procedures is not large enough to warrant the recommendation to use the more complicated LR procedure.
