Similar articles
 20 similar articles found (search time: 359 ms)
1.
K. Fischer  Chr Thiele 《Statistics》2013,47(2):281-289
Linear discriminant rules for two symmetrical distributions, which only need the first and second moments of these distributions, are presented. The rules are based on Zhezhel's idea of using the most unfavourable probabilities of misclassification as an optimality criterion. A rule for distributions that differ in a location and a scale parameter is also considered.

2.
Errors of misclassification and their probabilities are studied for classification problems associated with univariate inverse Gaussian distributions. The effects of applying the linear discriminant function (LDF), based on normality, to inverse Gaussian populations are assessed by comparing probabilities (optimum and conditional) based on the LDF with those based on the likelihood ratio rule (LR) for the inverse Gaussian. Both theoretical and empirical results are presented.

3.
The problem of classification into two univariate normal populations with a common mean is considered. Several classification rules are proposed based on efficient estimators of the common mean. Detailed numerical comparisons of the probabilities of misclassification under these rules have been carried out. It is shown that the classification rule based on the Graybill-Deal estimator of the common mean performs best. Classification rules are also proposed for the case when the variances are assumed to be ordered. These rules are compared with the rule based on the Graybill-Deal estimator with respect to individual probabilities of misclassification.

4.
This paper gives a comparative study of the K-means algorithm and the mixture model (MM) method for clustering normal data. The EM algorithm is used to compute the maximum likelihood estimators (MLEs) of the parameters of the MM. These parameters include mixing proportions, which may be thought of as the prior probabilities of different clusters; the maximum posterior (Bayes) rule is used for clustering. Hence, asymptotically the MM method approaches the Bayes rule for known parameters, which is optimal in terms of minimizing the expected misclassification rate (EMCR).

5.
In this paper, we consider the classification of high-dimensional vectors based on a small number of training samples from each class. The proposed method follows the Bayesian paradigm, and it is based on a small vector which can be viewed as the regression of the new observation on the space spanned by the training samples. The classification method provides posterior probabilities that the new vector belongs to each of the classes, hence it adapts naturally to any number of classes. Furthermore, we show a direct similarity between the proposed method and the multicategory linear support vector machine introduced in Lee et al. [2004. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 (465), 67–81]. We compare the performance of the technique proposed in this paper with the SVM classifier using real-life military and microarray datasets. The study shows that the misclassification errors of both methods are very similar, and that the posterior probabilities assigned to each class are fairly accurate.

6.
We consider the linear feature selection problem of obtaining a nonzero 1 × n matrix B which minimizes the probability of misclassification based on the Bayes decision rule applied to the random variable Y = BX, where X is a random n-vector arising from one of m Gaussian populations with equal covariances and equal a priori probabilities. It is shown that the optimal B satisfies a fixed point equation B = F(B) which can be solved by successive substitution.
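
As a rough illustration of what "successive substitution" means for an equation of the form B = F(B), here is a minimal Python sketch. The abstract does not give the map F, so `toy_F` below is a hypothetical stand-in (a simple contraction), and the function name, tolerance, and iteration cap are assumptions chosen for the example only.

```python
import math


def successive_substitution(F, b0, tol=1e-10, max_iter=1000):
    """Iterate b <- F(b) until successive iterates differ by less than tol.

    F maps a list of floats to a list of floats of the same length.
    Convergence is only guaranteed when F is a contraction near the solution.
    """
    b = list(b0)
    for iteration in range(1, max_iter + 1):
        b_next = F(b)
        # Largest componentwise change between successive iterates.
        change = max(abs(x - y) for x, y in zip(b_next, b))
        b = b_next
        if change < tol:
            return b, iteration
    raise RuntimeError("successive substitution did not converge")


def toy_F(b):
    """Hypothetical map used only for the demo; NOT the F from the paper."""
    return [math.cos(b[0])]


fixed_point, n_iter = successive_substitution(toy_F, [1.0])
print(fixed_point, n_iter)
```

In the paper's setting the iterate would be the 1 × n row matrix B and F would be the map derived there; the loop structure stays the same.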

7.
In this paper, we consider classification procedures for exponential populations when an order on the population parameters is known. We define and study the behavior of a classification rule which takes into account the additional information and outperforms the likelihood-ratio-based rule when two populations are considered. Moreover, we study the behavior of this rule in each of the two populations and compare the misclassification probabilities with the classical ones. Type II censoring, which is usual in practice, is considered and results are obtained. The performance for more than two populations is evaluated by simulation.

8.
The quadratic discriminant function (QDF) is commonly used for the two-group classification problem when the covariance matrices in the two populations are substantially unequal. This procedure is optimal when both populations are multivariate normal with known means and covariance matrices. This study examined the robustness of the QDF to non-normality. Sampling experiments were conducted to estimate expected actual error rates for the QDF when sampling from a variety of non-normal distributions. Results indicated that the QDF was robust to non-normality except when the distributions were highly skewed, in which case relatively large deviations from optimal were observed. In all cases studied the average probabilities of misclassification were relatively stable while the individual population error rates exhibited considerable variability.

9.
Assume that a number of individuals are to be classified into one of two populations and that, at the same time, the proportion of members of each population needs to be estimated. The allocated proportions given by the Bayes classification rule are not consistent estimates of the true proportions, so a different classification rule is proposed; this rule yields consistent estimates with only a small increase in the probability of misclassification. As an illustration, the case of two normal distributions with equal covariance matrices is dealt with in detail.

10.
Consider classifying an n × 1 observation vector as coming from one of two multivariate normal distributions which differ both in mean vectors and covariance matrices. A class of discrimination rules based upon n independent univariate discriminant functions is developed, yielding exact misclassification probabilities when the population parameters are known. An efficient search of this class to select the procedure with minimum expected misclassification is made by employing an algorithm of the implicit enumeration type used in integer programming. The procedure is applied to the classification of male twins as either monozygotic or dizygotic.

11.
Classification rules with a reserve judgment option provide a way to satisfy constraints on the misclassification probabilities when there is a high degree of overlap among the populations. Constructing rules which maximize the probability of correct classification while satisfying such constraints is a difficult optimization problem. This paper uses the form of the optimal solution to develop a relatively simple and computationally fast method for three populations which has a nonparametric quality in controlling the misclassification probabilities. Simulations demonstrate that this procedure performs well.

12.
Consider a finite population of large but unknown size of hidden objects. Consider searching for these objects for a period of time, at a certain cost, and receiving a reward depending on the sizes of the objects found. Suppose that the size and discovery time of the objects both have unknown distributions, but the conditional distribution of time given size is exponential with an unknown non-negative and non-decreasing function of the size as failure rate. The goal is to find an optimal way to stop the discovery process. Assuming that the above parameters are known, an optimal stopping time is derived and its asymptotic properties are studied. Then, an adaptive rule based on order restricted estimates of the distributions from truncated data is presented. This adaptive rule is shown to perform nearly as well as the optimal stopping time for large population size.

13.
We consider a regularized D-classification rule for high dimensional binary classification, which adapts the linear shrinkage estimator of a covariance matrix as an alternative to the sample covariance matrix in the D-classification rule (D-rule in short). We find an asymptotic expression for the misclassification rate of the regularized D-rule when the sample size n and the dimension p both increase and their ratio p/n approaches a positive constant γ. In addition, we compare its misclassification rate to that of the standard D-rule under various settings via simulation.

14.
Using 1998 and 1999 singleton birth data of the State of Florida, we study the stability of classification trees. Tree stability depends on both the learning algorithm and the specific data set. In this study, test samples are used in statistical learning to evaluate both stability and predictive performance. We also use the bootstrap resampling technique, which can be regarded as data self-perturbation, to evaluate the sensitivity of the modeling algorithm with respect to the specific data set. We demonstrate that the selection of the cost function plays an important role in stability. In particular, classifiers with equal misclassification costs and equal priors are less stable than those with unequal misclassification costs and equal priors.

15.
The classification of a random variable based on a mixture can be meaningfully discussed only if the class of all finite mixtures is identifiable. In this paper, we find the maximum-likelihood estimates of the parameters of the mixture of two inverse Weibull distributions by using classified and unclassified observations. Next, we estimate the nonlinear discriminant function of the underlying model. Also, we calculate the total probabilities of misclassification as well as the percentage bias. In addition, we investigate the performance of all results through a series of simulation experiments by means of relative efficiencies. Finally, we analyse some simulated and real data sets through the findings of the paper.

16.
Statistical methods for an asymmetric normal classification do not adapt well to the situations where the population distributions are perturbed by an interval-screening scheme. This paper explores methods for providing an optimal classification of future samples in this situation. The properties of the screened population distributions are considered and two optimal regions for classifying the future samples are obtained. These developments yield yet other rules for the interval-screened asymmetric normal classification. The rules are studied from several aspects such as the probability of misclassification, robustness, and estimation of the rules. The performance of the rules is investigated, and the screened classification idea is illustrated, using two numerical examples.

17.
It has been recognized that counting the objects allocated by a rule of classification to several unknown classes often does not provide good estimates of the true class proportions of the objects to be classified. We propose a linear transformation of these classification estimates, which minimizes the mean squared error of the transformed estimates over all possible sets of true proportions. This so-called best-linear-corrector (BLC) transformation is a function of the confusion (classification-error) matrix and of the first and second moments of the prior distribution of the vector of proportions. When the number of objects to be classified increases, the BLC tends to the inverse of the confusion matrix. The estimates that are obtained directly by this inverse-confusion corrector (ICC) are also the maximum-likelihood unbiased estimates of the probabilities that the objects originate from one or the other class, had the objects been preselected with those probabilities. But for estimating the actual proportions, the ICC estimates behave less well than the raw classification estimates for some collections. In that situation, the BLC is substantially superior to the ICC even for some large collections of objects and is always substantially superior to the raw estimates. The statistical model is applied concretely to the measurement of forest cover in remote sensing.
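
To make the inverse-confusion corrector concrete, the following Python sketch works through a two-class case. It assumes the convention that entry (i, j) of the confusion matrix is the probability that an object of true class i is allocated to class j, so the expected raw proportions are the transpose of the confusion matrix applied to the true proportions; the abstract does not state its convention, and the function name and numbers are illustrative only. The BLC itself also needs prior moments and is not sketched here.

```python
def inverse_confusion_correct(confusion, raw):
    """Two-class inverse-confusion corrector (ICC).

    confusion[i][j] is assumed to be P(allocated to class j | true class i).
    raw holds the proportions of objects allocated to each class.
    Solves transpose(confusion) * p = raw for the true proportions p.
    """
    (c11, c12), (c21, c22) = confusion
    det = c11 * c22 - c12 * c21
    if abs(det) < 1e-12:
        raise ValueError("confusion matrix is singular; ICC is undefined")
    r1, r2 = raw
    p1 = (c22 * r1 - c21 * r2) / det
    p2 = (c11 * r2 - c12 * r1) / det
    return [p1, p2]


# Illustrative numbers only: class 1 is recognised 90% of the time, class 2 80%.
confusion = [[0.9, 0.1],
             [0.2, 0.8]]
true_p = [0.3, 0.7]
# Expected raw allocation proportions implied by true_p and the confusion matrix.
raw = [0.9 * 0.3 + 0.2 * 0.7, 0.1 * 0.3 + 0.8 * 0.7]
corrected = inverse_confusion_correct(confusion, raw)
print(raw, corrected)
```

With exact expected counts the correction recovers the true proportions; with finite samples the corrected values can fall outside [0, 1], which is consistent with the abstract's remark that the ICC can behave less well than the raw estimates for some collections.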

18.
This article considers multinomial data subject to misclassification in the presence of covariates which affect both the misclassification probabilities and the true classification probabilities. A subset of the data may be subject to a secondary measurement according to an infallible classifier. Computations are carried out in a Bayesian setting where it is seen that the prior has an important role in driving the inference. In addition, a new and less problematic definition of nonidentifiability is introduced and is referred to as hierarchical nonidentifiability.

19.
Several methods have been proposed to estimate the misclassification probabilities when a linear discriminant function is used to classify an observation into one of several populations. We describe the application of bootstrap sampling to this problem. The proposed method not only furnishes estimates of the misclassification probabilities but also provides an estimate of their standard error. The method is illustrated by a small simulation experiment. It is then applied to three published, readily accessible data sets, which are typical of the large, medium and small data sets encountered in practice.
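
The general idea of bootstrapping a misclassification estimate can be sketched as follows. This is a simplified sketch under stated assumptions, not the procedure from the paper: it uses two univariate groups, equal priors, a rule refitted on each bootstrap resample and evaluated on the original sample, and simulated data. All function names and parameters are hypothetical.

```python
import random
import statistics


def fit_ldf(x1, x2):
    """Univariate linear discriminant rule with equal priors.

    In one dimension the pooled variance only rescales the score and does
    not change its sign, so it is omitted here.
    """
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    midpoint = (m1 + m2) / 2.0

    def classify(x):
        # Positive score allocates x to population 1, otherwise population 2.
        return 1 if (m1 - m2) * (x - midpoint) > 0 else 2

    return classify


def error_rate(classify, x1, x2):
    """Fraction of observations in both groups that the rule misclassifies."""
    wrong = sum(classify(x) != 1 for x in x1) + sum(classify(x) != 2 for x in x2)
    return wrong / (len(x1) + len(x2))


def bootstrap_error(x1, x2, n_boot=200, seed=0):
    """Return the bootstrap mean of the error rate and its standard error."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        # Resample each group with replacement, refit, score on the original data.
        b1 = rng.choices(x1, k=len(x1))
        b2 = rng.choices(x2, k=len(x2))
        rates.append(error_rate(fit_ldf(b1, b2), x1, x2))
    return statistics.fmean(rates), statistics.stdev(rates)


# Simulated data for illustration only: two normal groups two units apart.
data_rng = random.Random(1)
group1 = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
group2 = [data_rng.gauss(2.0, 1.0) for _ in range(30)]
estimate, std_error = bootstrap_error(group1, group2)
print(estimate, std_error)
```

The spread of the replicate error rates is what supplies a standard error alongside the point estimate; other bootstrap variants (for example, scoring only out-of-sample observations) would change the evaluation step but not this overall structure.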

20.
An observation x_0 is to be classified into one of two normal populations φ1 and φ2. A classification rule, the two-stage sample rule R(TS), whose probability of misclassification, P[MC], is independent of the common but unknown variance, is proposed. Some optimal properties of R(TS) are also discussed, and some values of P[MC | R(TS)], the probability of misclassification given the rule R(TS), are tabulated.
