Similar articles
 20 similar articles found (search time: 31 ms)
1.
Although devised in 1936 by Fisher, discriminant analysis is still rapidly evolving, as the complexity of contemporary data sets grows exponentially. Our classification rules explore these complexities by modeling various correlations in higher-order data. Moreover, our classification rules are suitable for data sets where the number of response variables is comparable to or larger than the number of observations. We assume that the higher-order observations have a separable variance-covariance matrix and two different Kronecker product structures on the mean vector. In this article, we develop quadratic classification rules among g different populations where each individual has κth order (κ ≥ 2) measurements. We also provide the computational algorithms to compute the maximum likelihood estimates for the model parameters and, ultimately, the sample classification rules.
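
To make the separability assumption in this abstract concrete, the sketch below builds a separable covariance matrix as a Kronecker product for the second-order case (κ = 2) and compares parameter counts. It is only an illustration under assumptions: the matrices `A` and `B`, the helper names, and the ordering of the two factors are hypothetical and are not taken from the article.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def unstructured_params(d):
    """Free parameters in an unstructured d x d covariance matrix."""
    return d * (d + 1) // 2


def separable_params(p, q):
    """Free parameters in a separable covariance A (p x p) kron B (q x q).

    One parameter is subtracted because a scale factor can be moved
    between A and B without changing their Kronecker product.
    """
    return p * (p + 1) // 2 + q * (q + 1) // 2 - 1


# Hypothetical factors: A for 2 characteristics, B for 3 time points.
A = [[1.0, 0.5], [0.5, 2.0]]
B = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.3], [0.1, 0.3, 1.0]]
sigma = kron(A, B)  # 6 x 6 separable covariance matrix

print(len(sigma), len(sigma[0]))
print(unstructured_params(6), separable_params(2, 3))
```

The smaller parameter count is one reason such structures remain estimable when the number of variables is comparable to or larger than the number of observations.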

2.
The problem is to classify an individual into one of two populations based on an observation on the individual that follows a stationary Gaussian process, where the populations are two distinct time points. Plug-in likelihood ratio rules are considered using samples from the process. The distributions of the associated classification statistics are derived. For the special case in which the misclassification probabilities are equal, the effect of the dependence between the population distributions on the probability of correct classification is studied. Lower bounds and an iterative method for evaluating the optimal correlation between the populations are obtained.

3.
The problem of classification into two univariate normal populations with a common mean is considered. Several classification rules are proposed based on efficient estimators of the common mean. Detailed numerical comparisons of the probabilities of misclassification under these rules have been carried out. It is shown that the classification rule based on the Graybill-Deal estimator of the common mean performs best. Classification rules are also proposed for the case when the variances are assumed to be ordered. These rules are compared with the rule based on the Graybill-Deal estimator with respect to the individual probabilities of misclassification.
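
As a rough illustration of the idea in this abstract, the sketch below computes the Graybill-Deal estimator of a common mean (sample means weighted by sample size over sample variance) and plugs it into a likelihood ratio rule for two normal populations with different variances. The plug-in rule, the function names, and the data are assumptions made for illustration; they are not necessarily the rules proposed in the article.

```python
import math
from statistics import fmean, variance


def graybill_deal(sample1, sample2):
    """Graybill-Deal estimate of the common mean, plus the sample variances."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = fmean(sample1), fmean(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # unbiased sample variances
    w1, w2 = n1 / v1, n2 / v2  # weights: sample size over sample variance
    mu = (w1 * m1 + w2 * m2) / (w1 + w2)
    return mu, v1, v2


def classify(x, mu, v1, v2):
    """Plug-in likelihood ratio rule: return 1 or 2.

    Compares the N(mu, v1) and N(mu, v2) log densities at x.
    """
    log_ratio = 0.5 * math.log(v2 / v1) - 0.5 * (x - mu) ** 2 * (1 / v1 - 1 / v2)
    return 1 if log_ratio >= 0 else 2


# Hypothetical training samples with the same mean but different spread.
tight = [9.0, 10.0, 11.0]
wide = [6.0, 10.0, 14.0]
mu_hat, var1, var2 = graybill_deal(tight, wide)
print(mu_hat, classify(10.5, mu_hat, var1, var2), classify(20.0, mu_hat, var1, var2))
```

Because the two populations share a mean, the rule assigns observations near the estimated mean to the low-variance population and observations far from it to the high-variance one.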

4.
We develop classification rules for data that have an autoregressive circulant covariance structure under the assumption of multivariate normality. We also develop classification rules assuming a general circulant covariance structure. The new classification rules are efficient in reducing the misclassification error rates when the number of observations is not large enough to estimate the unknown variance–covariance matrix. The validity of the proposed classification rules is demonstrated by a simulation study, and their use is illustrated by a real data analysis. Analyses of both simulated data and real data show the effectiveness of our new classification rules.

5.
The two populations considered for this study are two distinct time points. Samples consist of observations made at both time points on every sampling unit. The unit to be classified is observed at one of the two time points. The observation vectors contain covariates that have the same expectation at both time points. In this set-up, the admissibility of some likelihood ratio rules is established.

6.
An analysis of the 1-stage classification decision with two candidate populations is provided in this paper. When the successive posterior probabilities follow a first-order Markov process, it is shown that the optimal classification rules are greatly simplified. A detailed analysis and example are provided for the important case of multivariate normality with equal covariance matrices.

7.
Consider classifying an n × 1 observation vector as coming from one of two multivariate normal distributions which differ both in mean vectors and covariance matrices. A class of discrimination rules based upon n independent univariate discriminant functions is developed, yielding exact misclassification probabilities when the population parameters are known. An efficient search of this class to select the procedure with minimum expected misclassification is made by employing an algorithm of the implicit enumeration type used in integer programming. The procedure is applied to the classification of male twins as either monozygotic or dizygotic.

8.
In this paper, we suggest procedures for classifying an observation into one of two exponential populations, assuming a known ordering between the population parameters. We propose classification rules when either the location or the scale parameters are ordered. Some of these classification rules under ordering are better than the usual classification rules with respect to the expected probability of correct classification. We also derive likelihood ratio-based classification rules. These classification rules are compared using Monte Carlo simulations.

9.
Statistical methods for an asymmetric normal classification do not adapt well to situations where the population distributions are perturbed by an interval-screening scheme. This paper explores methods for providing an optimal classification of future samples in this situation. The properties of the screened population distributions are considered and two optimal regions for classifying the future samples are obtained. These developments yield yet other rules for the interval-screened asymmetric normal classification. The rules are studied from several aspects such as the probability of misclassification, robustness, and estimation of the rules. The performance of the rules is investigated, and the screened classification idea is illustrated, using two numerical examples.

10.
We study the problem of classification with multiple q-variate observations with and without time effect on each individual. We develop new classification rules for populations with certain structured and unstructured mean vectors and under certain covariance structures. The new classification rules are effective when the number of observations is not large enough to estimate the variance–covariance matrix. Computational schemes for maximum likelihood estimates of required population parameters are given. We apply our findings to two real data sets as well as to a simulated data set.

11.
The area under the receiver operating characteristic (ROC) curve (AUC) is used overwhelmingly as an index of diagnostic accuracy in fields such as biomedical science and machine learning. It seems that a larger AUC value has become synonymous with better performance. The functional transformation of the marker values has been proposed in the specialized literature as a procedure for increasing the AUC and therefore the diagnostic accuracy. However, the classification process is based on some regions (classification subsets) which support the decision made; a subject is classified as positive if its marker is within this region and as negative otherwise. In this paper we study the capacity to improve the classification performance of univariate biomarkers via functional transformations, and the impact of such a transformation on the final classification regions, based on a real-world dataset. In particular, we consider the problem of determining the gender of a subject based on the Mode frequency of his/her voice. The shape of the cumulative distribution function of this characteristic in both the male and the female groups makes the resulting classification problem useful for illustrating the differences between having useful diagnostic rules and obtaining an optimal AUC value. Our point is that improving the AUC by means of a functional transformation can produce classification regions with no practical interpretability. We propose to improve the classification accuracy by making the selection of the classification subsets more flexible while preserving their interpretability. In addition, we provide different graphical approximations which allow a better understanding of the classification problem.
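
The relationship between functional transformations and the AUC described in this abstract can be sketched numerically. The example below uses the rank-based (Mann-Whitney) empirical AUC: a strictly increasing transformation leaves it unchanged, while a non-monotone one can raise it, at the price of a classification region that is no longer a single half-line on the original scale. The marker values and the transformation `abs(x - 5.0)` are invented for illustration and do not come from the article's voice dataset.

```python
def empirical_auc(negatives, positives):
    """Empirical AUC: share of (negative, positive) pairs ranked correctly.

    Ties between a positive and a negative value count as one half.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical marker: positives lie in both tails, negatives in the middle.
negatives = [4.0, 5.0, 6.0]
positives = [1.0, 2.0, 8.0, 9.0]

raw_auc = empirical_auc(negatives, positives)

# A strictly increasing transformation preserves all rankings.
cubed_auc = empirical_auc([x ** 3 for x in negatives], [x ** 3 for x in positives])

# A non-monotone transformation: distance from an assumed centre of 5.0.
folded_auc = empirical_auc(
    [abs(x - 5.0) for x in negatives], [abs(x - 5.0) for x in positives]
)

print(raw_auc, cubed_auc, folded_auc)
```

Thresholding the folded marker classifies a subject as positive when the original value is either far below or far above 5.0, so the positive region on the original scale is a union of two intervals rather than one.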

12.

A procedure to derive optimal discrimination rules is formulated for binary functional classification problems in which the instances available for induction are characterized by random trajectories sampled from different Gaussian processes, depending on the class label. Specifically, these optimal rules are derived as the asymptotic form of the quadratic discriminant for the discretely monitored trajectories in the limit that the set of monitoring points becomes dense in the interval on which the processes are defined. The main goal of this work is to provide a detailed analysis of such optimal rules in the dense monitoring limit, with a particular focus on elucidating the mechanisms by which near-perfect classification arises. In the general case, the quadratic discriminant includes terms that are singular in this limit. If such singularities do not cancel out, one obtains near-perfect classification, which means that the error approaches zero asymptotically, for infinite sample sizes. This singular limit is a consequence of the orthogonality of the probability measures associated with the stochastic processes from which the trajectories are sampled. As a further novel result of this analysis, we formulate rules to determine whether two Gaussian processes are equivalent or mutually singular (orthogonal).


13.
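Application of classification and regression trees based on the rpart package in R   Cited by: 5 (self-citations: 0, citations by others: 5)
For many classification and regression problems, binary trees offer an interesting and visual way to study data. A tree splits the independent variables according to certain rules in order to classify the dependent variable sensibly, and can then be used to predict cases whose class is unknown. This paper mainly introduces the application of recursive partitioning and regression trees in R. It also analyses a prostate cancer data set by combining survival analysis with classification and regression trees, and reaches conclusions that offer useful guidance for the diagnosis and prevention of the disease.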
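
The core step of recursive partitioning mentioned in this abstract, choosing a split of one independent variable that best separates the classes, can be sketched as follows. This is a minimal Python illustration using the Gini impurity, not the rpart implementation; the function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Find the threshold on x that minimises the weighted Gini impurity.

    Returns (threshold, impurity). Cases with x < threshold go left.
    The threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(list(ys)))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: one predictor and a two-class outcome.
xs = [1, 2, 3, 10, 11, 12]
ys = ["low", "low", "low", "high", "high", "high"]
print(best_split(xs, ys))
```

A full tree applies the same search recursively to the left and right subsets until a stopping rule is met.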

14.
The normal linear discriminant rule (NLDR) and the normal quadratic discriminant rule (NQDR) are popular classifiers when working with normal populations. Several papers in the literature have been devoted to a comparison of these rules with respect to classification performance. An aspect which has, however, not received any attention is the effect of an initial variable selection step on the relative performance of these classification rules. Cross model validation variable selection has been found to perform well in the linear case, and can be extended to the quadratic case. We report the results of a simulation study comparing the NLDR and the NQDR with respect to the post variable selection classification performance. It is of interest that the NQDR generally benefits from an initial variable selection step. We also comment briefly on the problem of estimating the post selection error rates of the two rules.

15.
The two-population classification problem using dependent samples is extended to the case where covariates are available for classification. Analysis is done using a conditional model, under a multivariate normal set-up, given the covariates. The conditional model considered here includes the parameter structure relevant to growth models. Likelihood ratio or plug-in likelihood ratio classification rules are derived depending on the knowledge of the parameters in the model. To obtain the exact distributions of the classification statistics, the statistics are reduced to forms suitable for the application of standard results.

16.
17.
In this paper, we consider the classification of high-dimensional vectors based on a small number of training samples from each class. The proposed method follows the Bayesian paradigm, and it is based on a small vector which can be viewed as the regression of the new observation on the space spanned by the training samples. The classification method provides posterior probabilities that the new vector belongs to each of the classes, hence it adapts naturally to any number of classes. Furthermore, we show a direct similarity between the proposed method and the multicategory linear support vector machine introduced in Lee et al. [2004. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 (465), 67–81]. We compare the performance of the technique proposed in this paper with the SVM classifier using real-life military and microarray datasets. The study shows that the misclassification errors of both methods are very similar, and that the posterior probabilities assigned to each class are fairly accurate.

18.
Semiparametric Bayesian classification with longitudinal markers   Cited by: 1 (self-citations: 0, citations by others: 1)
Summary.  We analyse data from a study involving 173 pregnant women. The data are observed values of the β human chorionic gonadotropin hormone measured during the first 80 days of gestational age, including from one up to six longitudinal responses for each woman. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model which is based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized by using Markov chain Monte Carlo methods.

19.
ASSESSING ERROR RATE ESTIMATORS: THE LEAVE-ONE-OUT METHOD RECONSIDERED   Cited by: 1 (self-citations: 0, citations by others: 1)
Many comparative studies of the estimators of error rates of supervised classification rules are based on inappropriate criteria. In particular, although they fix the Bayes error rate, their summary statistics aggregate a range of true error rates. This means that their conclusions about the performance of classification rules cannot be trusted. This paper discusses the general issues involved, and then focuses attention specifically on the leave-one-out estimator. The estimator is investigated in a simulation study, both in absolute terms and in comparison with a popular bootstrap estimator. An improvement to the leave-one-out estimator is suggested, but the bootstrap estimator appears to maintain superiority even when the criteria are adjusted.
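
For readers unfamiliar with the leave-one-out estimator discussed in this abstract, the sketch below contrasts it with the resubstitution (apparent) error rate for a simple nearest-mean rule on one variable. The classifier, the function names, and the data are assumptions chosen for illustration; the abstract's suggested improvement and the bootstrap estimator are not implemented here.

```python
from statistics import fmean


def fit_nearest_mean(train):
    """Fit a nearest-mean rule on (value, label) pairs and return it."""
    labels = sorted({label for _, label in train})
    means = {
        label: fmean([value for value, lab in train if lab == label])
        for label in labels
    }
    return lambda x: min(labels, key=lambda label: abs(x - means[label]))


def resubstitution_error(data, fit):
    """Error rate when the rule is tested on its own training data."""
    rule = fit(data)
    return sum(rule(x) != y for x, y in data) / len(data)


def leave_one_out_error(data, fit):
    """Error rate when each case is classified by a rule fitted without it."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rule = fit(data[:i] + data[i + 1:])
        errors += rule(x) != y
    return errors / len(data)


# Hypothetical sample with two borderline cases.
data = [(0.0, "a"), (1.0, "a"), (2.4, "a"), (2.6, "b"), (4.0, "b"), (5.0, "b")]
print(resubstitution_error(data, fit_nearest_mean))
print(leave_one_out_error(data, fit_nearest_mean))
```

On this sample the resubstitution error is optimistic: the two borderline cases are classified correctly only when they take part in fitting the rule.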

20.
Methods are proposed to combine several individual classifiers in order to develop more accurate classification rules. The proposed algorithm uses Rademacher–Walsh polynomials to combine M (≥2) individual classifiers in a nonlinear way. The resulting classifier is optimal in the sense that its misclassification error rate is always less than, or equal to, that of each constituent classifier. A number of numerical examples (based on both real and simulated data) are also given. These examples demonstrate some new, and far-reaching, benefits of working with combined classifiers.
