Similar Articles
20 similar articles retrieved (search time: 31 ms)
1.
The Bayes classification rule is the optimal classifier, minimizing the classification error rate, whereas the Neyman–Pearson lemma yields the optimal family of classifiers maximizing the detection rate at any given false alarm rate. These results motivate comparing classifiers by how closely they resemble the optimum. In this article, we define partial order relations on classifiers and on families of classifiers, based on rankings of rate function values and of test function values, respectively. Each partial order relation provides a sufficient condition for a better classification error rate or better performance in receiver operating characteristic (ROC) analysis. Various examples and applications of the partial order theorems are discussed to compare classifiers and families of classifiers, including comparisons of cross-validation methods, of training data containing outliers, and of labelling errors in training data. The Canadian Journal of Statistics 48: 152–166; 2020 © 2019 Statistical Society of Canada
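As a rough empirical analogue of the family-level comparison, one can check whether one score-based classifier's ROC curve lies on or above another's at every false alarm rate. A minimal numpy sketch (the function names and the grid-interpolation shortcut are illustrative, not the article's partial-order construction):

    import numpy as np

    def empirical_roc(scores, y):
        # ROC points swept over all score thresholds; y must be a 0/1 array
        order = np.argsort(-scores)
        ys = y[order]
        tpr = np.concatenate([[0.0], np.cumsum(ys) / ys.sum()])
        fpr = np.concatenate([[0.0], np.cumsum(1 - ys) / (1 - ys).sum()])
        return fpr, tpr

    def roc_dominates(scores_a, scores_b, y, grid_size=201):
        # True if A's ROC is pointwise >= B's on a common FPR grid;
        # ties in FPR are resolved crudely, which is fine for a sketch.
        grid = np.linspace(0.0, 1.0, grid_size)
        fa, ta = empirical_roc(scores_a, y)
        fb, tb = empirical_roc(scores_b, y)
        return bool(np.all(np.interp(grid, fa, ta) >= np.interp(grid, fb, tb) - 1e-12))

If neither classifier dominates the other, the empirical ROC curves cross and such a pointwise comparison leaves the pair incomparable.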

2.
Convex sets of probability distributions are also called credal sets. They generalize probability theory by relaxing the requirement that probability values be precise. Classification, i.e. assigning class labels to instances described by a set of attributes, is an important application domain for Bayesian methods, where the naive Bayes classifier performs surprisingly well. This paper proposes a new classification method that extends the naive Bayes classifier to credal sets. Exact and effective solution procedures for naive credal classification are derived, and the related dominance criteria are discussed. Credal classification thus emerges as a new method, based on more realistic assumptions and aimed at more reliable inferences.
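A naive credal classifier can return a set of classes rather than a single label when the data do not single one out. The sketch below uses imprecise-Dirichlet-style lower/upper counts and interval dominance; it is a simplification of the paper's exact procedures, and the function name and parameter s are illustrative only (data and labels are numpy arrays of discrete features):

    import numpy as np

    def credal_class_set(x, data, labels, s=1.0):
        # Lower/upper naive Bayes scores per class under an
        # imprecise-Dirichlet-style model with hyperparameter s.
        classes = np.unique(labels)
        n = len(labels)
        lower, upper = {}, {}
        for c in classes:
            Xc = data[labels == c]
            nc = len(Xc)
            lo = nc / (n + s)          # lower prior probability of class c
            up = (nc + s) / (n + s)    # upper prior probability of class c
            for j, v in enumerate(x):
                m = np.sum(Xc[:, j] == v)
                lo *= m / (nc + s)
                up *= (m + s) / (nc + s)
            lower[c], upper[c] = lo, up
        # interval dominance: drop c if some class's lower score
        # exceeds c's upper score; the survivors form the credal answer
        return [c for c in classes
                if all(upper[c] >= lower[d] for d in classes if d != c)]

When the returned set is a singleton, the credal classifier behaves like ordinary naive Bayes; a larger set signals that the precise-probability assumption is doing real work.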

3.
We extend the classical one-dimensional Bayes binary classifier to create a new classification rule that has a region of neutrality, accounting for cases where the implied weight of evidence is too weak for a confident classification. Our proposed rule allows a "No Prediction" when the observation is too ambiguous for a definite prediction. The motivation for allowing "No Prediction" is that, in our microbial community profiling application, a wrong prediction can be worse than no prediction at all. On the other hand, too many "No Predictions" have adverse implications as well. Consequently, our proposed rule incorporates this trade-off through a cost structure that weighs the penalty for not making a definite prediction against the penalty for making an incorrect definite prediction. We demonstrate that our proposed rule outperforms a naive neutral-zone rule that has been routinely used in biological applications similar to ours.
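The cost trade-off can be made concrete with the classical Chow-style rejection rule, shown here as a stand-in for the authors' exact construction (the cost names are illustrative):

    def neutral_zone_predict(p1, cost_error=1.0, cost_abstain=0.2):
        # p1 is the estimated posterior probability of class 1.
        # Predicting the more likely class has expected cost
        # min(p1, 1-p1) * cost_error; abstaining costs cost_abstain,
        # so we abstain exactly when that is cheaper.
        if min(p1, 1.0 - p1) * cost_error > cost_abstain:
            return None  # "No Prediction": evidence too weak
        return 1 if p1 >= 0.5 else 0

Note that abstention is only ever optimal when cost_abstain < cost_error / 2; raising the abstention cost shrinks the neutral zone, which is exactly the trade-off the abstract describes.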

4.
In this article, a sequential correction of two linear methods, linear discriminant analysis (LDA) and the perceptron, is proposed. The correction works by sequentially adding features on which the classifier is retrained; these new features are the posterior probabilities produced by the base classification method (LDA or perceptron). At each step the vector of added probabilities changes, so the probabilities are obtained on a slightly different data set, yielding many classifiers of the same type trained on slightly different data sets. Four sequential correction methods are presented, based on different combining schemes (e.g. the mean rule and the product rule). Experimental results on several data sets demonstrate that the improvements are efficient and that this approach outperforms the classical linear methods, providing a significant reduction in the mean classification error rate.
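A sketch of the idea with scikit-learn's LDA as the base learner and the mean combining rule (the paper also uses a perceptron, and its four schemes differ in detail; the step count and function name are illustrative):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def sequential_lda(X, y, X_test, n_steps=3):
        # At each step, refit on the augmented data and append the
        # resulting posterior probabilities as extra features; the
        # appended columns sum to 1, but LDA's default SVD solver
        # tolerates the collinearity.
        Xtr, Xte = X.copy(), X_test.copy()
        step_probs = []
        for _ in range(n_steps):
            clf = LinearDiscriminantAnalysis().fit(Xtr, y)
            p_tr, p_te = clf.predict_proba(Xtr), clf.predict_proba(Xte)
            step_probs.append(p_te)
            Xtr = np.hstack([Xtr, p_tr])  # join the new probability features
            Xte = np.hstack([Xte, p_te])
        mean_p = np.mean(step_probs, axis=0)  # mean combining rule
        return clf.classes_[np.argmax(mean_p, axis=1)]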

5.
A fast Bayesian method is developed that seamlessly fuses classification and hypothesis testing via discriminant analysis. Building upon the original discriminant analysis classifier, modelling components are added to identify discriminative variables. A combination of cake priors and a novel form of variational Bayes, which we call reverse collapsed variational Bayes, gives rise to variable selection that can be posed directly as multiple hypothesis testing using likelihood ratio statistics. Theoretical arguments show that Chernoff-consistency (asymptotically zero type I and type II error) is maintained across all hypotheses. We apply our method to publicly available genomics datasets and show that it performs well in practice for its computational cost. An R package, VaDA, is available on GitHub.
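The paper's cake-prior/variational machinery is not reproduced here, but the core pattern of "variable selection as multiple testing with likelihood ratio statistics, then discriminant analysis on the survivors" can be sketched generically (a simplified stand-in, with a crude Gaussian LRT per variable):

    import numpy as np
    from scipy import stats
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def lrt_screen_lda(X, y, alpha=0.05):
        # For each variable, a Gaussian LRT of "one mean and variance"
        # vs "group-specific means and variances"; keep variables whose
        # statistic exceeds the chi-square threshold, then fit LDA.
        n, p = X.shape
        classes = np.unique(y)
        stats_lr = np.empty(p)
        for j in range(p):
            xj = X[:, j]
            ll0 = stats.norm.logpdf(xj, xj.mean(), xj.std() + 1e-12).sum()
            ll1 = sum(stats.norm.logpdf(xj[y == c], xj[y == c].mean(),
                                        xj[y == c].std() + 1e-12).sum()
                      for c in classes)
            stats_lr[j] = 2 * (ll1 - ll0)
        # df = 2(k-1) extra parameters for k groups
        keep = stats_lr > stats.chi2.ppf(1 - alpha, df=2 * (len(classes) - 1))
        return LinearDiscriminantAnalysis().fit(X[:, keep], y), keep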

6.
This article investigates the use of our newly defined extended projection depth (EPD) in nonparametric discriminant analysis. We propose a robust nonparametric classifier based on the intuitively simple notion of EPD: an observation is assigned to the population with respect to which it has maximum EPD. Asymptotic properties of the misclassification rates and robustness properties of the EPD-based classifier are discussed. Simulated data sets are used to compare the performance of the EPD-based classifier with Fisher's linear discriminant rule, the quadratic discriminant rule, and the PD-based classifier. When the underlying distributions are elliptically symmetric, the EPD-based classifier is shown to be asymptotically equivalent to the optimal Bayes classifier.
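The maximum-depth rule itself is easy to sketch. Below is an ordinary projection depth (PD) approximated by random directions, since EPD is defined in the article and the exact depth takes a supremum over all directions; everything here is illustrative:

    import numpy as np

    def projection_depth(x, X, n_dir=500, seed=None, eps=1e-12):
        # Approximate projection depth of point x w.r.t. sample X:
        # depth = 1 / (1 + max over directions of |u'x - med| / MAD).
        rng = np.random.default_rng(seed)
        U = rng.normal(size=(n_dir, X.shape[1]))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        proj = X @ U.T                              # (n, n_dir)
        med = np.median(proj, axis=0)
        mad = np.median(np.abs(proj - med), axis=0) + eps
        outlyingness = np.max(np.abs(x @ U.T - med) / mad)
        return 1.0 / (1.0 + outlyingness)

    def depth_classify(x, samples):
        # samples: dict mapping class label -> training array for that class
        depths = {c: projection_depth(x, Xc) for c, Xc in samples.items()}
        return max(depths, key=depths.get)

Because depth is built from medians and MADs, a few gross outliers in a training sample barely move the decision, which is the source of the classifier's robustness.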

7.
Document classification is an area of great importance for which many classification methods have been developed. However, most of these methods cannot generate time-dependent classification rules, and so are not the best choice for problems with time-varying structure. To address this, we propose a varying naïve Bayes model, a natural extension of the naïve Bayes model that allows a time-dependent classification rule. Kernel smoothing is developed for parameter estimation, and a BIC-type criterion is proposed for feature selection. Asymptotic theory is developed and numerical studies are conducted. Finally, the proposed method is demonstrated on a real dataset generated by the Mayor's Public Hotline of Changchun, the capital of Jilin Province in Northeast China.
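The kernel-smoothing idea can be sketched for Gaussian features: at query time t, class priors, means, and variances are estimated with weights that decay in |t_i - t|. This is a minimal sketch of the idea, assuming a Gaussian kernel and a fixed bandwidth h; the paper's estimator and BIC-type selector are richer:

    import numpy as np

    def varying_nb_predict(x, t, X, y, times, h=0.5):
        # Kernel weights over observation times (Gaussian kernel never
        # vanishes exactly, so the weighted sums stay positive).
        w = np.exp(-0.5 * ((times - t) / h) ** 2)
        scores = {}
        for c in np.unique(y):
            wc = w * (y == c)
            prior = wc.sum() / w.sum()
            mu = (wc[:, None] * X).sum(0) / wc.sum()
            var = (wc[:, None] * (X - mu) ** 2).sum(0) / wc.sum() + 1e-9
            loglik = -0.5 * np.sum(np.log(2 * np.pi * var)
                                   + (x - mu) ** 2 / var)
            scores[c] = np.log(prior + 1e-300) + loglik
        return max(scores, key=scores.get)

Shrinking h makes the rule more local in time (and noisier); letting h grow recovers the ordinary, time-constant naïve Bayes classifier.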

8.
The estimated test error of a learned classifier is the most commonly reported measure of classifier performance. However, constructing a high-quality point estimator of the test error has proved very difficult, and common interval estimators (e.g. confidence intervals) are built from that point estimator and thus inherit all the difficulties of the point estimation problem. As a result, these confidence intervals do not reliably deliver nominal coverage. In contrast, we construct the confidence interval directly from smooth, data-dependent upper and lower bounds on the test error. We prove that for linear classifiers the proposed confidence interval automatically adapts to the non-smoothness of the test error, is consistent under fixed and local alternatives, and does not require the Bayes classifier to be linear. Moreover, the method delivers nominal coverage on a suite of test problems across a range of classification algorithms and sample sizes.
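For context, here is the conventional construction the abstract criticizes: an interval built directly from the held-out point estimate of the error (a Wilson score interval in this sketch; the authors' bound-based interval is not reproduced here):

    import numpy as np
    from scipy import stats

    def conventional_error_ci(y_true, y_pred, level=0.95):
        # Wilson score interval for the misclassification rate, treating
        # held-out errors as Bernoulli draws. This is the baseline whose
        # coverage the paper argues is unreliable.
        n = len(y_true)
        err = np.mean(np.asarray(y_true) != np.asarray(y_pred))
        z = stats.norm.ppf(0.5 + level / 2)
        center = (err + z**2 / (2 * n)) / (1 + z**2 / n)
        half = (z / (1 + z**2 / n)) * np.sqrt(err * (1 - err) / n
                                              + z**2 / (4 * n**2))
        return center - half, center + half

The Bernoulli treatment ignores the variability of the learned classifier itself, which is one reason such intervals can undercover.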

9.
In this article, we consider the problem of classifying m independent repeated (multiple) observations coming from the same population under a separate sampling scheme. We derive the asymptotic risk of the proposed nearest-neighbour (NN) type classification rule and obtain upper and lower bounds for it in specific cases in terms of the Bayes risk. A Monte Carlo simulation study shows that the classification risk decreases as m increases.
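One natural NN-type rule for this setting, shown purely for illustration since the article's exact rule may differ: give each of the m repeats its 1-NN label and take the majority vote for the group.

    import numpy as np

    def nn_classify_repeats(X_repeats, X_train, y_train):
        # X_repeats: (m, d) repeated observations from one subject.
        # Each repeat votes with its 1-nearest-neighbour label.
        d = ((X_repeats[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        votes = y_train[np.argmin(d, axis=1)]
        vals, counts = np.unique(votes, return_counts=True)
        return vals[np.argmax(counts)]

Pooling m votes averages out individual NN mistakes, which is consistent with the simulated finding that risk decreases in m.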

10.
We investigate the problem of selecting the best population from positive exponential family distributions based on type-I censored data. A Bayes rule is derived and a monotonicity property of the Bayes selection rule is obtained. Exploiting this property, we propose an early selection rule through which one can terminate the experiment on a few populations early and possibly make the final decision before the censoring time. An example illustrating the use of the early selection rule is provided in the final part.

11.
A multinomial classification rule is proposed based on prior-valued smoothing of the state probabilities. Asymptotically, the proposed rule has an error rate that converges uniformly and strongly to that of the Bayes rule. For a fixed sample size, the prior-valued smoothing is effective in obtaining reasonable classifications in situations such as missing data. Empirically, the proposed rule compares favorably with other commonly used multinomial classification rules in Monte Carlo sampling experiments.
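The smoothing idea is to shrink each class's raw cell frequencies toward a prior probability vector, so empty cells (e.g. from sparse or missing data) still receive positive mass. A sketch of the idea, not the paper's exact estimator; counts and priors are hypothetical dicts keyed by class, and equal class priors are assumed for simplicity:

    import numpy as np

    def smoothed_multinomial_classify(x, counts, priors, lam=1.0):
        # counts[c]: observed cell counts for class c (1-D array);
        # priors[c]: prior cell-probability vector; lam sets shrinkage.
        scores = {}
        for c, n_c in counts.items():
            n = n_c.sum()
            p = (n_c + lam * priors[c]) / (n + lam)
            scores[c] = np.log(p[x])   # x indexes the observed cell
        return max(scores, key=scores.get)

With lam = 0 this is the raw maximum-likelihood multinomial rule; larger lam pulls the cell estimates toward the prior, which is what stabilizes classification at fixed sample sizes.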

12.
The K-means algorithm and the normal mixture model method are two common clustering methods. K-means is a popular heuristic that gives reasonable clustering results when the component clusters are ball-shaped, but there are currently no analytical results for the algorithm when the component distributions deviate from the ball shape. This paper analytically studies how the K-means classification rule changes as homoscedastic normal components become more elongated, and compares this rule with the Bayes rule from the mixture model method. We show that the classification rules of both methods are linear, but that the slopes of the two classification lines move in opposite directions as the components become more elongated. The classification performance of the K-means algorithm is then compared with that of the mixture model method via simulation. The comparison, limited to two clusters, shows that K-means performs consistently poorly as the components become more elongated, while the mixture model method can potentially, but not necessarily, exploit the elongation and achieve much better classification.
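A small simulation in the spirit of the comparison (the means, covariance, and sample size are illustrative, not the paper's design): two homoscedastic Gaussian clusters separated along the first coordinate and elongated along the second.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    n = 500
    cov = np.diag([1.0, 9.0])            # elongate the second coordinate
    X = np.vstack([rng.multivariate_normal([-1.5, 0], cov, n),
                   rng.multivariate_normal([+1.5, 0], cov, n)])
    y = np.repeat([0, 1], n)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    gm = GaussianMixture(n_components=2, covariance_type='tied',
                         random_state=0).fit(X).predict(X)

    def accuracy(labels, truth):
        # cluster labels are arbitrary, so allow the label swap
        a = np.mean(labels == truth)
        return max(a, 1 - a)

    print(f"K-means: {accuracy(km, y):.3f}  mixture: {accuracy(gm, y):.3f}")

As the elongation grows, K-means tends to split the data along the long axis (minimizing within-cluster sums of squares), while the tied-covariance mixture can still recover a boundary close to the Bayes rule.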

13.
When prior knowledge about the unknown parameter is available, the Bayesian predictive density coincides with the Bayes estimator of the true density under the Kullback–Leibler divergence, but this is no longer true for other loss functions. In this paper we present a generalized Bayes rule yielding Bayes density estimators with respect to any α-divergence, including the Kullback–Leibler divergence and the Hellinger distance. For curved exponential models, we study the asymptotic behaviour of these predictive densities. We show that, whatever prior is used, the generalized Bayes rule improves (in a non-Bayesian sense) on the estimative density corresponding to a bias modification of the maximum likelihood estimator. This gives rise to a correspondence between choosing a prior density for the generalized Bayes rule and fixing a bias for the maximum likelihood estimator in the classical setting. A criterion for comparing and selecting prior densities is also given.
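For reference, under one common (Amari-type) convention the α-divergence and the resulting generalized Bayes predictive density take the following form; the paper's normalization may differ, so treat this as a hedged sketch:

    D_\alpha(p\,\|\,q) = \frac{4}{1-\alpha^{2}}
      \left(1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\, dx\right),
      \qquad -1 < \alpha < 1,

    \hat p_\alpha(x) \propto
      \left(\int p(x \mid \theta)^{\frac{1-\alpha}{2}}\,
      \pi(\theta \mid \text{data})\, d\theta\right)^{\frac{2}{1-\alpha}}.

The Kullback–Leibler divergence is recovered in the limits α → ±1 and the squared Hellinger distance (up to a constant) at α = 0; the second display is the predictive density minimizing posterior expected α-divergence.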

14.
The recent advent of modern technology has generated many datasets that can be modeled as functional data. This paper focuses on multiclass classification for stochastic diffusion paths. In this context we establish a closed formula for the optimal Bayes rule and provide new statistical procedures built on either the plug-in principle or the empirical risk minimization principle. We show the consistency of these procedures under mild conditions, apply them to the parametric case, and illustrate their accuracy with a simulation study through examples.

15.
For the portfolio problem with unknown parameter values, we compare the conventional certainty-equivalence portfolio choice with the optimal Bayes portfolio. In the important single-risky-asset case, a diffuse Bayes rule leads to portfolios that differ significantly from those suggested by the certainty-equivalence rule, which we show to be inadmissible relative to a quadratic utility function over the range of parameters considered. These results are invariant to arbitrary changes in the utility function parameters. We illustrate the results with a simple mutual fund example.
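A stylized single-risky-asset illustration of the gap, using mean-variance weights as a stand-in for the paper's quadratic-utility analysis (the risk-aversion parameter gamma and the diffuse-prior predictive moments are assumptions of this sketch; it requires n > 3):

    import numpy as np

    def risky_weights(returns, r_free=0.0, gamma=2.0):
        # Mean-variance weight w = (E[R] - r) / (gamma * Var[R]):
        # (i) certainty equivalence plugs in sample moments;
        # (ii) under a diffuse prior and normal returns, the predictive
        #      distribution is Student-t with scale^2 = s2 * (1 + 1/n),
        #      so its variance is inflated and the allocation shrinks.
        n = len(returns)
        m, s2 = returns.mean(), returns.var(ddof=1)
        w_ce = (m - r_free) / (gamma * s2)
        pred_var = s2 * (1 + 1 / n) * (n - 1) / (n - 3)
        w_bayes = (m - r_free) / (gamma * pred_var)
        return w_ce, w_bayes

The Bayes weight is always the smaller of the two: estimation risk enters through the fatter predictive distribution, which the certainty-equivalence rule ignores.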

16.
Most variable selection methods work from the top down, steadily removing features until only a small number remain. They often rely on a predictive model, and there are usually significant disconnections in the sequence of methodologies leading from the training samples to the choice of predictor, to variable selection, to the choice of classifier, and finally to classification of a new data vector. In this paper we suggest a bottom-up approach that brings the choices of variable selector and classifier closer together, by basing the variable selector directly on the classifier, removing the need to involve predictive methods in the classification decision, and enabling direct, transparent comparison of different classifiers on a given problem. Specifically, we suggest ‘wrapper methods’, determined by classifier type, for choosing variables that minimize the classification error rate. This approach is particularly useful for exploring relationships among the variables chosen for the classifier. It reveals which variables have a high degree of leverage for correct classification under different classifiers; it shows which variables operate in relative isolation, and which are important mainly in conjunction with others; it permits quantification of the authority with which variables are selected; and it generally leads to fewer variables for classification than alternative, prediction-based approaches.
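A minimal greedy forward wrapper, in which the classifier itself scores candidate variable sets via cross-validated error (the stopping rule and variable cap here are illustrative choices, not the paper's):

    import numpy as np
    from sklearn.model_selection import cross_val_score

    def wrapper_forward_select(clf, X, y, max_vars=10, cv=5):
        # Grow the variable set by whichever single addition most
        # reduces cross-validated classification error for clf.
        selected, remaining = [], list(range(X.shape[1]))
        best_err = 1.0
        while remaining and len(selected) < max_vars:
            errs = {j: 1 - cross_val_score(clf, X[:, selected + [j]], y,
                                           cv=cv).mean()
                    for j in remaining}
            j_best = min(errs, key=errs.get)
            if errs[j_best] >= best_err:   # no improvement: stop
                break
            best_err = errs[j_best]
            selected.append(j_best)
            remaining.remove(j_best)
        return selected, best_err

Running the same loop with different classifiers makes the comparison the abstract describes direct: each classifier selects its own variables, and the selected sets can be compared side by side.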

17.
A Bayesian discovery procedure
Summary. We discuss a Bayesian discovery procedure for multiple-comparison problems. We show that, under a coherent decision-theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule based on thresholding the posterior probability of the alternative. Under a semiparametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure recently introduced by Storey. Improving the approximation leads us to a Bayesian discovery procedure that exploits the multiple shrinkage in clusters implied by the assumed nonparametric model. We compare the Bayesian discovery procedure and the optimal discovery procedure in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumour samples. We extend the setting of the optimal discovery procedure by discussing modifications of the loss function that lead to different single-thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
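Once posterior probabilities of the alternative are in hand, the thresholding step is mechanical. The sketch below picks the threshold by controlling the posterior expected false discovery rate, a standard companion criterion; the model-based shrinkage that produces the probabilities is where the paper's contribution lies and is not reproduced:

    import numpy as np

    def flag_by_posterior(v, alpha=0.05):
        # v[i] = posterior probability that hypothesis i is non-null.
        # Flag the largest set whose posterior expected FDR, i.e. the
        # mean of (1 - v) over the flagged set, stays below alpha.
        order = np.argsort(-v)
        efdr = np.cumsum(1 - v[order]) / np.arange(1, len(v) + 1)
        ok = np.where(efdr <= alpha)[0]
        k = ok.max() + 1 if ok.size else 0
        flags = np.zeros(len(v), dtype=bool)
        flags[order[:k]] = True
        return flags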

18.
A Bayesian mixture model for differential gene expression
Summary. We propose model-based inference for differential gene expression, using a nonparametric Bayesian probability model, a mixture of normal distributions, for the distribution of gene intensities under various conditions. The resulting inference is similar to a popular empirical Bayes approach used for the same inference problem, but fully model-based inference mitigates some of the necessary limitations of the empirical Bayes method. We argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture-of-normals models. The approach is motivated by a microarray experiment carried out to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples; a small simulation study verifies the proposed methods. In the motivating case studies we show how the nonparametric Bayes approach facilitates the evaluation of posterior expected false discovery rates, and how inference can proceed even in the absence of a null sample of known non-differentially expressed scores. This highlights the difference from alternative empirical Bayes approaches that are based on plug-in estimates.

19.
In this note we consider the problem of selecting the number of bins in a histogram, given a sample. A loss function is introduced reflecting the idea that smooth distributions should have fewer bins than rough ones. A stepwise Bayes rule, based on the Bayesian bootstrap, is found and shown to be admissible. Simulation results are presented to show how the rule works in practice.
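The note's stepwise Bayes rule is not reproduced here; for context, the standard leave-one-out cross-validation criterion for the bin number (Rudemo's L2 risk estimate) is easy to state and serves as the usual baseline such a rule would be compared against:

    import numpy as np

    def cv_risk(x, m):
        # LOO cross-validation estimate of L2 risk for a histogram with
        # m equal-width bins on [min(x), max(x)]; standard baseline,
        # not the stepwise Bayes rule of the note.
        n = len(x)
        h = (x.max() - x.min()) / m
        counts, _ = np.histogram(x, bins=m)
        p_hat = counts / n
        return 2 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * np.sum(p_hat**2)

    def choose_bins(x, m_max=50):
        ms = np.arange(1, m_max + 1)
        return ms[np.argmin([cv_risk(x, m) for m in ms])]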

20.
Empirical Bayes is a versatile approach to “learning from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood, as well as more informal approaches based on other data summaries. We contrast empirical Bayes with cross-validation and full Bayes, and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters that model a priori information on variables, termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes–full Bayes ridge regression approach for estimation of the posterior predictive interval.
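A minimal instance of “formal” empirical Bayes in the linear model: estimate the prior variance by maximizing the marginal likelihood, then plug the implied penalty into ridge regression. This is a simplified sketch assuming known noise variance, not the spike-and-slab or co-data machinery of the review:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def eb_ridge(X, y, sigma2=1.0):
        # With beta ~ N(0, tau2 I) and y | beta ~ N(X beta, sigma2 I),
        # the marginal is y ~ N(0, tau2 X X' + sigma2 I); maximize it
        # over tau2, then return the corresponding posterior mean.
        n = len(y)
        G = X @ X.T

        def neg_marginal_loglik(log_tau2):
            S = np.exp(log_tau2) * G + sigma2 * np.eye(n)
            sign, logdet = np.linalg.slogdet(S)
            return 0.5 * (logdet + y @ np.linalg.solve(S, y))

        res = minimize_scalar(neg_marginal_loglik, bounds=(-10, 10),
                              method='bounded')
        tau2 = np.exp(res.x)
        lam = sigma2 / tau2                  # implied ridge penalty
        beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        return beta, tau2

Here the data choose the ridge penalty through the marginal likelihood rather than through cross-validation, which is exactly the contrast the review draws between the two approaches.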
