Similar Articles (20 results)
1.
Kontkanen P., Myllymäki P., Silander T., Tirri H., Grünwald P. Statistics and Computing (2000), 10(1): 39–54
In this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the task is to compute the predictive distribution for a finite set of possible alternatives. This question is first addressed in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one possible approach for determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with the Jeffreys prior approaches the new stochastic complexity predictive distribution in the limit with increasing amount of sample data. To compare the alternative approaches in practice, each of the predictive distributions discussed is instantiated in the Bayesian network model family case. In particular, to determine the Jeffreys prior for this model family, we show how to compute the (expected) Fisher information matrix for a fixed but arbitrary Bayesian network structure. In the empirical part of the paper the predictive distributions are compared by using the simple tree-structured Naive Bayes model, which is used in the experiments for computational reasons. The experimentation with several public domain classification datasets suggests that the evidence approach produces the most accurate predictions in the log-score sense. The evidence-based methods are also quite robust in the sense that they predict surprisingly well even when only a small fraction of the full training set is used.
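As a toy illustration of the contrast between the MAP plug-in and the evidence (marginal-likelihood) predictive distributions discussed above, the sketch below treats a single multinomial variable with a Dirichlet prior; the Bayesian-network case factorises into products of such terms. The counts, prior, and function names are made up for illustration and are not from the paper.

```python
import numpy as np

def map_predictive(counts, alpha):
    """Plug-in predictive using the posterior mode of the Dirichlet parameters."""
    theta_map = np.clip(counts + alpha - 1.0, 0.0, None)
    return theta_map / theta_map.sum()

def evidence_predictive(counts, alpha):
    """Evidence (marginal-likelihood) predictive for a Dirichlet-multinomial."""
    return (counts + alpha) / (counts + alpha).sum()

counts = np.array([3.0, 1.0, 0.0])   # observed class counts (made up)
alpha = np.full(3, 0.5)              # Jeffreys prior for a multinomial: Dirichlet(1/2,...,1/2)
print(map_predictive(counts, alpha))       # MAP gives the unseen class probability zero
print(evidence_predictive(counts, alpha))  # evidence keeps it strictly positive
```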

2.
Bayesian model averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional data sets. We apply our technique in simulations, as well as to some applications that arise in genomics.
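A minimal sketch of the core idea of treating the ℓ1 (lasso) regularization path as a model space: each support along the path is a candidate model, and models are averaged with approximate posterior weights. BIC weighting here is a simple stand-in for the paper's Markov chain Monte Carlo model composition, and the simulated data and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import lasso_path, LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)

# Candidate models: the distinct supports along the lasso path.
alphas, coefs, _ = lasso_path(X, y)
supports = {tuple(np.flatnonzero(coefs[:, j])) for j in range(coefs.shape[1])}

def bic(support):
    cols = list(support)
    if not cols:
        resid = y - y.mean()
    else:
        fit = LinearRegression().fit(X[:, cols], y)
        resid = y - fit.predict(X[:, cols])
    k = len(cols) + 1
    return n * np.log(np.mean(resid ** 2)) + k * np.log(n)

# Approximate posterior model weights from BIC, then averaged inclusion probabilities.
scores = {s: bic(s) for s in supports}
b = np.array(list(scores.values()))
w = np.exp(-(b - b.min()) / 2.0); w /= w.sum()
incl = np.zeros(p)
for s, wi in zip(scores.keys(), w):
    incl[list(s)] += wi
print(np.round(incl[:5], 3))   # inclusion probabilities of the first five variables
```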

3.
Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) as well as Bayesian counterparts (Bayesian CART and Bayesian neural networks) to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical consistency of the proposed approaches and derive the optimal value of a vital model parameter. The excellent performance of the newly proposed BNT models is shown using simulation studies. We also provide some illustrative examples using a wide variety of standard regression datasets from a publicly available machine learning repository to show the superiority of the proposed models in comparison to commonly used Bayesian CART and Bayesian neural network models.

4.
The support vector machine (SVM) has been successfully applied to various classification areas with great flexibility and a high level of classification accuracy. However, the SVM is not suitable for the classification of large or imbalanced datasets because of significant computational problems and a classification bias toward the dominant class. The SVM combined with k-means clustering (KM-SVM) is a fast algorithm developed to accelerate both the training and the prediction of SVM classifiers by using the cluster centers obtained from the k-means clustering. In the KM-SVM algorithm, however, the penalty of misclassification is treated equally for each cluster center even though the contributions of different cluster centers to the classification can be different. In order to improve classification accuracy, we propose the WKM-SVM algorithm, which imposes different penalties for the misclassification of cluster centers by using the number of data points within each cluster as a weight. As an extension of the WKM-SVM, a recovery process based on WKM-SVM is suggested to incorporate the information near the optimal boundary. Furthermore, the proposed WKM-SVM can be successfully applied to imbalanced datasets with an appropriate weighting strategy. Experiments show the effectiveness of our proposed methods.
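A rough sketch of the weighted-cluster-center idea using scikit-learn, with per-cluster sample weights standing in for the cluster-size misclassification penalties. The number of clusters per class, the kernel, and the dataset are illustrative assumptions rather than the paper's settings, and the recovery step near the boundary is not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

centers, labels, weights = [], [], []
for cls, k in [(0, 20), (1, 20)]:                      # clusters per class (hypothetical choice)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[y == cls])
    centers.append(km.cluster_centers_)
    labels.append(np.full(k, cls))
    weights.append(np.bincount(km.labels_, minlength=k))  # cluster sizes as weights

Xc = np.vstack(centers); yc = np.concatenate(labels); wc = np.concatenate(weights)
# Train on cluster centers only, weighting each center by how many points it represents.
clf = SVC(kernel="rbf", C=1.0).fit(Xc, yc, sample_weight=wc)
print(clf.score(X, y))
```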

5.
Most methods for variable selection work from the top down and steadily remove features until only a small number remain. They often rely on a predictive model, and there are usually significant disconnections in the sequence of methodologies that leads from the training samples to the choice of the predictor, then to variable selection, then to choice of a classifier, and finally to classification of a new data vector. In this paper we suggest a bottom-up approach that brings the choices of variable selector and classifier closer together, by basing the variable selector directly on the classifier, removing the need to involve predictive methods in the classification decision, and enabling the direct and transparent comparison of different classifiers in a given problem. Specifically, we suggest 'wrapper methods', determined by classifier type, for choosing variables that minimize the classification error rate. This approach is particularly useful for exploring relationships among the variables that are chosen for the classifier. It reveals which variables have a high degree of leverage for correct classification using different classifiers; it shows which variables operate in relative isolation, and which are important mainly in conjunction with others; it permits quantification of the authority with which variables are selected; and it generally leads to a reduced number of variables for classification, in comparison with alternative approaches based on prediction.
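A small sketch of a wrapper in this spirit: greedy forward selection that adds the variable giving the largest drop in cross-validated classification error for the chosen classifier, stopping when the error no longer improves. The classifier, dataset, and stopping rule are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
clf = LinearDiscriminantAnalysis()                 # any classifier can be plugged in here

selected, remaining, best_err = [], list(range(X.shape[1])), 1.0
while remaining:
    # cross-validated error for each candidate added to the current set
    errs = {j: 1 - cross_val_score(clf, X[:, selected + [j]], y, cv=5).mean()
            for j in remaining}
    j_best = min(errs, key=errs.get)
    if errs[j_best] >= best_err:                   # stop when error no longer improves
        break
    best_err = errs[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print(selected, round(best_err, 3))
```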

6.
We propose a new criterion for model selection in prediction problems. The covariance inflation criterion adjusts the training error by the average covariance of the predictions and responses, when the prediction rule is applied to permuted versions of the data set. This criterion can be applied to general prediction problems (e.g. regression or classification) and to general prediction rules (e.g. stepwise regression, tree-based models and neural nets). As a by-product we obtain a measure of the effective number of parameters used by an adaptive procedure. We relate the covariance inflation criterion to other model selection procedures and illustrate its use in some regression and classification problems. We also revisit the conditional bootstrap approach to model selection.
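The following sketch illustrates the permutation-based covariance adjustment for a squared-error regression rule: refit the rule on permuted responses and add twice the average covariance between fitted values and the permuted responses to the training error. The exact centring and scaling used in the paper may differ; the data, constants, and function names here are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p = 80, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

def fit_predict(X, y):
    """The prediction rule: here, plain least squares fitted values."""
    return LinearRegression().fit(X, y).predict(X)

train_err = np.mean((y - fit_predict(X, y)) ** 2)

# Permutation estimate of the covariance penalty: refit on permuted responses
# and measure how much the fitted values track those (permuted) responses.
B, cov = 200, 0.0
for _ in range(B):
    yp = rng.permutation(y)
    mu = fit_predict(X, yp)
    cov += np.mean(mu * (yp - yp.mean())) / B

cic = train_err + 2.0 * cov     # covariance-inflated criterion (rough version)
print(round(train_err, 3), round(cic, 3))
```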

7.
Periodic autoregressive (PAR) models with symmetric innovations are widely used in time series analysis, whereas inference for their asymmetric counterparts remains a challenge because of a number of problems related to the existing computational methods. In this paper, we use an interesting relationship between periodic autoregressive and vector autoregressive (VAR) models to study maximum likelihood and Bayesian approaches to the inference of a PAR model with normal and skew-normal innovations, where different kinds of estimation methods for the unknown parameters are examined. Several technical difficulties which are usually complicated to handle are reported. Results are compared with the existing classical solutions, and the practical implementations of the proposed algorithms are illustrated via comprehensive simulation studies. The methods developed in the study are applied to and illustrated with a real time series. The Bayes factor is also used to compare the multivariate normal model versus the multivariate skew-normal model.

8.
For classification problems where the test data are labeled sequentially, the point at which all true positives are first identified is often of critical importance. This article develops hypothesis tests to assess whether all true positives have been labeled in the test data. The tests use a partial receiver operating characteristic (ROC) that is generated from a labeled subset of the test data. These methods are developed in the context of unexploded ordnance (UXO) classification, but are applicable to any binary classification problem. First, the likelihood of the observed ROC given binormal model parameters is derived using order statistics, leading to a nonlinear parameter estimation problem. I then derive the approximate distribution of the point on the ROC at which all true instances are found. Using estimated binormal parameters, this distribution can be integrated up to a desired confidence level to define a critical false alarm rate (FAR). If the selected operating point is before this critical point, then additional labels out to the critical point are required. A second test uses the uncertainty in binormal parameters to determine the critical FAR. These tests are demonstrated with UXO classification examples and both approaches are recommended for testing operating points.
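As a back-of-the-envelope illustration of the critical-FAR idea under a binormal model, the sketch below uses Monte Carlo simulation of the false alarm rate reached when the last true positive is found, and takes its 95% quantile as the critical FAR. The paper instead derives this distribution analytically via order statistics; the binormal parameters and sample sizes here are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Binormal model: negative scores ~ N(0, 1), positive scores ~ N(mu, sigma^2).
mu, sigma = 1.5, 1.0
n_pos, n_neg = 40, 400

def far_at_last_true_positive():
    """False alarm rate reached when the lowest-scoring true positive is finally labeled."""
    pos = rng.normal(mu, sigma, n_pos)
    neg = rng.normal(0.0, 1.0, n_neg)
    threshold = pos.min()              # dig down the ranked list until the weakest positive
    return np.mean(neg >= threshold)   # fraction of negatives labeled by then

fars = np.array([far_at_last_true_positive() for _ in range(5000)])
critical_far = np.quantile(fars, 0.95)  # operate beyond this FAR to be 95% confident
print(round(critical_far, 3))
```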

9.
Relative risks are often considered preferable to odds ratios for quantifying the association between a predictor and a binary outcome. Relative risk regression is an alternative to logistic regression where the parameters are relative risks rather than odds ratios. It uses a log link binomial generalised linear model, or log-binomial model, which requires parameter constraints to prevent probabilities from exceeding 1. This leads to numerical problems with standard approaches for finding the maximum likelihood estimate (MLE), such as Fisher scoring, and has motivated various non-MLE approaches. In this paper we discuss the roles of the MLE and its main competitors for relative risk regression. It is argued that reliable alternatives to Fisher scoring mean that numerical issues are no longer a motivation for non-MLE methods. Nonetheless, non-MLE methods may be worthwhile for other reasons and we evaluate this possibility for alternatives within a class of quasi-likelihood methods. The MLE obtained using a reliable computational method is recommended, but this approach requires bootstrapping when estimates are on the parameter space boundary. If convenience is paramount, then quasi-likelihood estimation can be a good alternative, although parameter constraints may be violated. Sensitivity to model misspecification and outliers is also discussed along with recommendations and priorities for future research.
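A minimal log-binomial fit using statsmodels, in which the exponentiated slope is the relative risk; the simulated data, effect sizes, and start values are illustrative, and this uses plain GLM fitting rather than the paper's recommended computational methods. (The link class is spelled links.Log() in newer statsmodels releases.)

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x = rng.binomial(1, 0.4, size=n)
p = np.exp(-2.0 + 0.5 * x)          # true relative risk is exp(0.5); probabilities stay below 1
y = rng.binomial(1, p)

X = sm.add_constant(x)
# Log-link binomial GLM: the coefficients are log relative risks.
model = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.log()))
fit = model.fit(start_params=[-2.0, 0.0])   # sensible start values help convergence
print(np.exp(fit.params[1]))                # estimated relative risk
```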

10.
Document classification is an area of great importance for which many classification methods have been developed. However, most of these methods cannot generate time-dependent classification rules. Thus, they are not the best choices for problems with time-varying structures. To address this problem, we propose a varying naïve Bayes model, which is a natural extension of the naïve Bayes model that allows for a time-dependent classification rule. The method of kernel smoothing is developed for parameter estimation and a BIC-type criterion is proposed for feature selection. Asymptotic theory is developed and numerical studies are conducted. Finally, the proposed method is demonstrated on a real dataset, which was generated by the Mayor Public Hotline of Changchun, the capital city of Jilin Province in Northeast China.
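A rough sketch of the kernel-smoothing idea behind a time-varying naive Bayes rule: parameters at a prediction time t0 are estimated with Gaussian kernel weights in time, so observations near t0 count more. The bandwidth, smoothing constants, and simulated data are all illustrative assumptions; the paper's estimator and its BIC-type feature selection are not reproduced here.

```python
import numpy as np

def kernel_weights(t_train, t0, h):
    """Gaussian kernel weights in time, centred at prediction time t0."""
    return np.exp(-0.5 * ((t_train - t0) / h) ** 2)

def varying_nb_predict(X, y, t, x_new, t0, h=0.5):
    """Time-varying naive Bayes for binary features (a rough sketch)."""
    w = kernel_weights(t, t0, h)
    classes = np.unique(y)
    log_post = []
    for c in classes:
        wc = w * (y == c)
        prior = wc.sum() / w.sum()
        theta = (wc @ X + 1.0) / (wc.sum() + 2.0)   # kernel-weighted Bernoulli rates, mildly smoothed
        ll = np.sum(x_new * np.log(theta) + (1 - x_new) * np.log(1 - theta))
        log_post.append(np.log(prior) + ll)
    return classes[int(np.argmax(log_post))]

rng = np.random.default_rng(3)
n, p = 1000, 5
t = rng.uniform(0, 1, n)
y = rng.binomial(1, 0.5, n)
# the feature-class association drifts with time
probs = np.where(y[:, None] == 1, 0.3 + 0.4 * t[:, None], 0.5)
X = rng.binomial(1, probs, size=(n, p))
print(varying_nb_predict(X, y, t, x_new=np.ones(p), t0=0.9))
```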

11.
Classical statistical approaches for multiclass probability estimation are typically based on regression techniques such as multiple logistic regression, or density estimation approaches such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). These methods often make certain assumptions on the form of probability functions or on the underlying distributions of subclasses. In this article, we develop a model-free procedure to estimate multiclass probabilities based on large-margin classifiers. In particular, the new estimation scheme is employed by solving a series of weighted large-margin classifiers and then systematically extracting the probability information from these multiple classification rules. A main advantage of the proposed probability estimation technique is that it does not impose any strong parametric assumption on the underlying distribution and can be applied for a wide range of large-margin classification methods. A general computational algorithm is developed for class probability estimation. Furthermore, we establish asymptotic consistency of the probability estimates. Both simulated and real data examples are presented to illustrate competitive performance of the new approach and compare it with several other existing methods.

12.
In this paper, we propose a new Bayesian inference approach for classification based on the traditional hinge loss used for classical support vector machines, which we call the Bayesian Additive Machine (BAM). Unlike existing approaches, the new model has a semiparametric discriminant function where some feature effects are nonlinear and others are linear. This separation of features is achieved automatically during model fitting without user pre-specification. Following the literature on sparse regression of high-dimensional models, we can also identify the irrelevant features. By introducing spike-and-slab priors using two sets of indicator variables, these multiple goals are achieved simultaneously and automatically, without any parameter tuning such as cross-validation. An efficient partially collapsed Markov chain Monte Carlo algorithm is developed for posterior exploration based on a data augmentation scheme for the hinge loss. Our simulations and three real data examples demonstrate that the new approach is a strong competitor to some approaches that were proposed recently for dealing with challenging classification examples with high dimensionality.

13.
Meta-analysis in the presence of unexplained heterogeneity is frequently undertaken by using a random-effects model, in which the effects underlying different studies are assumed to be drawn from a normal distribution. Here we discuss the justification and interpretation of such models, by addressing in turn the aims of estimation, prediction and hypothesis testing. A particular issue that we consider is the distinction between inference on the mean of the random-effects distribution and inference on the whole distribution. We suggest that random-effects meta-analyses as currently conducted often fail to provide the key results, and we investigate the extent to which distribution-free, classical and Bayesian approaches can provide satisfactory methods. We conclude that the Bayesian approach has the advantage of naturally allowing for full uncertainty, especially for prediction. However, it is not without problems, including computational intensity and sensitivity to a priori judgements. We propose a simple prediction interval for classical meta-analysis and offer extensions to standard practice of Bayesian meta-analysis, making use of an example of studies of 'set shifting' ability in people with eating disorders.
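One common form of such a classical prediction interval combines a DerSimonian-Laird heterogeneity estimate with a t quantile on k − 2 degrees of freedom, as sketched below; the study effects and variances are made up, and this should be read as an illustration of the idea rather than the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def prediction_interval(yi, vi, level=0.95):
    """Random-effects prediction interval for the effect in a new study (a sketch).
    yi: study effect estimates; vi: within-study variances."""
    k = len(yi)
    w = 1 / vi
    mu_fixed = np.sum(w * yi) / np.sum(w)
    Q = np.sum(w * (yi - mu_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)          # DerSimonian-Laird between-study variance
    w_re = 1 / (vi + tau2)
    mu = np.sum(w_re * yi) / np.sum(w_re)       # random-effects pooled mean
    se_mu = np.sqrt(1 / np.sum(w_re))
    t = stats.t.ppf(1 - (1 - level) / 2, df=k - 2)
    half = t * np.sqrt(tau2 + se_mu ** 2)       # predictive uncertainty: heterogeneity + mean SE
    return mu - half, mu + half

yi = np.array([0.10, 0.30, -0.05, 0.25, 0.40])  # made-up study effects
vi = np.array([0.02, 0.03, 0.05, 0.01, 0.04])   # made-up within-study variances
print(prediction_interval(yi, vi))
```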

14.
This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away. Our approach builds on previous work by Holmes and Adams (J. R. Stat. Soc. Ser. B 64:295–306, 2002; Biometrika 90:99–112, 2003) and Cucala et al. (J. Am. Stat. Assoc. 104:263–273, 2009). Our work shares many of the advantages of these approaches in providing a probabilistic basis for the statistical inference. In comparison to previous work, we present a more efficient computational algorithm to overcome the intractability of the Markov random field model. The results of our algorithm are encouraging in comparison to the k-nearest neighbour algorithm.

15.

A procedure to derive optimal discrimination rules is formulated for binary functional classification problems in which the instances available for induction are characterized by random trajectories sampled from different Gaussian processes, depending on the class label. Specifically, these optimal rules are derived as the asymptotic form of the quadratic discriminant for the discretely monitored trajectories in the limit that the set of monitoring points becomes dense in the interval on which the processes are defined. The main goal of this work is to provide a detailed analysis of such optimal rules in the dense monitoring limit, with a particular focus on elucidating the mechanisms by which near-perfect classification arises. In the general case, the quadratic discriminant includes terms that are singular in this limit. If such singularities do not cancel out, one obtains near-perfect classification, which means that the error approaches zero asymptotically, for infinite sample sizes. This singular limit is a consequence of the orthogonality of the probability measures associated with the stochastic processes from which the trajectories are sampled. As a further novel result of this analysis, we formulate rules to determine whether two Gaussian processes are equivalent or mutually singular (orthogonal).
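The near-perfect classification phenomenon can be seen numerically with the sketch below: trajectories from two zero-mean Gaussian processes with different covariance kernels are monitored at m points and classified by the quadratic discriminant built from the true covariances; as m grows, the error rate drops toward zero, reflecting the mutual singularity of the two measures. The kernels, length scales, and grid sizes are illustrative choices, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(4)

def gp_cov(tgrid, length):
    """Squared-exponential covariance on a grid, with a small jitter for stability."""
    d = tgrid[:, None] - tgrid[None, :]
    return np.exp(-0.5 * (d / length) ** 2) + 1e-6 * np.eye(len(tgrid))

def qda_score(x, mean, cov):
    """Log Gaussian density up to a constant: the quadratic discriminant."""
    _, logdet = np.linalg.slogdet(cov)
    diff = x - mean
    return -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))

for m in [5, 20, 80]:                                  # number of monitoring points
    t = np.linspace(0, 1, m)
    cov0, cov1 = gp_cov(t, 0.2), gp_cov(t, 0.1)        # same mean, different covariances
    errs = 0
    for _ in range(400):
        label = rng.integers(2)
        x = rng.multivariate_normal(np.zeros(m), cov1 if label else cov0)
        pred = int(qda_score(x, np.zeros(m), cov1) > qda_score(x, np.zeros(m), cov0))
        errs += (pred != label)
    print(m, errs / 400)                               # error shrinks as monitoring gets denser
```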

16.
Two different approaches to obtaining finite-sample corrections to score tests are the analytical and the computational approaches. The former is based either on a Bartlett-type correction to the test statistic or on the inversion of an Edgeworth expansion to its null distribution. The latter, on the other hand, is usually based on a bootstrap resampling scheme. This paper provides a numerical comparison of the size and power properties of these two approaches both under correct model specification and under model misspecification.
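A toy example of the computational route, in which a parametric bootstrap under the null replaces the asymptotic chi-square reference for a score statistic. The Bernoulli setting, sample size, and null value are invented for illustration and are not the models studied in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def score_stat(x, p0):
    """Score statistic for H0: p = p0 in an i.i.d. Bernoulli model."""
    n = len(x)
    return n * (x.mean() - p0) ** 2 / (p0 * (1 - p0))

x = rng.binomial(1, 0.35, size=20)     # deliberately small sample
p0 = 0.5
s_obs = score_stat(x, p0)

# First-order (asymptotic chi-square) p-value
p_asym = stats.chi2.sf(s_obs, df=1)

# Bootstrap p-value: simulate under the null and recompute the statistic
B = 5000
s_boot = np.array([score_stat(rng.binomial(1, p0, size=len(x)), p0) for _ in range(B)])
p_boot = np.mean(s_boot >= s_obs)
print(round(p_asym, 3), round(p_boot, 3))
```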

17.
Nearest neighborhood classification is a flexible classification method that works under weak assumptions. The basic concept is to use the weighted or unweighted sums over class indicators of observations in the neighborhood of the target value. Two modifications that improve the performance are considered here. Firstly, instead of using weights that are solely determined by the distances, we estimate the weights by use of a logit model. By using a selection procedure like the lasso or boosting, the relevant nearest neighbors are automatically selected. Based on the concept of estimation and selection, in the second step, we extend the predictor space. We include nearest neighborhood counts, but also the original predictors themselves and nearest neighborhood counts that use distances in subdimensions of the predictor space. The resulting classifiers combine the strength of nearest neighbor methods with parametric approaches and, by use of subdimensions, are able to select the relevant features. Simulations and real data sets demonstrate that the method yields better misclassification rates than currently available nearest neighborhood methods and is a strong and flexible competitor in classification problems.
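A simplified sketch of the extended-predictor-space idea: neighborhood class counts for several neighborhood sizes are appended to the original predictors and fed into an L1-penalized logit, which then selects among them. The neighborhood sizes, dataset, and penalty are illustrative, and the paper's distance-based weights and subdimension counts are not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=800, n_features=8, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def nn_count_features(X_ref, y_ref, X_query, ks=(1, 5, 15, 30), drop_self=False):
    """For each k, the fraction of the k nearest reference neighbors in class 1."""
    nn = NearestNeighbors(n_neighbors=max(ks) + 1).fit(X_ref)
    _, idx = nn.kneighbors(X_query)
    if drop_self:                      # when the query set IS the reference set, skip the point itself
        idx = idx[:, 1:]
    return np.column_stack([y_ref[idx[:, :k]].mean(axis=1) for k in ks])

# Extended predictor space: original features plus neighborhood counts
Ztr = np.hstack([Xtr, nn_count_features(Xtr, ytr, Xtr, drop_self=True)])
Zte = np.hstack([Xte, nn_count_features(Xtr, ytr, Xte)])

clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(Ztr, ytr)
print(clf.score(Zte, yte))
```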

18.
Effective component relabeling in Bayesian analyses of mixture models is critical to the routine use of mixtures in classification with analysis based on Markov chain Monte Carlo methods. The classification-based relabeling approach here is computationally attractive and statistically effective, and scales well with sample size and number of mixture components, enabling routine analyses of increasingly large data sets. Building on the best of existing methods, practical relabeling aims to match data-to-component classification indicators in MCMC iterates with those of a defined reference mixture distribution. The method performs as well as or better than existing methods in small dimensional problems, while being practically superior in problems with larger data sets as the approach is scalable. We describe examples and computational benchmarks, and provide supporting code with an efficient computational implementation of the algorithm that will be of use to others in practical applications of mixture models.
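A minimal sketch of matching one iterate's classification indicators to a reference labeling: build the label-agreement matrix and find the permutation of component labels that maximizes agreement with the Hungarian algorithm. This illustrates the matching step only; it is not the paper's algorithm or supporting code, and the toy labelings are made up.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel(z_iter, z_ref, K):
    """Permute component labels in one MCMC iterate to best match a reference classification."""
    # agreement[j, k] = number of observations labeled j in the iterate and k in the reference
    agreement = np.zeros((K, K))
    for j in range(K):
        for k in range(K):
            agreement[j, k] = np.sum((z_iter == j) & (z_ref == k))
    row, col = linear_sum_assignment(-agreement)   # maximize total agreement
    perm = dict(zip(row, col))
    return np.array([perm[z] for z in z_iter])

z_ref  = np.array([0, 0, 1, 1, 2, 2, 2])
z_iter = np.array([2, 2, 0, 0, 1, 1, 1])           # same clustering, labels switched
print(relabel(z_iter, z_ref, K=3))                 # -> [0 0 1 1 2 2 2]
```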

19.
A mixed-integer programming formulation for clustering is proposed, one that encompasses a wider range of objectives and side conditions than standard clustering approaches. The flexibility of the formulation is demonstrated in diagrams of sample problems and solutions. Preliminary computational tests in a practical setting confirm the usefulness of the formulation.

20.
Multiple-membership logit models with random effects are models for clustered binary data, where each statistical unit can belong to more than one group. The likelihood function of these models is analytically intractable. We propose two different approaches for parameter estimation: indirect inference and data cloning (DC). The former is a non-likelihood-based method which uses an auxiliary model to select reasonable estimates. We propose an auxiliary model with the same dimension of parameter space as the target model, which is particularly convenient for reaching good estimates quickly. The latter method computes maximum likelihood estimates through the posterior distribution of an adequate Bayesian model fitted to cloned data. We implement a DC algorithm specifically for multiple-membership models. A Monte Carlo experiment compares the two methods on simulated data. For further comparison, we also report Bayesian posterior mean and Integrated Nested Laplace Approximation hybrid DC estimates. Simulations show a negligible loss of efficiency for the indirect inference estimator, compensated by a substantial computational gain. The approaches are then illustrated with two real examples on matched paired data.
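The data-cloning idea can be illustrated with a conjugate toy model: the posterior based on K copies of the data concentrates on the MLE, and K times its variance approximates the inverse Fisher information. The Bernoulli example below only shows this mechanism; the paper applies data cloning to multiple-membership logit models via MCMC, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.binomial(1, 0.3, size=25)      # toy i.i.d. Bernoulli data (made up)
mle = y.mean()

# Data cloning: with K copies of the data and a Beta(1, 1) prior, the posterior is
# conjugate, its mean approaches the MLE, and K * posterior variance approaches the
# asymptotic variance of the MLE.
for K in [1, 10, 100]:
    a = 1 + K * y.sum()
    b = 1 + K * (len(y) - y.sum())
    post_mean = a / (a + b)
    post_var = a * b / ((a + b) ** 2 * (a + b + 1))
    print(K, round(post_mean, 4), round(K * post_var, 5))

print("MLE:", mle)
```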
