20 similar documents found.
1.
2.
Learning classification trees (total citations: 11; self-citations: 0; citations by others: 11)
Wray Buntine 《Statistics and Computing》1992,2(2):63-73
Algorithms for learning classification trees have been successful in artificial intelligence and statistics for many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. The derivation introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987) and CART (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though at a computational cost.
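A minimal sketch of a Bayesian split score of the kind described above, assuming a symmetric Dirichlet(alpha) prior on the class probabilities at each leaf, so that a node's evidence is the Dirichlet-multinomial marginal likelihood of its class counts. This is an illustrative criterion, not necessarily Buntine's exact formula; the function names and the default `alpha=1.0` are hypothetical.

```python
from math import lgamma


def log_marginal_likelihood(counts, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of the class counts at one node.

    Assumes (for illustration) a symmetric Dirichlet(alpha) prior on the class
    probabilities; the result is the log probability of the observed label
    sequence with the class probabilities integrated out.
    """
    k = len(counts)
    n = sum(counts)
    result = lgamma(k * alpha) - lgamma(n + k * alpha)
    for c in counts:
        result += lgamma(c + alpha) - lgamma(alpha)
    return result


def split_score(children_counts, alpha=1.0):
    """Log evidence for a candidate split: each child is treated as an independent leaf."""
    return sum(log_marginal_likelihood(counts, alpha) for counts in children_counts)
```

A pure split of 20 balanced examples, `split_score([[10, 0], [0, 10]])`, scores higher than an uninformative one, `split_score([[5, 5], [5, 5]])`, which is the same ranking information gain would give.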
3.
We consider fitting a Bayesian model to grouped data in which observations are assumed to be normally distributed around group means that are themselves normally distributed, and we consider several alternatives for accommodating possible heteroscedasticity within the data. We consider the case where the underlying distribution of the variances is unknown and investigate several candidate prior distributions for those variances. In each case, the parameters of the candidate priors (the hyperparameters) are themselves given uninformative priors (hyperpriors). The most mathematically convenient model for the group variances assigns them inverse gamma priors, the inverse gamma distribution being the conjugate prior for the unknown variance of a normal population. We demonstrate that for a wide class of underlying distributions of the group variances, a model that assigns the variances an inverse gamma prior displays favorable goodness-of-fit properties relative to other candidate priors, and hence may be used as a standard for modeling such data. This allows us to take advantage of the elegant mathematical property of prior conjugacy in a wide variety of contexts without compromising model fit. We test our findings on nine publicly available real-world datasets from different domains and on a wide range of artificially generated datasets.
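To illustrate the conjugacy the abstract relies on, here is a minimal sketch of the inverse gamma update for a single group's variance. It assumes, for simplicity, that the group mean `mu` is known and that the prior is IG(a, b) in the shape/scale parameterization; the hierarchical model in the paper also treats the means and hyperparameters as unknown, and the function names are hypothetical.

```python
def inverse_gamma_posterior(a, b, data, mu):
    """Conjugate update of an IG(a, b) prior on sigma^2.

    Assumes normal observations with a known mean mu (an illustrative
    simplification): the posterior is IG(a + n/2, b + sum((x - mu)^2)/2).
    """
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return a + n / 2.0, b + sum_sq / 2.0


def inverse_gamma_mean(a, b):
    """Mean of IG(a, b) in shape/scale form; finite only for shape a > 1."""
    if a <= 1:
        raise ValueError("mean is undefined for shape a <= 1")
    return b / (a - 1)
```

With a = b = 2 and data [1, 3] around mu = 2, the posterior is IG(3, 3), whose mean b'/(a' - 1) is 1.5. The update only adds counts and sums of squares, which is the mathematical convenience the abstract refers to.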
4.
This article describes a statistical computing system, the Computer-Assisted Data Analysis (CADA) Monitor, for use in performing interactive statistical data analysis. Especially easy to use because of its conversational nature, CADA includes facilities for data management, evaluation of probability distributions, Bayesian parametric models, Bayesian simultaneous estimation, Bayesian full-rank analysis of variance, and exploratory data analysis. CADA is written in a transportable subset of BASIC, and versions are currently available for a variety of computers.
5.
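By examining how to determine the concentration coefficient of the membership function, this article applies fuzzy mathematics to Bayesian statistics and thereby develops a new method of hypothesis testing.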
6.
Paola Monari 《Statistical Methods and Applications》1993,2(3):337-348
Summary The scientific attitude towards statistical method has always pursued two basic objectives: identifying false assumptions and selecting, amongst the likely assertions, those which are most consistent with a given system. The methodological demarcation between rejecting a statistical statement because it is "false" and excluding it because it is "least probable" lies in the fundamental premises of inferential procedures. In the first class we find the methods proposed by Fisher, Neyman and Pearson; in the second, the Bayesian techniques. Even if different inferential theories may coexist, any particular solution has a limit of validity strictly bounded by the conventional procedural rules on which it is based.
Invited paper at the Conference on "Statistical Tests: Methodology and Econometric Applications", held in Bologna, Italy, 27–28 May 1993.
7.
This article develops an algorithm for estimating the parameters of general phase-type (PH) distributions based on Bayes estimation. Bayes estimation regards parameters as random variables; their posterior distribution, obtained by updating the prior with the likelihood function, provides the parameter estimators. One advantage of Bayes estimation is that it quantifies the uncertainty of the estimators. In this article, we propose a fast algorithm for approximately computing posterior distributions based on variational approximation. We formulate the optimal variational posterior distributions for PH distributions and develop an efficient algorithm for computing them for both discrete and continuous PH distributions.
8.
This paper proposes a Bayesian approach to the reconstruction of a 2 × 2 contingency table in which some observations are only partially categorized and others are fully categorized. In contrast, most previous Bayesian and non-Bayesian analyses of the partially categorized data problem have been concerned with estimating the parameters that generated the data. We show in an example that the estimates may not be highly sensitive to the weight placed on prior information relative to the sample data.
9.
Xiaomo Jiang 《Journal of applied statistics》2008,35(1):49-65
Multivariate model validation is a complex decision-making problem involving comparison of multiple correlated quantities, based upon the available information and prior knowledge. This paper presents a Bayesian risk-based decision method for validation assessment of multivariate predictive models under uncertainty. A generalized likelihood ratio is derived as a quantitative validation metric based on Bayes' theorem and the assumption that the errors between validation data and model predictions are Gaussian. The multivariate model is then assessed by comparing the likelihood ratio with a Bayesian decision threshold, which is a function of the decision costs and the prior probability of each hypothesis. The probability density function of the likelihood ratio is constructed using the statistics of multiple response quantities and Monte Carlo simulation. The proposed methodology is implemented in the validation of a transient heat conduction model, using a multivariate data set from experiments. The Bayesian methodology provides a quantitative approach to facilitate rational decisions in multivariate model assessment under uncertainty.
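The decision rule can be sketched as follows, assuming the standard Bayes-risk derivation with costs `c_ij` for deciding H_i when H_j is true (H0: the model is valid, H1: it is not) and `c10 > c00`. Deciding H0 has the lower expected cost exactly when the likelihood ratio p(y|H0)/p(y|H1) exceeds the threshold computed below. The likelihood ratio itself is taken as an input: the paper's generalized ratio and its Monte Carlo density are not reproduced, and the parameter names are hypothetical.

```python
def bayes_threshold(prior_h0, c00, c01, c10, c11):
    """Likelihood-ratio threshold minimizing Bayes risk.

    c_ij is the (assumed) cost of deciding H_i when H_j is true.
    Accept H0 when p(y|H0)/p(y|H1) > P(H1)(c01 - c11) / (P(H0)(c10 - c00)).
    """
    if not 0.0 < prior_h0 < 1.0:
        raise ValueError("prior_h0 must lie strictly between 0 and 1")
    if c10 <= c00:
        raise ValueError("wrongly rejecting H0 must cost more than correctly accepting it")
    prior_h1 = 1.0 - prior_h0
    return prior_h1 * (c01 - c11) / (prior_h0 * (c10 - c00))


def accept_model(likelihood_ratio, prior_h0=0.5, c00=0.0, c01=1.0, c10=1.0, c11=0.0):
    """Return True if the model is accepted as valid under the Bayes decision rule."""
    return likelihood_ratio > bayes_threshold(prior_h0, c00, c01, c10, c11)
```

With equal priors and symmetric 0-1 costs the threshold is 1, so a ratio of 2.0 accepts the model; raising the cost of accepting an invalid model to `c01=3.0` raises the threshold to 3 and the same ratio no longer suffices.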
10.
Enrique De Alba, Juan J. Fernández-Durán, M. Mercedes Gregorio-Domínguez 《Journal of applied statistics》2006,33(1):89-99
Consider a random sample X1, X2, …, Xn from a normal population with unknown mean and standard deviation. Only the sample size, mean and range are recorded, and the unknown population mean and standard deviation must be estimated. In this paper the mean and standard deviation are estimated from a Bayesian perspective, using a Markov chain Monte Carlo (MCMC) algorithm to simulate samples from the intractable joint posterior distribution of the mean and standard deviation. The proposed methodology is applied to simulated and real data. The real data refer to the sugar content (°Brix level) of orange juice produced in different countries.
11.
Birsen Eygi Erdogan 《Journal of Statistical Computation and Simulation》2013,83(8):1543-1555
The purpose of this study was to apply support vector machines (SVMs) to bank bankruptcy analysis using practical steps. Although the financial distress of companies is predicted using several statistical and machine learning techniques, bank classification and bankruptcy prediction still need to be investigated because few studies have been conducted in this area of banking. In this study, SVMs were implemented to analyse financial ratios, using data sets from Turkish commercial banks. The results show that SVMs with the Gaussian kernel are capable of extracting useful information from financial data and can be used as part of an early warning system.
12.
《Communications in Statistics - Theory and Methods》2013,42(8):1743-1754
ABSTRACT The display of data in contingency tables is used in different approaches to statistical inference, for example, to address the test of homogeneity of independent multinomial distributions. We develop a Bayesian procedure to test simple null hypotheses against two-sided alternatives in contingency tables. Given independent samples from two binomial distributions and taking a mixed prior distribution, we calculate the posterior probability that the proportion of successes in the first population is the same as in the second. This posterior probability is compared with the p-value of the classical method, yielding a reconciliation between the classical and Bayesian results. The results obtained are generalized to r × s tables.
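A minimal sketch of the posterior probability that two binomial proportions are equal, assuming a mixed prior that puts mass `prior_h0` on H0: p1 = p2 (with the common p ~ Beta(a, b)) and the remaining mass on H1, under which p1 and p2 are independent Beta(a, b). The uniform Beta(1, 1) default and `prior_h0 = 0.5` are illustrative assumptions, not the prior used in the paper. The binomial coefficients cancel in the Bayes factor, so only Beta functions appear.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_prob_equal(x1, n1, x2, n2, prior_h0=0.5, a=1.0, b=1.0):
    """Posterior P(p1 == p2 | data) for x1/n1 and x2/n2 successes under an assumed mixed prior."""
    if not 0.0 < prior_h0 < 1.0:
        raise ValueError("prior_h0 must lie strictly between 0 and 1")
    # Marginal likelihood under H0: one common proportion for both samples.
    log_m0 = log_beta(a + x1 + x2, b + n1 + n2 - x1 - x2) - log_beta(a, b)
    # Marginal likelihood under H1: independent proportions.
    log_m1 = (log_beta(a + x1, b + n1 - x1) - log_beta(a, b)
              + log_beta(a + x2, b + n2 - x2) - log_beta(a, b))
    # Bayes factor B01 = m0 / m1, formed on the log scale for numerical stability.
    post_odds = prior_h0 / (1.0 - prior_h0) * exp(log_m0 - log_m1)
    return post_odds / (1.0 + post_odds)
```

With one success in one trial in each sample, the Bayes factor is 4/3 and the posterior probability of equal proportions is 4/7, slightly above the prior 0.5; this posterior probability is the quantity the abstract compares with the classical p-value.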
13.
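In the second half of the 20th century, perhaps the most noteworthy event in statistics was the resurgence of the Bayesian school. The Bayesian school has since become a vibrant new force in international statistics and is exerting a broad influence on the wider scientific community. By introducing three important Bayesian statistics organizations outside China, this article aims to give readers a glimpse of the state of modern Bayesian statistics, and modestly hopes to serve as a starting point for further research on the international development of Bayesian statistics.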
14.
Andrea Gabrio 《Journal of applied statistics》2021,48(2):301
Statistical modelling of sports data has become increasingly popular in recent years, and different types of models have been proposed to achieve a variety of objectives: from identifying the key characteristics that lead a team to win or lose to predicting the outcome of a game or the team rankings in national leagues. Although not as popular as football or basketball, volleyball is a team sport with national and international competitions in almost every country. However, almost no studies have investigated the prediction of volleyball game outcomes and team rankings in national leagues. We propose a Bayesian hierarchical model for predicting the rankings of volleyball national teams, which also allows the results of each match in the league to be estimated. We consider two alternative model specifications of different complexity, which are validated using data from the women's volleyball Italian Serie A1 2017–2018 season.
15.
Gary Koop 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2004,167(4):639-655
Summary. We develop Bayesian techniques for modelling the evolution of entire distributions over time and apply them to the distribution of team performance in Major League Baseball for the period 1901–2000. Such models offer insight into many key issues (e.g. competitive balance) in a way that regression-based models cannot. The models involve discretizing the distribution and then modelling the evolution of the bins over time through transition probability matrices. We allow these matrices to vary over time and across teams. We find that, with one exception, the transition probability matrices (and, hence, competitive balance) have been remarkably constant over time and across teams. The one exception is the Yankees, who have outperformed all other teams.
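The discretize-then-transition idea can be sketched as follows, assuming each team's season-by-season performance has already been mapped to bin indices 0..K-1, that a single time-homogeneous matrix is shared by all teams, and that each row has a symmetric Dirichlet(alpha) prior. The paper lets the matrices vary over time and across teams, so this pooled estimator is only the simplest special case; the function name is hypothetical.

```python
def posterior_mean_transition_matrix(sequences, num_bins, alpha=1.0):
    """Posterior-mean transition matrix from binned performance sequences.

    Assumes one shared, time-homogeneous matrix and an independent symmetric
    Dirichlet(alpha) prior on each row, so row i has posterior mean
    (n_ij + alpha) / (n_i + num_bins * alpha).
    """
    counts = [[0] * num_bins for _ in range(num_bins)]
    for seq in sequences:
        # Count bin-to-bin moves between consecutive seasons.
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    matrix = []
    for row in counts:
        total = sum(row) + num_bins * alpha
        matrix.append([(c + alpha) / total for c in row])
    return matrix
```

For the single sequence [0, 0, 0] with two bins, the posterior mean of row 0 is [0.75, 0.25], while the unvisited row 1 stays at the prior [0.5, 0.5].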
16.
Donald A. Berry 《The American statistician》2013,67(3):241-246
University courses in elementary statistics are usually taught from a frequentist perspective. In this paper I suggest how such courses can be taught using a Bayesian approach, and I indicate why beginning students are well served by a Bayesian course. A principal focus of any good elementary course is the application of statistics to real and important scientific problems, and the Bayesian approach fits neatly with a scientific focus. Bayesians take a broader view, one not limited to data analysis. In particular, the Bayesian approach is subjective and requires assessing prior probabilities. This requirement forces users to relate current experimental evidence to other available information, including previous experiments of a related nature, where “related” is judged subjectively. I discuss difficulties faced by instructors and students in elementary Bayesian courses, and provide a sample syllabus for an elementary Bayesian course.
17.
ABSTRACT Standard prior elicitation procedures require experts to explicitly quantify their beliefs about parameters in the form of multiple summaries. In this article, we draw on recent advances in the statistical graphics and information visualization communities to propose a novel elicitation scheme that implicitly learns an expert’s opinions through their sequential selection of graphics of carefully constructed hypothetical future samples. While the scheme can be applied to a broad array of models, we use it to construct procedures for elicitation in data models commonly used in practice: Bernoulli, Poisson, and Normal. We also provide open-source, web-based Shiny implementations of the procedures.
18.
Günther Sawitzki 《Statistics》2013,47(3):393-401
An exact filter is an algorithm for calculating the a posteriori distribution of the state $\xi_n$ of a process, given observations $\eta_1, \ldots, \eta_n$ up to time $n$. We describe a method to determine an appropriate algorithm for processes whose distributions are members of exponential families. The resulting algorithm consists essentially of a prediction term combined with an affine transformation depending on the chosen model.
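The best-known exact filter is the Kalman filter for the Gaussian case. A minimal sketch for a scalar random-walk state, assuming $\xi_n = \xi_{n-1} + w_n$ with $w_n \sim N(0, q)$ and $\eta_n = \xi_n + v_n$ with $v_n \sim N(0, r)$, shows the structure the abstract describes: a prediction term followed by an update that is affine in the new observation. This is a standard special case chosen for illustration, not the general exponential-family construction of the paper; the function and parameter names are hypothetical.

```python
def kalman_filter_1d(observations, m0, p0, q, r):
    """Exact filter for an assumed scalar Gaussian random walk observed with noise.

    Returns the posterior (mean, variance) of the state after each observation,
    starting from the prior N(m0, p0).
    """
    m, p = m0, p0
    history = []
    for eta in observations:
        # Prediction step: the random walk keeps the mean and adds variance q.
        m_pred, p_pred = m, p + q
        # Update step: the new mean is an affine function of the observation eta.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (eta - m_pred)
        p = (1.0 - gain) * p_pred
        history.append((m, p))
    return history
```

With `q=0` the state is constant, and filtering two observations of 2.0 from a N(0, 1) prior with `r=1` gives the same posterior, mean 4/3 and variance 1/3, as a batch conjugate update.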
19.
Abstract Although no universally accepted definition of causality exists, in practice one is often faced with the question of statistically assessing causal relationships in different settings. We present a uniform general approach to causality problems derived from the axiomatic foundations of the Bayesian statistical framework. In this approach, causality statements are viewed as hypotheses, or models, about the world, and the fundamental object to be computed is the posterior distribution of the causal hypotheses, given the data and the background knowledge. Computation of the posterior, illustrated here in simple examples, may involve complex probabilistic modeling, but this is no different from any other Bayesian modeling situation. The main advantage of the approach is its connection to the axiomatic foundations of the Bayesian framework and the uniformity with which it can be applied to a variety of causality settings, ranging from specific to general cases, or from causes of effects to effects of causes.
20.
Edward L. Boone, Susan J. Simmons, Haikun Bao, Ann E. Stapleton 《Journal of applied statistics》2008,35(7):799-808
Quantitative trait loci (QTL) mapping is a growing field in statistical genetics. In plants, QTL detection experiments often feature replicates or clones within a specific genetic line. In this work, a Bayesian hierarchical regression model is applied to simulated QTL data and to a dataset from Arabidopsis thaliana plants to locate QTL associated with cotyledon opening. A conditional model search strategy based on Bayesian model averaging is used to reduce the computational burden.
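Bayesian model averaging can be sketched with the common BIC approximation to posterior model probabilities, assuming equal prior probabilities for the candidate models: each model's weight is proportional to exp(-BIC/2), and the averaged estimate is the weighted sum of the per-model estimates. This approximation and the function names are illustrative assumptions; the conditional search strategy in the paper is not reproduced.

```python
from math import exp


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values (equal model priors assumed)."""
    best = min(bics)
    # Subtract the smallest BIC before exponentiating to avoid underflow.
    raw = [exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [w / total for w in raw]


def bma_estimate(estimates, bics):
    """Model-averaged estimate: per-model estimates weighted by approximate posterior probability."""
    return sum(w * e for w, e in zip(bma_weights(bics), estimates))
```

Two models with equal BIC receive weight 0.5 each, so estimates of 1.0 and 2.0 average to 1.5; a BIC gap of 2 ln 3 shifts the weights to 0.75 and 0.25.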