Similar Articles
20 similar articles found.
1.
Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods, where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method which is popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the “small n, large p” scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to those of its main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.

2.
Many credit risk models are based on the selection of a single logistic regression model on which to base parameter estimation. When many competing models are available, and without enough guidance from economic theory, model averaging represents an appealing alternative to the selection of a single model. Although model averaging approaches have been present in statistics for many years, they have only recently begun to receive attention in economics and finance applications. This contribution shows how Bayesian model averaging can be applied to credit risk estimation, a research area that has received a great deal of attention recently, especially in the light of the global financial crisis of the last few years and the related attempts to regulate international finance. The paper considers the use of logistic regression models under the Bayesian model averaging paradigm. We argue that Bayesian model averaging is not only more correct from a theoretical viewpoint, but also slightly superior, in terms of predictive performance, to single selected models.
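As a rough illustration of the mechanics (not this paper's exact computation), posterior model probabilities for a set of candidate models are often approximated from BIC values under equal prior model probabilities; the BIC numbers below are hypothetical:

```python
import numpy as np

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    using p(M_k | data) proportional to exp(-BIC_k / 2)."""
    bics = np.asarray(bics, dtype=float)
    delta = bics - bics.min()      # subtract the minimum to stabilise the exponentials
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical BIC values for three candidate logistic regressions
weights = bma_weights([1002.3, 1000.1, 1005.7])
```

A BMA prediction is then the weighted average of the candidate models' predictions under these weights.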

3.
When multiple data owners possess records on different subjects with the same set of attributes—known as horizontally partitioned data—the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible because of confidentiality concerns. In such settings, the data owners can use secure computation techniques to obtain the results of certain analyses on the integrated database without sharing individual records. We present secure computation protocols for Bayesian model averaging and model selection for both linear regression and probit regression. Using simulations based on genuine data, we illustrate the approach for probit regression, and show that it can provide reasonable model selection outputs.
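The protocols themselves are more involved, but the core primitive — summing party-held quantities (for example, sufficient statistics for a regression) without revealing any single party's value — can be sketched with additive masking. The ring-based masking below is a toy illustration, not the paper's protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def secure_sum(local_values):
    """Toy ring-based secure summation: party i adds its own random
    mask and subtracts its predecessor's, so the masks telescope away
    in the total and no individual value is sent in the clear."""
    n = len(local_values)
    masks = [rng.normal(size=np.shape(local_values[0])) for _ in range(n)]
    masked = [local_values[i] + masks[i] - masks[i - 1] for i in range(n)]
    return sum(masked)

# Three parties' local contributions to a shared sufficient statistic
parts = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
total = secure_sum(parts)   # matches the plain sum of the parts
```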

4.
We extend the Bayesian Model Averaging (BMA) framework to dynamic panel data models with endogenous regressors using a Limited Information Bayesian Model Averaging (LIBMA) methodology. Monte Carlo simulations confirm the asymptotic performance of our methodology both in model averaging and in model selection, with high posterior inclusion probabilities for all relevant regressors and parameter estimates very close to their true values. In addition, we illustrate the use of LIBMA by estimating a dynamic gravity model for bilateral trade. Once model uncertainty, dynamics, and endogeneity are accounted for, we find several factors that are robustly correlated with bilateral trade. We also find that applying methodologies that do not account for either dynamics or endogeneity (or both) results in different sets of robust determinants.

5.
Family studies are often conducted to examine the existence of familial aggregation. In particular, twin studies can separately model the genetic and environmental contributions. Here we estimate the heritability of quantitative traits via the variance components of random effects in linear mixed models (LMMs). The motivating example was a myopia twin study containing complex nested data structures: twins and siblings in the same family, and observations on both eyes for each individual. Three models are considered for this nesting structure. Our proposal takes into account the model uncertainty in both covariates and model structures via an extended Bayesian model averaging (EBMA) procedure. We estimate the heritability using EBMA under three suggested model structures. When compared with the results under the model with the highest posterior model probability, the EBMA estimate has smaller variation and is slightly conservative. Simulation studies are conducted to evaluate the performance of the variance-components estimates, as well as the selection of risk factors, under the correct or incorrect structure. The results indicate that EBMA, with consideration of uncertainties in both covariates and model structures, is more robust to model misspecification than the usual Bayesian model averaging (BMA), which considers only uncertainty in covariate selection.
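In a variance-components formulation, heritability is simply the share of total phenotypic variance attributable to the genetic component. A minimal sketch of that ratio (the decomposition and variance values are hypothetical, not this study's fitted model):

```python
def heritability(var_genetic, var_environment, var_residual):
    """Heritability as the genetic share of total phenotypic variance
    in a hypothetical three-component variance decomposition."""
    total = var_genetic + var_environment + var_residual
    return var_genetic / total

h2 = heritability(0.50, 0.20, 0.30)   # genetic share of unit total variance
```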

6.
Existing models for the study of coronary heart disease use a set of common risk factors to predict the survival time of the disease via the standard Cox regression model. For complex relationships between the survival time and risk factors, the linear regression specification in the existing Cox model is not flexible enough to account for such relationships. Also, the risk factors are actually risky only when they fall within certain risk ranges. To gain more flexibility in modelling and to characterize the risk factors more accurately, we study a semi-parametric additive Cox model, using basis splines and the LASSO technique. The proposed model is evaluated by simulation studies and is used for the analysis of real data from the Strong Heart Study.

7.
Various statistical models have been proposed for two-dimensional dose finding in drug-combination trials. However, it is often a dilemma to decide which model to use when conducting a particular drug-combination trial. We make a comprehensive comparison of four dose-finding methods, and for fairness, we apply the same dose-finding algorithm under the four model structures. Through extensive simulation studies, we compare the operating characteristics of these methods in various practical scenarios. The results show that different models may lead to different design properties and that no single model performs uniformly better in all scenarios. As a result, we propose using Bayesian model averaging to overcome the arbitrariness of the model specification and enhance the robustness of the design. We assign a discrete probability mass to each model as the prior model probability and then estimate the toxicity probabilities of combined doses in the Bayesian model averaging framework. During the trial, we adaptively allocate each new cohort of patients to the most appropriate dose combination by comparing the posterior estimates of the toxicity probabilities with the prespecified toxicity target. The simulation results demonstrate that the Bayesian model averaging approach is robust under various scenarios.
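The allocation step described above reduces to a weighted average of per-model toxicity estimates followed by a nearest-to-target search. The sketch below illustrates that step only; the toxicity estimates, model probabilities, and target are hypothetical:

```python
import numpy as np

# Hypothetical posterior toxicity estimates for 4 dose combinations
# under 3 candidate models, and the models' posterior probabilities.
tox_by_model = np.array([
    [0.05, 0.12, 0.28, 0.45],
    [0.08, 0.15, 0.30, 0.50],
    [0.04, 0.10, 0.25, 0.40],
])
model_probs = np.array([0.5, 0.3, 0.2])

# BMA estimate: probability-weighted average across the models
tox_bma = model_probs @ tox_by_model

# Allocate the next cohort to the dose whose averaged toxicity
# estimate is closest to a prespecified target (here 0.30)
target = 0.30
next_dose = int(np.argmin(np.abs(tox_bma - target)))
```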

8.
Using Bayesian model averaging, we quantify associations of governance and economic health with country-level presence of foot-and-mouth disease (FMD) and estimate the probability of the presence of FMD in each country from 1997 to 2005. The Bayesian model averaging accounted for countries' previous FMD status and other possible confounders, as well as uncertainty about the 'true' model, and provided accurate predictions (90% specificity and 80% sensitivity). This model represents a novel approach to predicting FMD, and other conditions, on a global scale and to identifying important risk factors that can be applied to global policy and allocation of resources for disease control.

9.
Modeling the joint tail of an unknown multivariate distribution can be characterized as modeling the tail of each marginal distribution and modeling the dependence structure between the margins. Classical methods for modeling multivariate extremes are based on the class of multivariate extreme value distributions. However, such distributions do not allow for the possibility of dependence at finite levels that vanishes in the limit. Alternative models have been developed that account for this asymptotic independence, but inferential statistical procedures seeking to combine the classes of asymptotically dependent and asymptotically independent models have been of limited use. We overcome these difficulties by employing Bayesian model averaging to account for both types of asymptotic behavior, and for subclasses within the asymptotically independent framework. Our approach also allows for the calculation of posterior probabilities of different classes of models, allowing for direct comparison between them. We demonstrate the use of joint tail models based on our broader methodology using two oceanographic datasets and a brief simulation study.

10.
The value at risk (VaR) is a risk measure that is widely used by financial institutions to allocate risk. VaR forecast estimation involves the evaluation of conditional quantiles based on the currently available information. Recent advances in VaR evaluation incorporate conditional variance into the quantile estimation, which yields the conditional autoregressive VaR (CAViaR) models. However, uncertainty with regard to model selection in CAViaR model estimators raises the issue of identifying the better quantile predictor via averaging. In this study, we propose a quasi-Bayesian model averaging method that generates combinations of conditional VaR estimators based on single CAViaR models. This approach provides a basis for comparing single CAViaR models against averaged ones for their ability to forecast VaR. We illustrate this method using simulated and financial daily return data series. The results demonstrate significant findings with regard to the use of averaged conditional VaR estimates when forecasting quantile risk.
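For concreteness, one standard CAViaR specification, the symmetric absolute value model, updates the quantile recursively from the previous quantile and the previous absolute return; averaging then combines the single-model forecasts with weights. The parameter values and weights below are hypothetical, and the weight construction here is a plain weighted average rather than the paper's quasi-Bayesian scheme:

```python
import numpy as np

def caviar_sav(returns, b0, b1, b2, var0):
    """Symmetric absolute value CAViaR recursion:
    VaR_t = b0 + b1 * VaR_{t-1} + b2 * |r_{t-1}|."""
    var = np.empty(len(returns))
    var[0] = var0
    for t in range(1, len(returns)):
        var[t] = b0 + b1 * var[t - 1] + b2 * abs(returns[t - 1])
    return var

r = np.array([0.01, -0.02, 0.015, -0.03, 0.005])
v1 = caviar_sav(r, 0.002, 0.90, 0.20, 0.02)   # candidate model 1
v2 = caviar_sav(r, 0.001, 0.95, 0.10, 0.02)   # candidate model 2

# Averaged VaR forecast under (hypothetical) model weights
w = np.array([0.6, 0.4])
v_avg = w[0] * v1 + w[1] * v2
```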

11.
We introduce a Bayesian approach to test linear autoregressive moving-average (ARMA) models against threshold autoregressive moving-average (TARMA) models. First, the marginal posterior densities of all parameters of a TARMA model, including the threshold and delay, are obtained by using a Gibbs sampler with a Metropolis–Hastings algorithm. Second, the reversible-jump Markov chain Monte Carlo (RJMCMC) method is adopted to calculate the posterior probabilities for ARMA and TARMA models: posterior evidence in favor of TARMA models indicates threshold nonlinearity. Finally, based on the RJMCMC scheme and the Akaike information criterion (AIC) or Bayesian information criterion (BIC), a procedure for building TARMA models is developed. Simulation experiments and a real data example show that our method works well for distinguishing an ARMA from a TARMA model and for building TARMA models.

12.
In this paper we deal with a Bayesian analysis for right-censored survival data suitable for populations with a cure rate. We consider a cure rate model based on the negative binomial distribution, encompassing as a special case the promotion time cure model. Bayesian analysis is based on Markov chain Monte Carlo (MCMC) methods. We also present some discussion on model selection and an illustration with a real data set.

13.
Traditional Item Response Theory models assume the distribution of the abilities of the population under study to be Gaussian. However, this may not always be a reasonable assumption, which motivates the development of more general models. This paper presents a generalized approach for the distribution of the abilities in dichotomous 3-parameter Item Response models. A mixture of normal distributions is considered, allowing for features like skewness, multimodality and heavy tails. A solution is proposed to deal with model identifiability issues without compromising the flexibility and practical interpretation of the model. Inference is carried out under the Bayesian paradigm through a novel MCMC algorithm. The algorithm is designed to favour good mixing and convergence properties and is also suitable for inference in traditional IRT models. The efficiency and applicability of our methodology are illustrated in simulated and real examples.

14.
Survival data involving silent events are often subject to interval censoring (the event is known to occur within a time interval) and to classification errors if a test without perfect sensitivity and specificity is applied. Taking the nature of such data into account plays an important role in estimating the distribution of the time until the occurrence of the event. In this context, we incorporate validation subsets into the parametric proportional hazards model, and show that this additional data, combined with Bayesian inference, compensates for the lack of knowledge about test sensitivity and specificity, improving the parameter estimates. The proposed model is evaluated through simulation studies, and Bayesian analysis is conducted within a Gibbs sampling procedure. The posterior estimates obtained under validation subset models present lower bias and standard deviation compared to the scenario with no validation subset or the model that assumes perfect sensitivity and specificity. Finally, we illustrate the usefulness of the new methodology with an analysis of real data about HIV acquisition in female sex workers that has been discussed in the literature.
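A classical, non-Bayesian relative of this idea shows why imperfect sensitivity and specificity matter: the Rogan–Gladen estimator corrects an apparent event proportion for test error. The numbers below are hypothetical, and unlike the paper's model this correction treats sensitivity and specificity as known rather than learning them from validation subsets:

```python
def rogan_gladen(apparent, sens, spec):
    """Rogan-Gladen correction: recover the true event proportion from
    an apparent proportion observed through an imperfect test."""
    return (apparent + spec - 1.0) / (sens + spec - 1.0)

# Hypothetical apparent proportion 0.20 under sensitivity 0.90, specificity 0.95
p_true = rogan_gladen(0.20, sens=0.90, spec=0.95)
```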

15.
Just as frequentist hypothesis tests have been developed to check model assumptions, prior predictive p-values and other Bayesian p-values check prior distributions as well as other model assumptions. These model checks not only suffer from the usual threshold dependence of p-values, but also from the suppression of model uncertainty in subsequent inference. One solution is to transform Bayesian and frequentist p-values for model assessment into a fiducial distribution across the models. Averaging the Bayesian or frequentist posterior distributions with respect to the fiducial distribution can reproduce results from Bayesian model averaging or classical fiducial inference.

16.
17.
Many applications of statistical methods for spatially correlated data require the researcher to specify the correlation structure of the data. This can be a difficult task, as there are many candidate structures. Some spatial correlation structures depend on the distance between the observed data points, while others rely on neighborhood structures. In this paper, Bayesian methods that systematically determine the ‘best’ correlation structure from a predefined class of structures are proposed. Bayes factors, highest probability models, and Bayesian model averaging are employed to determine the ‘best’ correlation structure and to average across these structures to create a non-parametric alternative structure for a loblolly pine dataset with known tree coordinates. Tree diameters and heights were measured, and an investigation into the spatial dependence between the trees was conducted. Results showed that the most probable model for the spatial correlation structure agreed with allometric trends for loblolly pine. A combined Matérn, simultaneous autoregressive and conditional autoregressive model best described the inter-tree competition among the loblolly pine tree data considered in this research.
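For concreteness, one member of the Matérn family (smoothness 3/2) has a simple closed form; the sketch below builds a correlation matrix from pairwise distances between points. The coordinates and range parameter are hypothetical:

```python
import numpy as np

def matern32(dist, rho):
    """Matern correlation with smoothness 3/2 and range parameter rho:
    R(d) = (1 + sqrt(3) d / rho) * exp(-sqrt(3) d / rho)."""
    s = np.sqrt(3.0) * np.asarray(dist, dtype=float) / rho
    return (1.0 + s) * np.exp(-s)

# Pairwise distances between three hypothetical tree locations
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
R = matern32(d, rho=1.5)   # 3x3 spatial correlation matrix
```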

18.
This paper provides a Bayesian estimation procedure for monotone regression models, incorporating the monotone trend constraint subject to uncertainty. For monotone regression modeling with stochastic restrictions, we propose a Bayesian Bernstein polynomial regression model using two-stage hierarchical prior distributions based on a family of rectangle-screened multivariate Gaussian distributions, extended from the work of Curtis and Ghosh [S.M. Curtis and S.K. Ghosh, A variable selection approach to monotonic regression with Bernstein polynomials, J. Appl. Stat. 38 (2011), pp. 961–976]. This approach reflects the uncertainty about the prior constraint, and thus proposes a regression model subject to a monotone restriction with uncertainty. Based on the proposed model, we derive the posterior distributions for the unknown parameters and present numerical schemes to generate posterior samples. We show the empirical performance of the proposed model on synthetic data and real data applications, and compare it to the Bernstein polynomial regression model of Curtis and Ghosh, which imposes the shape restriction with certainty. The results illustrate the effectiveness of our method, which incorporates the uncertainty of the monotone trend and automatically adapts the regression function to the monotonicity.
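The Bernstein polynomial device behind this line of work is simple: on [0, 1] the basis functions are binomial terms, and a fit with nondecreasing basis coefficients is itself nondecreasing. A minimal sketch (the degree and coefficient values are hypothetical):

```python
import numpy as np
from math import comb

def bernstein_design(x, degree):
    """Bernstein polynomial basis matrix on [0, 1]:
    B_k(x) = C(degree, k) * x^k * (1 - x)^(degree - k)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([
        comb(degree, k) * x**k * (1 - x)**(degree - k)
        for k in range(degree + 1)
    ])

x = np.linspace(0.0, 1.0, 5)
B = bernstein_design(x, degree=3)
beta = np.array([0.0, 0.2, 0.6, 1.0])   # nondecreasing coefficients
fit = B @ beta                           # hence a monotone fitted curve
```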

19.
We discuss a Bayesian formalism which gives rise to a type of wavelet threshold estimation in nonparametric regression. A prior distribution is imposed on the wavelet coefficients of the unknown response function, designed to capture the sparseness of wavelet expansion that is common to most applications. For the prior specified, the posterior median yields a thresholding procedure. Our prior model for the underlying function can be adjusted to give functions falling in any specific Besov space. We establish a relationship between the hyperparameters of the prior model and the parameters of those Besov spaces within which realizations from the prior will fall. Such a relationship gives insight into the meaning of the Besov space parameters. Moreover, the relationship established makes it possible in principle to incorporate prior knowledge about the function's regularity properties into the prior model for its wavelet coefficients. However, prior knowledge about a function's regularity properties might be difficult to elicit; with this in mind, we propose a standard choice of prior hyperparameters that works well in our examples. Several simulated examples are used to illustrate our method, and comparisons are made with other thresholding methods. We also present an application to a data set that was collected in an anaesthesiological study.

20.
We propose a new iterative algorithm, called the model walking algorithm, for the Bayesian model averaging method on longitudinal regression models with AR(1) random errors within subjects. The Markov chain Monte Carlo method, together with the model walking algorithm, is employed. The proposed method is successfully applied to predict the progression rates in a myopia intervention trial in children.
