Similar literature
1.
In the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for strokes, we apply Bayesian model averaging to the selection of variables in Cox proportional hazard models. We use an extension of the leaps-and-bounds algorithm for locating the models that are to be averaged over and make available S-PLUS software to implement the methods. Bayesian model averaging provides a posterior probability that each variable belongs in the model, a more directly interpretable measure of variable importance than a P-value. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable and do not account for model uncertainty. We introduce the partial predictive score to evaluate predictive performance. For the Cardiovascular Health Study, Bayesian model averaging predictively outperforms standard model selection and does a better job of assessing who is at high risk for a stroke.

2.
Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.

3.
While most regression models focus on explaining distributional aspects of one single response variable alone, interest in modern statistical applications has recently shifted towards simultaneously studying multiple response variables as well as their dependence structure. Copula-based regression models are a particularly useful tool for pursuing such an analysis since they enable the separation of the marginal response distributions and the dependence structure summarised in a specific copula model. However, so far copula-based regression models have mostly relied on two-step approaches where the marginal distributions are determined first whereas the copula structure is studied in a second step after plugging in the estimated marginal distributions. Moreover, the parameters of the copula are mostly treated as constants not related to covariates and most regression specifications for the marginals are restricted to purely linear predictors. We therefore propose simultaneous Bayesian inference for both the marginal distributions and the copula using computationally efficient Markov chain Monte Carlo simulation techniques. In addition, we replace the commonly used linear predictor by a generic structured additive predictor comprising for example nonlinear effects of continuous covariates, spatial effects or random effects and furthermore allow the copula parameters to be covariate-dependent. To facilitate Bayesian inference, we construct proposal densities for a Metropolis–Hastings algorithm relying on quadratic approximations to the full conditionals of regression coefficients avoiding manual tuning. The performance of the resulting Bayesian estimates is evaluated in simulations comparing our approach with penalised likelihood inference, studying the choice of a specific copula model based on the deviance information criterion, and comparing a simultaneous approach with a two-step procedure. Furthermore, the flexibility of Bayesian conditional copula regression models is illustrated in two applications on childhood undernutrition and macroecology.

4.
The article considers a Gaussian model with the mean and the variance modeled flexibly as functions of the independent variables. The estimation is carried out using a Bayesian approach that allows the identification of significant variables in the variance function, as well as averaging over all possible models in both the mean and the variance functions. The computation is carried out by a simulation method that is carefully constructed to ensure that it converges quickly and produces iterates from the posterior distribution that have low correlation. Real and simulated examples demonstrate that the proposed method works well. The method in this paper is important because (a) it produces more realistic prediction intervals than nonparametric regression estimators that assume a constant variance; (b) variable selection identifies the variables in the variance function that are important; (c) variable selection and model averaging produce more efficient prediction intervals than those obtained by regular nonparametric regression.

5.
Modelling of HIV dynamics in AIDS research has greatly improved our understanding of the pathogenesis of HIV-1 infection and guided the treatment of AIDS patients and the evaluation of antiretroviral therapies. Some of the model parameters may have practical meanings with prior knowledge available, but others might not have prior knowledge. Incorporating priors can improve the statistical inference. Although there have been extensive Bayesian and frequentist estimation methods for the viral dynamic models, little work has been done on making simultaneous inference about the Bayesian and frequentist parameters. In this article, we propose a hybrid Bayesian inference approach for viral dynamic nonlinear mixed-effects models using the Bayesian frequentist hybrid theory developed in Yuan [Bayesian frequentist hybrid inference, Ann. Statist. 37 (2009), pp. 2458–2501]. Compared with frequentist inference in a real example and two simulation examples, the hybrid Bayesian approach is able to improve the inference accuracy without compromising the computational load.

6.
In practice, the presence of influential observations may lead to misleading results in variable screening problems. We, therefore, propose a robust variable screening procedure for high-dimensional data analysis in this paper. Our method consists of two steps. The first step is to define a new high-dimensional influence measure and propose a novel influence diagnostic procedure to remove those unusual observations. The second step is to utilize the sure independence screening procedure based on distance correlation to select important variables in high-dimensional regression analysis. The new influence measure and diagnostic procedure that we developed are model free. To confirm the effectiveness of the proposed method, we conduct simulation studies and a real-life data analysis to illustrate the merits of the proposed approach over some competing methods. Both the simulation results and the real-life data analysis demonstrate that the proposed method can greatly control the adverse effect after detecting and removing those unusual observations, and performs better than the competing methods.

7.
This paper is concerned with a model averaging procedure for varying-coefficient partially linear models with missing responses. The profile least-squares estimation process and inverse probability weighted method are employed to estimate regression coefficients of the partially restricted models, in which the propensity score is estimated by the covariate balancing propensity score method. The estimators of the linear parameters are shown to be asymptotically normal. Then we develop the focused information criterion, formulate the frequentist model averaging estimators and construct the corresponding confidence intervals. Some simulation studies are conducted to examine the finite sample performance of the proposed methods. We find that the covariate balancing propensity score improves the performance of the inverse probability weighted estimator. We also demonstrate the superiority of the proposed model averaging estimators over those of existing strategies in terms of mean squared error and coverage probability. Finally, our approach is further applied to a real data example.

8.
Just as frequentist hypothesis tests have been developed to check model assumptions, prior predictive p-values and other Bayesian p-values check prior distributions as well as other model assumptions. These model checks not only suffer from the usual threshold dependence of p-values, but also from the suppression of model uncertainty in subsequent inference. One solution is to transform Bayesian and frequentist p-values for model assessment into a fiducial distribution across the models. Averaging the Bayesian or frequentist posterior distributions with respect to the fiducial distribution can reproduce results from Bayesian model averaging or classical fiducial inference.

9.
Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their multivariate counterpart extension though is not straightforward, starting with the definition of multivariate quantiles. We propose here a flexible Bayesian quantile regression model when the response variable is multivariate, where we are able to define a structured additive framework for all predictor variables. We build on previous ideas considering a directional approach to define the quantiles of a response variable with multiple outputs, and we define noncrossing quantiles in every directional quantile model. We define a Markov chain Monte Carlo (MCMC) procedure for model estimation, where the noncrossing property is obtained considering a Gaussian process design to model the correlation between several quantile regression models. We illustrate the results of these models using two datasets: one on dimensions of inequality in the population, such as income and health; the second on scores of students in the Brazilian High School National Exam, considering three dimensions for the response variable.

10.
We study the focused information criterion and frequentist model averaging and their application to post‐model‐selection inference for weighted composite quantile regression (WCQR) in the context of additive partial linear models. With the non‐parametric functions approximated by polynomial splines, we show that, under certain conditions, the asymptotic distribution of the frequentist model averaging WCQR‐estimator of a focused parameter is a non‐linear mixture of normal distributions. This asymptotic distribution is used to construct confidence intervals that achieve the nominal coverage probability. With properly chosen weights, the focused information criterion based WCQR estimators are not only robust to outliers and non‐normal residuals but also can achieve efficiency close to the maximum likelihood estimator, without assuming the true error distribution. Simulation studies and a real data analysis are used to illustrate the effectiveness of the proposed procedure.

11.
Abstract. An optimal Bayesian decision procedure for testing hypotheses in normal linear models based on intrinsic model posterior probabilities is considered. It is proven that these posterior probabilities are simple functions of the classical F-statistic, thus the evaluation of the procedure can be carried out analytically through the frequentist analysis of the posterior probability of the null. An asymptotic analysis proves that, under mild conditions on the design matrix, the procedure is consistent. For any hypothesis test it is also seen that there is a one-to-one mapping, which we call the calibration curve, between the posterior probability of the null hypothesis and the classical p-value. This curve adds substantial knowledge about the possible discrepancies between the Bayesian and the p-value measures of evidence for testing hypotheses. It permits a better understanding of the serious difficulties that are encountered in linear models for interpreting the p-values. A specific illustration of the variable selection problem is given.

12.
Various statistical models have been proposed for two‐dimensional dose finding in drug‐combination trials. However, it is often a dilemma to decide which model to use when conducting a particular drug‐combination trial. We make a comprehensive comparison of four dose‐finding methods, and for fairness, we apply the same dose‐finding algorithm under the four model structures. Through extensive simulation studies, we compare the operating characteristics of these methods in various practical scenarios. The results show that different models may lead to different design properties and that no single model performs uniformly better in all scenarios. As a result, we propose using Bayesian model averaging to overcome the arbitrariness of the model specification and enhance the robustness of the design. We assign a discrete probability mass to each model as the prior model probability and then estimate the toxicity probabilities of combined doses in the Bayesian model averaging framework. During the trial, we adaptively allocate each new cohort of patients to the most appropriate dose combination by comparing the posterior estimates of the toxicity probabilities with the prespecified toxicity target. The simulation results demonstrate that the Bayesian model averaging approach is robust under various scenarios. Copyright © 2015 John Wiley & Sons, Ltd.

13.
This paper considers posterior consistency in the context of high-dimensional variable selection using the Bayesian lasso algorithm. In a frequentist setting, consistency is perhaps the most basic property that we expect any reasonable estimator to achieve. However, in a Bayesian setting, consistency is often ignored or taken for granted, especially in more complex hierarchical Bayesian models. In this paper, we have derived sufficient conditions for posterior consistency in the Bayesian lasso model with the orthogonal design, where the number of parameters grows with the sample size.

14.
This paper surveys various shrinkage, smoothing and selection priors from a unifying perspective and shows how to combine them for Bayesian regularisation in the general class of structured additive regression models. As a common feature, all regularisation priors are conditionally Gaussian, given further parameters regularising model complexity. Hyperpriors for these parameters encourage shrinkage, smoothness or selection. It is shown that these regularisation (log-) priors can be interpreted as Bayesian analogues of several well-known frequentist penalty terms. Inference can be carried out with unified and computationally efficient MCMC schemes, estimating regularised regression coefficients and basis function coefficients simultaneously with complexity parameters and measuring uncertainty via corresponding marginal posteriors. For variable and function selection we discuss several variants of spike and slab priors which can also be cast into the framework of conditionally Gaussian priors. The performance of the Bayesian regularisation approaches is demonstrated in a hazard regression model and a high-dimensional geoadditive regression model.

15.
This article studies a general joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the competing risks survival data, and a regression sub-model for the variance–covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. The model provides a useful approach to adjust for non-ignorable missing data due to dropout for the longitudinal outcome, enables analysis of the survival outcome with informative censoring and intermittently measured time-dependent covariates, as well as joint analysis of the longitudinal and survival outcomes. Unlike previously studied joint models, our model allows for heterogeneous random covariance matrices. It also offers a framework to assess the homogeneous covariance assumption of existing joint models. A Bayesian MCMC procedure is developed for parameter estimation and inference. Its performance and frequentist properties are investigated using simulations. A real data example is used to illustrate the usefulness of the approach.

16.
To capture mean and variance asymmetries and time‐varying volatility in financial time series, we generalize the threshold stochastic volatility (THSV) model and incorporate a heavy‐tailed error distribution. Unlike existing stochastic volatility models, this model simultaneously accounts for uncertainty in the unobserved threshold value and in the time‐delay parameter. Self‐exciting and exogenous threshold variables are considered to investigate the impact of a number of market news variables on volatility changes. Adopting a Bayesian approach, we use Markov chain Monte Carlo methods to estimate all unknown parameters and latent variables. A simulation experiment demonstrates good estimation performance for reasonable sample sizes. In a study of two international financial market indices, we consider two variants of the generalized THSV model, with US market news as the threshold variable. Finally, we compare models using Bayesian forecasting in a value‐at‐risk (VaR) study. The results show that our proposed model can generate more accurate VaR forecasts than can standard models.

17.
Biomarkers have the potential to improve our understanding of disease diagnosis and prognosis. Biomarker levels that fall below the assay detection limits (DLs), however, compromise the application of biomarkers in research and practice. Most existing methods to handle non-detects focus on a scenario in which the response variable is subject to the DL; only a few methods consider explanatory variables when dealing with DLs. We propose a Bayesian approach for generalized linear models with explanatory variables subject to lower, upper, or interval DLs. In simulation studies, we compared the proposed Bayesian approach to four commonly used methods in a logistic regression model with explanatory variable measurements subject to the DL. We also applied the Bayesian approach and the other four methods in a real study, in which a panel of cytokine biomarkers was studied for their association with acute lung injury (ALI). We found that IL8 was associated with a moderate increase in risk for ALI in the model based on the proposed Bayesian approach.

18.
Summary. Existing Bayesian model selection procedures require the specification of prior distributions on the parameters appearing in every model in the selection set. In practice, this requirement limits the application of Bayesian model selection methodology. To overcome this limitation, we propose a new approach towards Bayesian model selection that uses classical test statistics to compute Bayes factors between possible models. In several test cases, our approach produces results that are similar to previously proposed Bayesian model selection and model averaging techniques in which prior distributions were carefully chosen. In addition to eliminating the requirement to specify complicated prior distributions, this method offers important computational and algorithmic advantages over existing simulation-based methods. Because it is easy to evaluate the operating characteristics of this procedure for a given sample size and specified number of covariates, our method facilitates the selection of hyperparameter values through prior-predictive simulation.

19.
The multinomial logistic regression model (MLRM) can be interpreted as a natural extension of the binomial model with logit link function to situations where the response variable can have three or more possible outcomes. In addition, when the categories of the response variable are nominal, the MLRM can be expressed in terms of two or more logistic models and analyzed in both frequentist and Bayesian approaches. However, few discussions about post modeling in categorical data models are found in the literature, and they mainly use Bayesian inference. The objective of this work is to present classic and Bayesian diagnostic measures for categorical data models. These measures are applied to a dataset (status) of patients undergoing kidney transplantation.

20.
In this article, a generalized linear mixed model (GLMM) based on a frequentist approach is employed to examine the spatial trend of asthma data. However, the frequentist analysis of GLMMs is computationally difficult. On the other hand, the Bayesian analysis of GLMMs has become computationally convenient due to the advent of Markov chain Monte Carlo algorithms. The recently developed data cloning (DC) method, which yields the maximum likelihood estimate, provides a frequentist approach to complex mixed models that is equally computationally convenient. We use DC to conduct frequentist analysis of spatial models. The advantages of the DC approach are that the answers are independent of the choice of the priors, non-estimable parameters are flagged automatically, and the possibility of improper posterior distributions is completely avoided. We illustrate this approach using a real dataset of asthma visits to hospital in the province of Manitoba, Canada, during 2000–2010. The performance of the DC approach in our application is also studied through a simulation study.
