Similar Articles (20 results)
1.
A Bayesian elastic net approach is presented for variable selection and coefficient estimation in linear regression models. A simple Gibbs sampling algorithm is developed for posterior inference, using a location-scale mixture representation of the Bayesian elastic net prior on the regression coefficients. The penalty parameters are chosen by an empirical method that maximizes the marginal likelihood of the data. Both simulated and real data examples show that the proposed method performs well in comparison with other approaches.
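The elastic-net posterior mode is easy to check numerically in the simplest one-coefficient case. The sketch below is illustrative only (it is not the paper's Gibbs sampler, and the values of z, lam1, lam2 are made up): it compares a grid minimiser of the one-dimensional negative log-posterior with the known soft-threshold-then-shrink solution.

```python
import numpy as np

def enet_neg_log_post(beta, z, lam1, lam2):
    # negative log-posterior for one coefficient with unit-variance noise:
    # Gaussian likelihood centred at the OLS estimate z, plus the
    # elastic-net prior term lam1*|beta| + lam2*beta^2
    return 0.5 * (z - beta) ** 2 + lam1 * np.abs(beta) + lam2 * beta ** 2

def enet_mode_closed_form(z, lam1, lam2):
    # soft-threshold by lam1, then shrink by 1/(1 + 2*lam2)
    return np.sign(z) * max(abs(z) - lam1, 0.0) / (1.0 + 2.0 * lam2)

z, lam1, lam2 = 2.3, 0.8, 0.5
grid = np.linspace(-4, 4, 800001)  # 1e-5 spacing
beta_grid = grid[np.argmin(enet_neg_log_post(grid, z, lam1, lam2))]
beta_cf = enet_mode_closed_form(z, lam1, lam2)
print(beta_grid, beta_cf)  # both ≈ 0.75
```

The grid minimiser and the closed form agree, which is the one-dimensional version of the identity exploited by scale-mixture Gibbs samplers.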

2.
Regression plays a central role in the discipline of statistics and is the primary analytic technique in many research areas. Variable selection is a classical and major problem for regression. This article emphasizes the economic aspect of variable selection. The problem is formulated in terms of the cost of predictors to be purchased for future use: only the subset of covariates used in the model will need to be purchased. This leads to a decision-theoretic formulation of the variable selection problem, which includes the cost of predictors as well as their effect. We adopt a Bayesian perspective and propose two approaches, termed the restricted and extended approaches, to address uncertainty about the model and model parameters; these lead us to rethink model averaging. From an objective or robust Bayes point of view, the former is preferred. The proposed method is applied to three popular datasets for illustration.

3.
The lasso is a popular technique for simultaneous estimation and variable selection in many research areas. When the regression coefficients have independent Laplace priors, the marginal posterior mode of the coefficients is equivalent to the estimates given by the non-Bayesian lasso. Because of the flexibility of its statistical inferences, the Bayesian approach has attracted a growing body of research in recent years. Current approaches either perform a fully Bayesian analysis using a Markov chain Monte Carlo (MCMC) algorithm, or use Monte Carlo expectation maximization (MCEM) methods with an MCMC algorithm in each E-step. However, MCMC-based Bayesian methods carry a heavy computational burden and converge slowly. Tan et al. [An efficient MCEM algorithm for fitting generalized linear mixed models for correlated binary data. J Stat Comput Simul. 2007;77:929–943] proposed a non-iterative sampling approach, the inverse Bayes formula (IBF) sampler, for computing posteriors of a hierarchical model within the MCEM structure. Motivated by their paper, we develop this IBF sampler within the MCEM structure to obtain the marginal posterior mode of the regression coefficients for the Bayesian lasso, adjusting the importance sampling weights when the full conditional distribution is not explicit. Simulation experiments show that our EM-based method substantially reduces computational time, and that it performs comparably with other Bayesian lasso methods in both prediction accuracy and variable selection accuracy, with an advantage especially when the sample size is relatively large.
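The importance-sampling idea can be shown in miniature. The sketch below is a generic self-normalised importance sampler, not the IBF sampler itself, and z and lam are made-up values: it estimates the posterior mean of a single coefficient under a Gaussian likelihood and a Laplace prior, then checks the answer against grid quadrature.

```python
import numpy as np

rng = np.random.default_rng(0)
z, lam = 1.2, 1.0  # illustrative OLS estimate and Laplace prior rate

def unnorm_post(beta):
    # Gaussian likelihood centred at z times an independent Laplace prior
    return np.exp(-0.5 * (z - beta) ** 2 - lam * np.abs(beta))

# self-normalised importance sampling with a wide normal proposal N(z, 2^2)
prop = rng.normal(loc=z, scale=2.0, size=200_000)
prop_pdf = np.exp(-0.5 * ((prop - z) / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))
w = unnorm_post(prop) / prop_pdf          # unnormalised importance weights
is_mean = np.sum(w * prop) / np.sum(w)    # weighted posterior-mean estimate

# deterministic check by Riemann sum on a fine uniform grid
grid = np.linspace(-10, 10, 200_001)
dens = unnorm_post(grid)
quad_mean = np.sum(grid * dens) / np.sum(dens)
print(is_mean, quad_mean)  # the two estimates agree closely
```

The weight adjustment here is the elementary version of the reweighting step the abstract refers to; the actual IBF-within-MCEM algorithm chooses the proposal from the hierarchical model's structure.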

4.
Regularization methods for simultaneous variable selection and coefficient estimation have been shown to improve prediction accuracy in quantile regression. In this article, we propose the Bayesian bridge for variable selection and coefficient estimation in quantile regression. A simple and efficient Gibbs sampling algorithm is developed for posterior inference, using a scale mixture of uniform representation of the Bayesian bridge prior. This is the first work to discuss regularized quantile regression with the bridge penalty. Both simulated and real data examples show that the proposed method often outperforms quantile regression without regularization, lasso quantile regression, and Bayesian lasso quantile regression.
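The scale-mixture-of-uniform representation behind such samplers can be verified by simulation. The sketch below uses the textbook identity for the bridge prior p(β) ∝ exp(−|β|^α): draw u ~ Gamma(1 + 1/α, 1) and then β | u uniform on (−u^{1/α}, u^{1/α}). This is an illustration of the representation, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.5  # bridge exponent (alpha=1 gives the lasso, alpha=2 the ridge)

# scale-mixture-of-uniform draw from p(beta) ∝ exp(-|beta|^alpha)
n = 400_000
u = rng.gamma(shape=1.0 + 1.0 / alpha, scale=1.0, size=n)
half_width = u ** (1.0 / alpha)
beta = rng.uniform(-half_width, half_width)  # broadcast per-draw bounds

# check against the normalised density exp(-sqrt(|b|)) / (2*Gamma(3)):
# P(|beta| <= 1) = 1 - 2/e ≈ 0.2642
p_hat = np.mean(np.abs(beta) <= 1.0)
print(p_hat)
```

Integrating the uniform conditional against the Gamma mixing density recovers exp(−|β|^α) exactly, which is why a Gibbs sweep can alternate between β and the latent u.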

5.
In many studies a large number of variables is measured, and the identification of the variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on one model only neglects the fact that there usually exist other, equally appropriate models. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say, more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam's window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure: based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. Backward elimination can be used as a simple alternative screening procedure. Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, respectively, of which seven influence the outcome. The screening step eliminates most of the uninfluential variables, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for the important parameters of the screening step.

6.
This paper presents a Bayesian analysis of partially linear additive models for quantile regression. We develop a semiparametric Bayesian approach to quantile regression models using a spectral representation of the nonparametric regression functions and a Dirichlet process (DP) mixture for the error distribution. We also consider Bayesian variable selection procedures for both the parametric and nonparametric components of the partially linear additive model, based on Bayesian shrinkage priors via a stochastic search algorithm. Based on the proposed Bayesian semiparametric additive quantile regression model, referred to as BSAQ, Bayesian inference is considered for estimation and model selection. For the posterior computation, we design a simple and efficient Gibbs sampler based on a location-scale mixture of exponential and normal distributions for the asymmetric Laplace distribution, which facilitates the commonly used collapsed Gibbs sampling algorithms for DP mixture models. Additionally, we discuss the asymptotic properties of the semiparametric quantile regression model in terms of consistency of the posterior distribution. Simulation studies and real data examples illustrate the proposed method and compare it with Bayesian quantile regression methods in the literature.
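The location-scale mixture for the asymmetric Laplace error can be sketched directly. Assuming the usual Kozumi–Kobayashi parameterisation (an illustration of the mixture identity, not the authors' sampler): for quantile level p, set θ = (1−2p)/(p(1−p)) and τ² = 2/(p(1−p)); then ε = θz + τ√z·u with z ~ Exp(1), u ~ N(0,1) follows the asymmetric Laplace distribution whose p-th quantile is 0.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.25  # target quantile level

# mixture coefficients of the Kozumi–Kobayashi representation
theta = (1 - 2 * p) / (p * (1 - p))
tau = np.sqrt(2 / (p * (1 - p)))

n = 400_000
z = rng.exponential(scale=1.0, size=n)   # exponential mixing variable
u = rng.standard_normal(n)               # standard normal component
eps = theta * z + tau * np.sqrt(z) * u   # asymmetric Laplace draws

# the p-th quantile of ALD(0, 1, p) is 0, so P(eps <= 0) should be ~p
q_hat = np.mean(eps <= 0)
print(q_hat)  # ≈ 0.25
```

Conditioning on z makes ε Gaussian, which is exactly what lets a Gibbs sampler cycle through normal full conditionals for the regression coefficients.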

7.
Contemporary statistical research frequently deals with problems involving a diverging number of parameters. For those problems, various shrinkage methods (e.g. the lasso and smoothly clipped absolute deviation) are found to be particularly useful for variable selection. Nevertheless, the desirable performance of those shrinkage methods heavily hinges on an appropriate selection of the tuning parameters. With a fixed predictor dimension, Wang and co-workers have demonstrated that tuning parameters selected by a Bayesian information criterion (BIC) type criterion can identify the true model consistently. In this work, similar results are extended to the situation with a diverging number of parameters, for both unpenalized and penalized estimators. Consequently, our theoretical results enlarge not only the scope of applicability of BIC-type criteria but also that of those shrinkage estimation methods.

8.
In this paper, we develop Bayesian methodology and computational algorithms for variable subset selection in Cox proportional hazards models with missing covariate data. A new joint semi-conjugate prior for the piecewise exponential model is proposed in the presence of missing covariates and its properties are examined. The covariates are assumed to be missing at random (MAR). Under this new prior, a version of the Deviance Information Criterion (DIC) is proposed for Bayesian variable subset selection in the presence of missing covariates. Monte Carlo methods are developed for computing the DICs for all possible subset models in the model space. A Bone Marrow Transplant (BMT) dataset is used to illustrate the proposed methodology.

9.
This paper surveys various shrinkage, smoothing and selection priors from a unifying perspective and shows how to combine them for Bayesian regularisation in the general class of structured additive regression models. As a common feature, all regularisation priors are conditionally Gaussian, given further parameters regularising model complexity. Hyperpriors for these parameters encourage shrinkage, smoothness or selection. It is shown that these regularisation (log-) priors can be interpreted as Bayesian analogues of several well-known frequentist penalty terms. Inference can be carried out with unified and computationally efficient MCMC schemes, estimating regularised regression coefficients and basis function coefficients simultaneously with complexity parameters and measuring uncertainty via corresponding marginal posteriors. For variable and function selection we discuss several variants of spike and slab priors which can also be cast into the framework of conditionally Gaussian priors. The performance of the Bayesian regularisation approaches is demonstrated in a hazard regression model and a high-dimensional geoadditive regression model.
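For a single coefficient with a conjugate two-component normal prior, the posterior inclusion probability that spike-and-slab priors deliver has a closed form: each component's marginal likelihood is a normal density with the prior variance added to the sampling variance. A minimal sketch with illustrative variances and prior weight (not the paper's structured additive setting):

```python
import numpy as np

def normal_pdf(x, var):
    return np.exp(-0.5 * x * x / var) / np.sqrt(2 * np.pi * var)

def inclusion_prob(beta_hat, s2, v_spike, v_slab, w):
    # posterior probability that the coefficient comes from the slab,
    # via the conjugate normal marginal likelihoods of the two components:
    # beta_hat | component ~ N(0, s2 + v_component)
    m_slab = normal_pdf(beta_hat, s2 + v_slab)
    m_spike = normal_pdf(beta_hat, s2 + v_spike)
    return w * m_slab / (w * m_slab + (1 - w) * m_spike)

# a strong estimate gets a high inclusion probability, a weak one a low one
pi_strong = inclusion_prob(2.00, s2=0.1, v_spike=0.001, v_slab=1.0, w=0.5)
pi_weak = inclusion_prob(0.05, s2=0.1, v_spike=0.001, v_slab=1.0, w=0.5)
print(pi_strong, pi_weak)
```

In full models this computation sits inside the stochastic search: the sampler draws the inclusion indicator from exactly this conditional probability.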

10.
In this article, we develop a Bayesian variable selection method for choosing covariates in the Poisson change-point regression model with both discrete and continuous candidate covariates. Ranging from a null model with no selected covariates to a full model including all covariates, the method searches the entire model space, estimates posterior inclusion probabilities of the covariates, and obtains model-averaged estimates of the covariate coefficients, while simultaneously estimating a time-varying baseline rate driven by the change-points. For posterior computation, a Metropolis-Hastings within partially collapsed Gibbs sampler is developed to efficiently fit the Poisson change-point regression model with variable selection. We illustrate the proposed method using simulated and real datasets.
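A stripped-down piece of the change-point machinery can be sketched by integrating the Gamma rate priors out analytically, so a single change-point can be located by enumeration. This is a toy illustration of the marginal-likelihood idea, not the article's Metropolis-Hastings-within-Gibbs sampler, and the data and prior values are made up.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(3)
# Poisson counts with one rate change at t = 30 (rates 2 then 8)
y = np.concatenate([rng.poisson(2.0, 30), rng.poisson(8.0, 30)])
a, b = 1.0, 1.0  # Gamma(a, b) prior on each segment's Poisson rate

def seg_log_marginal(seg):
    # Poisson likelihood with the Gamma rate integrated out; terms that do
    # not depend on the split point (e.g. sum of log y!) are dropped
    s, m = seg.sum(), len(seg)
    return lgamma(a + s) - (a + s) * np.log(b + m)

# enumerate candidate change-points and normalise the posterior
ks = range(5, 55)
logpost = np.array([seg_log_marginal(y[:k]) + seg_log_marginal(y[k:])
                    for k in ks])
post = np.exp(logpost - logpost.max())
post /= post.sum()
k_hat = 5 + int(np.argmax(post))
print(k_hat)  # at or near the true change-point 30
```

The full model replaces this enumeration with MCMC because covariate selection and multiple change-points make the model space too large to list.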

11.
In this paper, we consider a special finite mixture model, the Combination of Uniform and shifted Binomial (CUB), recently introduced in the statistical literature to analyse ordinal data expressing the preferences of raters with regard to items or services. Our aim is to develop a variable selection procedure for this model using a Bayesian approach. Bayesian methods for variable selection and model choice have become increasingly popular in recent years, due to advances in Markov chain Monte Carlo computational algorithms. Several such methods have been proposed for linear and generalized linear models (GLM). In this paper, we adapt some of these algorithms to the CUB model: the Kuo–Mallick method, its 'metropolized' version, and the Stochastic Search Variable Selection method. Several simulated examples are used to illustrate the algorithms and to compare their performance. Finally, an application to real data is presented.

12.
One of the standard variable selection procedures in multiple linear regression is to use a penalisation technique in least-squares (LS) analysis, and many different types of penalties have been introduced to achieve variable selection in this setting. It is well known that LS analysis is sensitive to outliers, and consequently outliers can present serious problems for the classical variable selection procedures. Since rank-based procedures have desirable robustness properties compared to LS procedures, we propose a rank-based adaptive lasso-type penalised regression estimator and a corresponding variable selection procedure for linear regression models. The proposed estimator and variable selection procedure are robust against outliers in both the response and predictor space. Furthermore, since rank regression can yield unstable estimators in the presence of multicollinearity, we adjust the penalty term in the adaptive lasso function by incorporating the standard errors of the rank estimator, which provides inference that is robust against multicollinearity. The theoretical properties of the proposed procedures are established and their performance is investigated by means of simulations. Finally, the estimator and variable selection procedure are applied to the Plasma Beta-Carotene Level data set.

13.
Frequentist and Bayesian methods differ in many aspects but share some basic optimality properties. In real-life prediction problems there are situations in which a model based on one of the two paradigms is preferable, depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) and Bayesian (Bayesian CART and Bayesian neural networks) approaches to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical consistency of the proposed approaches and derive the optimal value of a vital model parameter. The excellent performance of the newly proposed BNT models is shown using simulation studies. We also provide illustrative examples on a wide variety of standard regression datasets from a publicly available machine learning repository to show the superiority of the proposed models over the popularly used Bayesian CART and Bayesian neural network models.

14.
An important feature of the accelerated hazards (AH) model is that it can capture the gradual effect of a treatment. Because of the complexity of its estimation, little discussion has been devoted to variable selection for the AH model. A Bayesian non-parametric prior, the transformed Bernstein polynomial prior, is employed for simultaneous robust estimation and variable selection in sparse AH models. We first introduce a naive lasso-type accelerated hazards model and then, to reduce estimation bias and improve variable selection accuracy, consider an adaptive lasso AH model as a direct extension of the naive lasso-type model. Our simulation studies show that the adaptive lasso AH model performs better than the lasso-type model with respect to variable selection and prediction accuracy. We also illustrate the performance of the proposed methods via a brain tumour study.

15.
This paper demonstrates that cross-validation (CV) and Bayesian adaptive bandwidth selection can be applied to the estimation of discrete functions by associated kernels. The idea was originally proposed by Brewer [A Bayesian model for local smoothing in kernel density estimation, Stat. Comput. 10 (2000), pp. 299–309] to derive variable bandwidths in adaptive kernel density estimation. Our approach considers the adaptive binomial kernel estimator and treats the variable bandwidths as parameters with a beta prior distribution. The best variable bandwidth selector is estimated by the posterior mean, in the Bayesian sense, under squared error loss. Monte Carlo simulations are conducted to examine the performance of the proposed Bayesian adaptive approach in comparison with the asymptotic mean integrated squared error estimator and the CV technique for selecting a global (fixed) bandwidth proposed in Kokonendji and Senga Kiessé [Discrete associated kernels method and extensions, Stat. Methodol. 8 (2011), pp. 497–516]. The Bayesian adaptive bandwidth estimator performs better than the global bandwidth, in particular for small and moderate sample sizes.

16.
Recent approaches to the statistical analysis of adverse event (AE) data in clinical trials have proposed the use of groupings of related AEs, such as by system organ class (SOC). These methods have opened up the possibility of scanning large numbers of AEs while controlling for multiple comparisons, making the comparative performance of the different methods, in terms of AE detection and error rates, of interest to investigators. We apply two Bayesian models and two procedures for controlling the false discovery rate (FDR), each using groupings of AEs, to real clinical trial safety data. We find that while the Bayesian models are appropriate for the full data set, the error-controlling methods only give similar results to the Bayesian methods when low-incidence AEs are removed. A simulation study is used to compare the relative performance of the methods, both over full trial data sets and over data sets with low-incidence AEs and SOCs removed. We find that while the removal of low-incidence AEs increases the power of the error-controlling procedures, the estimated power of the Bayesian methods remains relatively constant over all data sizes. Automatic removal of low-incidence AEs does, however, affect the error rates of all the methods, and a clinically guided approach to their removal is needed. Overall, we found the Bayesian approaches particularly useful for scanning the large amounts of AE data gathered.
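For reference, the classic ungrouped Benjamini–Hochberg step-up procedure, one member of the family of FDR-controlling methods, though not the grouped variants evaluated in the paper, takes only a few lines. The p-values below are made up for illustration.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # step-up BH: find the largest k with p_(k) <= k*q/m (sorted order)
    # and reject the k hypotheses with the smallest p-values
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest passing sorted index
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.9]
reject = benjamini_hochberg(pvals, q=0.05)
print(reject)  # only the first two hypotheses are rejected
```

Grouped extensions apply a procedure like this within or across SOC groupings so that borrowing of strength across related AEs is possible.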

17.
The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences, and give recommendations about the preferred approaches. We focus on variable subset selection for regression and classification, and perform several numerical experiments using both simulated and real-world data. The results show that optimizing a utility estimate such as the cross-validation (CV) score is liable to find overfitted models, owing to the relatively high variance of the utility estimates when data are scarce. This can also lead to substantial selection-induced bias and optimism in the performance evaluation of the selected model. From a predictive viewpoint, the best results are obtained by accounting for model uncertainty through the full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information in the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on the CV score. Overall, the projection method also appears to outperform the maximum a posteriori model and the selection of the most probable variables. The study further demonstrates that model selection can greatly benefit from using cross-validation outside the search process, both for guiding the choice of model size and for assessing the predictive performance of the finally selected model.

18.
In this article, the problem of parameter estimation and variable selection in the Tobit quantile regression model is considered. A Tobit quantile regression with the elastic net penalty is proposed from a Bayesian perspective. Independent gamma priors are placed on the l1 norm penalty parameters. A novel aspect of the Bayesian elastic net Tobit quantile regression is to treat the hyperparameters of the gamma priors as unknowns and let the data estimate them along with the other parameters. A Bayesian Tobit quantile regression with the adaptive elastic net penalty is also proposed. The Gibbs sampling computational technique is adapted to simulate the parameters from their posterior distributions. The proposed methods are demonstrated by both simulated and real data examples.

19.
Adaptive Lasso Quantile Regression for Panel Data
How to select the important explanatory variables automatically while estimating the parameters has long been one of the central questions for panel data quantile regression models. We construct a Bayesian hierarchical quantile regression model with multiple random effects and, assuming a new conditional Laplace prior for the fixed-effect coefficients, derive a Gibbs sampling algorithm for estimating the model parameters. Because explanatory variables of different importance should have their coefficients shrunk to different degrees, the constructed prior is adaptive, so the important explanatory variables in the model can be selected automatically and accurately, and the proposed slice-within-Gibbs sampling algorithm quickly and effectively computes the posterior mean estimates of all model parameters. Simulation results show that the new method outperforms the methods commonly used in the literature in both the precision of parameter estimation and the accuracy of variable selection. A panel data analysis of several macroeconomic indicators across Chinese regions illustrates the new method's ability to estimate parameters and select variables.

20.
Linear models with a growing number of parameters have been widely used in modern statistics, and variable selection is an important problem for this kind of model. Bayesian approaches, which provide a stochastic search over informative variables, have gained popularity. In this paper, we study the asymptotic properties of Bayesian model selection when the model dimension p grows with the sample size n. We consider a diverging dimension p_n and provide sufficient conditions under which: (1) with large probability, the posterior probability of the true model (from which the samples are drawn) uniformly dominates the posterior probability of any incorrect model; and (2) the posterior probability of the true model converges to one in probability. Both (1) and (2) guarantee that the true model will be selected under a Bayesian framework. We also demonstrate several situations in which (1) holds but (2) fails, illustrating the difference between the two properties. Finally, we generalize our results to include g-priors, and provide simulation examples to illustrate the main results.
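The concentration of posterior mass on the true model can be illustrated under a fixed-g Zellner g-prior, for which the Bayes factor of a model with k predictors against the intercept-only model has the familiar closed form (1+g)^{(n−1−k)/2} [1 + g(1−R²)]^{−(n−1)/2}. The sketch below (illustrative simulated data, not the paper's conditions) enumerates all submodels of four candidate predictors and reports marginal inclusion probabilities.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
n, p = 200, 4
X = rng.standard_normal((n, p))
# true model uses predictors 0 and 1 only
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)

def r_squared(X_sub, y):
    # R^2 of an OLS fit with intercept
    Z = np.column_stack([np.ones(len(y)), X_sub])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

g = float(n)  # unit-information choice of g
keys, log_bf = [], []
for k in range(p + 1):
    for gamma in combinations(range(p), k):
        r2 = r_squared(X[:, list(gamma)], y) if gamma else 0.0
        keys.append(gamma)
        log_bf.append(0.5 * (n - 1 - len(gamma)) * np.log(1 + g)
                      - 0.5 * (n - 1) * np.log(1 + g * (1 - r2)))

# posterior model probabilities under a uniform prior over the 2^p models
lb = np.array(log_bf)
post = np.exp(lb - lb.max())
post /= post.sum()
incl = np.array([sum(post[i] for i, kk in enumerate(keys) if j in kk)
                 for j in range(p)])
print(incl)  # inclusion near 1 for variables 0 and 1, much lower for 2 and 3
```

With the signal this strong, essentially all posterior mass sits on models containing both true predictors, which is the finite-sample face of properties (1) and (2).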

