Similar Literature
20 similar documents found.
1.
We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.
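To make "heredity constraints" concrete, the sketch below enumerates which pairwise interactions each common rule admits once a set of main effects has been selected: strong heredity needs both parent main effects, weak heredity needs at least one. This is only an illustration of the constraints themselves, not the EMVS parameterization described above; the covariate names and the `admissible_interactions` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical rule table: maps "number of selected parents" to admissibility.
HEREDITY_RULES = {
    "strong": lambda n_parents: n_parents == 2,  # both main effects must be selected
    "weak": lambda n_parents: n_parents >= 1,    # at least one main effect selected
    "none": lambda n_parents: True,              # no constraint on interactions
}

def admissible_interactions(covariates, selected_mains, rule="strong"):
    """Return all pairwise interactions of `covariates` permitted by a heredity rule."""
    allowed = HEREDITY_RULES[rule]
    selected = set(selected_mains)
    return [
        (a, b)
        for a, b in combinations(covariates, 2)
        if allowed((a in selected) + (b in selected))  # bools sum to 0, 1 or 2
    ]

covs = ["age", "sex", "snp1"]  # hypothetical covariate names
print(admissible_interactions(covs, {"age"}, "strong"))  # []
print(admissible_interactions(covs, {"age"}, "weak"))    # [('age', 'sex'), ('age', 'snp1')]
```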

2.
We present APproximated Exhaustive Search (APES), which enables fast, approximate exhaustive variable selection in Generalised Linear Models (GLMs). While exhaustive variable selection remains the gold standard in many model selection contexts, traditional exhaustive variable selection suffers from computational feasibility issues. More precisely, there is often a high cost associated with computing maximum likelihood estimates (MLE) for all subsets of GLMs. Efficient algorithms for exhaustive searches exist for linear models, most notably the leaps-and-bounds algorithm and, more recently, the mixed integer optimisation (MIO) algorithm. The APES method learns from observational weights in a generalised linear regression super-model and reformulates the GLM problem as a linear regression problem. In this way, APES can approximate a true exhaustive search in the original GLM space. Where exhaustive variable selection is not computationally feasible, we propose a best-subset search, which also closely approximates a true exhaustive search. APES is available both as a standalone R package and as part of the existing mplot package.
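For reference, the sketch below shows the plain exhaustive search that APES is designed to approximate, in the simplest (Gaussian linear) case: every subset of predictors is fitted by ordinary least squares and scored by BIC. It is not APES and does not use the mplot API; the BIC criterion, the pure-Python solver, and all function names are assumptions chosen to keep the example self-contained. The loop over all 2^p subsets is exactly the cost that becomes infeasible as p grows.

```python
import math
from itertools import combinations

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols_rss(cols, y):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] * n] + cols
    k = len(design)
    xtx = [[sum(design[i][t] * design[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(design[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(beta[i] * design[i][t] for i in range(k)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

def best_subset_bic(X, y):
    """Score all 2^p subsets of the predictor columns in X by BIC; return the best subset."""
    n = len(y)
    best_bic, best_subset = math.inf, ()
    for size in range(len(X) + 1):
        for subset in combinations(range(len(X)), size):
            rss = max(ols_rss([X[j] for j in subset], y), 1e-300)  # guard against log(0)
            bic = n * math.log(rss / n) + (size + 1) * math.log(n)
            if bic < best_bic:
                best_bic, best_subset = bic, subset
    return best_subset
```

Here `X` is a list of predictor columns; an intercept is always included.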

3.
In this paper, we propose a novel Max-Relevance and Min-Common-Redundancy criterion for variable selection in linear models. Because the ensemble approach to variable selection has proven quite effective in linear regression models, we construct a variable selection ensemble (VSE) by combining the proposed stochastic correlation coefficient algorithm with a stochastic stepwise algorithm. We conduct an extensive experimental comparison of our algorithm with other methods using two simulation studies and four real-life data sets. The results confirm that the proposed VSE yields promising improvements in variable selection and regression accuracy.

4.
Increased transcranial Doppler ultrasound (TCD) velocity is an indicator of cerebral infarction in children with sickle cell disease (SCD). In this article, the parallel genetic algorithm (PGA) is used to select a stroke risk model with TCD velocity as the response variable. Such a stroke risk model makes it possible to identify children with SCD who are at higher risk of stroke and to treat them at an early stage. Using blood velocity data from SCD patients, it is shown that the PGA is an easy-to-use, computationally efficient variable selection tool. The results of the PGA are also compared with those obtained from the stochastic search variable selection method, the Dantzig selector, and conventional techniques such as stepwise selection and best subset selection.

5.
Introducing model uncertainty by moving blocks bootstrap   Total citations: 1; self-citations: 1; citations by others: 0
It is common in parametric bootstrap to select the model from the data and then treat it as if it were the true model. Chatfield (1993, 1996) has shown that ignoring model uncertainty may seriously undermine the coverage accuracy of prediction intervals. In this paper, we propose a method based on the moving block bootstrap for introducing the model selection step into the resampling algorithm. We present a Monte Carlo study comparing the finite-sample properties of the proposed method with those of alternative methods for prediction intervals.
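The resampling step of the moving block bootstrap can be sketched as follows: overlapping blocks of consecutive observations are drawn with replacement and concatenated until the resample reaches the original length, which preserves short-range dependence inside each block. Under the approach described above, the model selection step would then be re-run on every resample; that part, the block length, and the helper name are left as assumptions here, and this is not the authors' implementation.

```python
import random

def moving_block_bootstrap(series, block_length, rng=None):
    """Draw one moving-block-bootstrap resample with the same length as `series`."""
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    rng = rng or random.Random()
    starts = range(n - block_length + 1)  # start indices of all overlapping blocks
    out = []
    while len(out) < n:
        s = rng.choice(starts)            # sample a block start uniformly
        out.extend(series[s:s + block_length])
    return out[:n]                        # trim the last block if it overshoots

x = list(range(10))
print(moving_block_bootstrap(x, 3, random.Random(0)))
```

In a full procedure one would loop over many resamples, select a model on each, and form prediction intervals from the pooled predictions, so that selection variability enters the interval.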

6.
Partial linear varying coefficient models are often used in real data analysis because they strike a good balance between flexibility and parsimony. In this paper, we propose a robust adaptive model selection method based on rank regression, which performs coefficient estimation simultaneously with three types of selection: identifying varying effects, identifying constant effects, and selecting relevant variables. By inheriting the advantages of the rank regression approach, the new method is both robust and efficient. Furthermore, consistency of the three types of selection and the oracle property of the estimator are established. Simulation studies also support the effectiveness of our method.

7.
In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not allow for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A nor B. This paper develops mathematical representations of this and other relations between predictors, which may then be incorporated in a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information. Priors relevant to arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on the size of the models are developed. Since the relations developed are for priors, they may be incorporated in any Bayesian variable selection algorithm for any type of linear model. The application of the methods here is illustrated via the stochastic search variable selection algorithm of George and McCulloch (1993), which is modified to utilize the new priors. The performance of the approach is illustrated with two constructed examples and a computer performance dataset.
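One common way to express such a relation as a prior, sketched here under stated assumptions, is to make the inclusion probability of an interaction AB depend on how many of its parents A and B are included, while main effects get independent inclusion probabilities. Setting the probabilities for 0 or 1 included parents to zero forbids models like "AB without A and B". The probability values and the `log_prior` helper are illustrative, not the paper's exact prior.

```python
import math

def log_prior(mains, interactions, pi_main, p_int):
    """Log prior probability of a model under a heredity-type prior.

    mains: dict name -> 0/1 inclusion indicator for each main effect.
    interactions: dict (a, b) -> 0/1 inclusion indicator for interaction a:b.
    pi_main: independent inclusion probability for main effects (0 < pi_main < 1).
    p_int: dict mapping the number of included parents (0, 1, 2) to the
        conditional inclusion probability of the interaction.
    """
    logp = 0.0
    for g in mains.values():
        logp += math.log(pi_main if g else 1.0 - pi_main)
    for (a, b), g in interactions.items():
        q = p_int[mains[a] + mains[b]]  # depends on how many parents are in
        prob = q if g else 1.0 - q
        if prob == 0.0:
            return -math.inf            # model ruled out by the prior
        logp += math.log(prob)
    return logp

strong = {0: 0.0, 1: 0.0, 2: 0.5}  # illustrative "strong heredity" probabilities
print(log_prior({"A": 0, "B": 0}, {("A", "B"): 1}, 0.5, strong))  # -inf
```

Because only the prior changes, the same function could be plugged into any sampler that needs prior ratios between neighbouring models.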

8.
In this article, we propose a new penalized-likelihood method for model selection in finite mixture of regression models. The penalties are imposed on the mixing proportions and the regression coefficients, so that the order of the mixture and the variables in each component can be selected simultaneously. The consistency of order selection and of variable selection is investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations are conducted to demonstrate the finite-sample performance of the estimation procedure. The proposed methodology is further illustrated via a real data analysis.

9.

This paper is motivated by our collaborative research, and the aim is to model clinical assessments of upper limb function after stroke using 3D-position and 4D-orientation movement data. We present a new nonlinear mixed-effects scalar-on-function regression model with a Gaussian process prior, focusing on variable selection from a large number of candidates including both scalar and functional variables. We have developed a novel variable selection algorithm, namely functional least angle regression. Because they are essential to this algorithm, we studied the representation of functional variables using different methods, as well as the correlation between a scalar and a group of mixed scalar and functional variables. We also propose a new stopping rule for practical use. The algorithm is efficient and accurate for both variable selection and parameter estimation, even when the number of functional variables is very large and the variables are correlated, and its predictions are therefore accurate. Our comprehensive simulation study showed that the method is superior to other existing variable selection methods. When the algorithm was applied to the analysis of the movement data, the use of the nonlinear random-effects model and the functional variables significantly improved the prediction accuracy for the clinical assessment.

10.
We propose a Bayesian stochastic search approach to selecting restrictions on multivariate regression models where the errors exhibit deterministic or stochastic conditional volatilities. We develop a Markov chain Monte Carlo (MCMC) algorithm that generates posterior restrictions on the regression coefficients and Cholesky decompositions of the covariance matrix of the errors. Numerical simulations with artificially generated data show that the proposed method is effective in selecting the data-generating model restrictions and improving the forecasting performance of the model. Applying the method to daily foreign exchange rate data, we conduct stochastic search on a VAR model with stochastic conditional volatilities.

11.
A harmonic new better than used in expectation (HNBUE) variable is a random variable that is dominated by an exponential distribution in the convex stochastic order. We use a recently obtained condition for stochastic equality under convex domination to derive characterizations of the exponential distribution and bounds for HNBUE variables based on the mean values of the order statistics of the variable. We apply the results to construct discrepancy measures for testing whether a random variable is exponential against the alternative that it is HNBUE but not exponential.
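For a nonnegative variable with mean mu, domination by an exponential with the same mean in the convex order is equivalent to the inequality integral from x to infinity of S(t) dt <= mu * exp(-x / mu) for all x >= 0, where S is the survival function. The sketch below checks the empirical version of that inequality on a grid, replacing S by the empirical survival function, for which the integral equals the average of (X_i - x)_+. It is only a rough descriptive diagnostic under these assumptions; it is not the discrepancy measure or test proposed in the paper, and the helper name and tolerance are hypothetical.

```python
import math

def hnbue_violations(sample, grid, tol=1e-12):
    """Grid points x where the empirical HNBUE inequality fails.

    Checks mean((X_i - x)_+) <= mu_hat * exp(-x / mu_hat), where the left side is
    the integral from x to infinity of the empirical survival function.
    """
    n = len(sample)
    mu = sum(sample) / n
    bad = []
    for x in grid:
        lhs = sum(max(v - x, 0.0) for v in sample) / n  # integrated empirical survival
        rhs = mu * math.exp(-x / mu)                    # same quantity for Exp(mean mu)
        if lhs > rhs + tol:
            bad.append(x)
    return bad

uniform_like = [k / 10 for k in range(1, 10)]
print(hnbue_violations(uniform_like, [0.0, 0.1, 0.5]))  # []
print(hnbue_violations([0.01] * 9 + [10.0], [1.0]))     # [1.0]
```

The second sample is dominated by one very large value, so its upper tail is heavier than that of an exponential with the same mean, and the inequality fails.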

12.
We propose a new algorithm for simultaneous variable selection and parameter estimation in the single-index quantile regression (SIQR) model. The proposed algorithm, which is non-iterative, consists of two steps. Step 1 performs an initial variable selection. Step 2 uses the results of Step 1 to obtain better estimates of the conditional quantiles and, using them, performs simultaneous variable selection and estimation of the parametric component of the SIQR model. It is shown that the initial variable selection method consistently estimates the relevant variables, and that the estimated parametric component derived in Step 2 satisfies the oracle property.
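Quantile regression, including the SIQR model, estimates conditional quantiles by minimizing the check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}). The minimal sketch below shows that objective in the intercept-only case, where minimizing the summed check loss over a constant recovers a sample tau-quantile. It illustrates the loss only; it is not the two-step algorithm described above, and the function names are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile_by_check_loss(values, tau):
    """Minimize sum_i rho_tau(y_i - q) over q.

    The objective is piecewise linear in q, so a minimizer is attained at a data
    point; searching over the observations is therefore sufficient.
    """
    return min(values, key=lambda q: sum(check_loss(y - q, tau) for y in values))

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_quantile_by_check_loss(data, 0.5))  # 3.0 (the median)
print(sample_quantile_by_check_loss(data, 0.3))  # 2.0
```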

13.
It is frequently the case that a response will be related to both a vector of finite length and a function-valued random variable as predictor variables. In this paper, we propose new estimators for the parameters of a partial functional linear model which explores the relationship between a scalar response variable and mixed-type predictors. Asymptotic properties of the proposed estimators are established and finite sample behavior is studied through a small simulation experiment.

14.
Hea-Jung Kim, Taeyoung Roh. Statistics, 2013, 47(5): 1082-1111
In regression analysis, a sample selection scheme often applies to the response variable, which results in missing-not-at-random observations on that variable. In this case, a regression analysis using only the selected cases would lead to biased results. This paper proposes a Bayesian methodology to correct this bias based on a semiparametric Bernstein polynomial regression model that incorporates the sample selection scheme into a stochastic monotone trend constraint, variable selection, and robustness against departures from the normality assumption. We present the basic theoretical properties of the proposed model, including its stochastic representation, sample selection bias quantification, and a hierarchical model specification to deal with the stochastic monotone trend constraint in the nonparametric component, simple bias-corrected estimation, and variable selection for the linear components. We then develop computationally feasible Markov chain Monte Carlo methods for semiparametric Bernstein polynomial functions with stochastically constrained parameter estimation and variable selection procedures. We demonstrate the finite-sample performance of the proposed model compared with existing methods using simulation studies and illustrate its use with two real data applications.

15.
In this paper, we focus on feature extraction and variable selection for massive data that are divided and stored across different linked computers. Specifically, we study distributed model selection with the Smoothly Clipped Absolute Deviation (SCAD) penalty. Based on the Alternating Direction Method of Multipliers (ADMM) algorithm, we propose a distributed SCAD algorithm and prove its convergence. The distributed approach yields the same variable selection results as the non-distributed approach. Numerical studies show that our method is both effective and efficient and performs well in distributed data analysis.
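For reference, the sketch below implements the SCAD penalty of Fan and Li with the commonly used a = 3.7, together with its univariate thresholding rule, which minimizes (1/2)(z - theta)^2 + p_lambda(|theta|). Coordinate-wise or proximal steps inside penalized algorithms, ADMM-based ones included, reduce to thresholding operations of this kind, possibly rescaled. The sketch is not the distributed algorithm proposed above, and the function names are assumptions.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|) with shape parameter a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                            # L1 part near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))  # quadratic blend
    return (a + 1) * lam * lam / 2                                # constant: no shrinkage

def scad_threshold(z, lam, a=3.7):
    """Minimizer of 0.5 * (z - theta)**2 + scad_penalty(theta, lam, a)."""
    sign = 1.0 if z >= 0 else -1.0
    t = abs(z)
    if t <= 2 * lam:
        return sign * max(t - lam, 0.0)                # soft thresholding
    if t <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    return z                                           # large signals left unshrunk

print([scad_threshold(z, 1.0) for z in (0.5, -1.5, 3.0, 5.0)])
```

Unlike the Lasso's soft thresholding, large coefficients pass through unchanged, which is what gives SCAD its nearly unbiased estimates for strong signals.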

16.
The nonlinear programming problem is the general case of the mathematical programming problem in which both the objective and constraint functions are nonlinear, and it is the most difficult case of smooth optimization to solve. In this article, we suggest a stochastic search method for general nonlinear programming problems that is not an iterative algorithm but is an interior point method. The proposed method finds a near-optimal solution to the problem. The results of several numerical studies are reported. The efficiency of the new method is assessed by comparison and is found to be reasonable.

17.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the predictor data are some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use a genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
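The intermediate screening step mentioned above (removing variables with low correlation to the response) can be sketched as a simple filter. Here the candidate columns would be wavelet coefficients; using Pearson correlation, an absolute-value cutoff, and the threshold value are all assumptions, since the abstract does not specify them.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_filter(columns, y, threshold):
    """Indices of predictor columns whose |corr(column, y)| is at least `threshold`."""
    return [j for j, col in enumerate(columns) if abs(pearson(col, y)) >= threshold]

y = [1.0, 2.0, 3.0, 4.0, 5.0]
cols = [[1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 1.0],
        [1.0, -1.0, 1.0, -1.0, 1.0], [2.0] * 5]
print(correlation_filter(cols, y, 0.5))  # [0, 1]
```

Only the surviving columns would then be passed to the subsequent stochastic subset search.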

18.
In high-dimensional settings, componentwise L2Boosting has been used to construct sparse models that perform well, but it tends to select many ineffective variables. Several sparse boosting methods, such as SparseL2Boosting and Twin Boosting, have been proposed to improve the variable selection of the L2Boosting algorithm. In this article, we propose a new general sparse boosting method (GSBoosting). Relations are established between GSBoosting and other well-known regularized variable selection methods in the orthogonal linear model, such as the adaptive Lasso and hard thresholding. Simulation results show that GSBoosting performs well in both prediction and variable selection.
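As a baseline for the sparse variants discussed above, the sketch below shows componentwise L2Boosting: at each step every predictor is fitted to the current residuals by simple least squares, the one giving the largest drop in residual sum of squares is chosen, and a shrunken step of size nu is taken. GSBoosting itself is not shown; the step size, number of steps, and the assumption of centred data are illustrative choices.

```python
def componentwise_l2boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2Boosting with univariate least-squares base learners.

    X: list of predictor columns (assumed centred); y: response (assumed centred).
    Returns the coefficient vector after n_steps boosting iterations.
    """
    beta = [0.0] * len(X)
    resid = list(y)
    ss = [sum(v * v for v in col) for col in X]  # squared norm of each column
    for _ in range(n_steps):
        best_j, best_gain, best_b = None, -1.0, 0.0
        for j, col in enumerate(X):
            if ss[j] == 0.0:
                continue
            b = sum(c * r for c, r in zip(col, resid)) / ss[j]  # LS fit to residuals
            gain = b * b * ss[j]  # reduction in RSS from fitting column j alone
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, b
        if best_j is None:
            break
        beta[best_j] += nu * best_b  # shrunken update of the chosen coefficient
        resid = [r - nu * best_b * c for r, c in zip(resid, X[best_j])]
    return beta

x0 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x1 = [1.0, -1.0, 0.0, -1.0, 1.0]
print(componentwise_l2boost([x0, x1], [3.0 * v for v in x0]))
```

Because noise variables are occasionally picked for a small RSS gain, plain L2Boosting tends to include extra variables, which is the weakness the sparse variants target.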

19.
In this article, we propose a Bayesian approach to estimating multiple structural change-points in the level and trend when the number of change-points is unknown. Our formulation of the structural-change model involves a binary discrete variable that indicates structural change. Determining the number and form of structural changes is treated as a model selection problem in Bayesian structural-change analysis. We apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo (SAMC) algorithm, to this structural-change model selection problem. SAMC works effectively for estimating complex structural-change models because it avoids becoming trapped in local posterior modes. The model parameters in each regime are estimated using the Gibbs sampler after each change-point is detected. The performance of the proposed method is investigated on simulated and real data sets: a long time series of US real gross domestic product, US uses of force between 1870 and 1994, and a one-year time series of temperature in Seoul, South Korea.
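One common way to encode breaks in both level and trend with binary indicators is sketched below: for each change point tau, the indicator d_t = 1{t >= tau} shifts the intercept and d_t * (t - tau) changes the slope. This piecewise-linear parameterization is an assumption for illustration; the abstract does not give the paper's exact formulation. Given candidate change points, regime parameters could then be estimated by regression on these rows.

```python
def structural_change_design(n, change_points):
    """Design rows [1, t, d_1, d_1*(t - tau_1), ...] for a level-and-trend break model.

    d_k = 1 if t >= tau_k else 0, so each break can shift both level and slope.
    """
    rows = []
    for t in range(n):
        row = [1.0, float(t)]           # baseline intercept and linear trend
        for tau in change_points:
            d = 1.0 if t >= tau else 0.0  # binary structural-change indicator
            row += [d, d * (t - tau)]     # level shift and slope change after tau
        rows.append(row)
    return rows

for row in structural_change_design(5, [2]):
    print(row)
```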

20.
This article considers Bayesian variable selection problems for binary responses via stochastic search variable selection and the Bayesian Lasso. To avoid matrix inversion in the corresponding Markov chain Monte Carlo implementations, the componentwise Gibbs sampler (CGS) idea is adopted. We also propose automatic hyperparameter tuning rules for the proposed approaches. Simulation studies and a real example are used to demonstrate the performance of the proposed approaches. The results show that the CGS approaches not only perform well in variable selection but also have lower batch-means standard errors than the original methods, especially when the number of covariates is large.
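The batch-means standard error used in the comparison above is a standard way to estimate the Monte Carlo error of an MCMC average: the chain is split into non-overlapping batches, and the variability of the batch means estimates the variance of the overall mean. A minimal sketch follows; the default of 20 batches is an arbitrary assumption, and trailing draws that do not fill a batch are dropped.

```python
import math

def batch_means_se(chain, n_batches=20):
    """Non-overlapping batch-means Monte Carlo standard error of an MCMC mean."""
    if n_batches < 2:
        raise ValueError("need at least 2 batches")
    b = len(chain) // n_batches  # batch size; leftover draws are ignored
    if b < 1:
        raise ValueError("chain too short for the requested number of batches")
    means = [sum(chain[k * b:(k + 1) * b]) / b for k in range(n_batches)]
    grand = sum(means) / n_batches
    var_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_means / n_batches)  # SE of the grand mean

print(batch_means_se([0.0] * 50 + [1.0] * 50, n_batches=2))  # 0.5
```

Strong autocorrelation inflates the spread of the batch means, so a sampler that mixes better (as claimed for the CGS variants) shows a smaller batch-means standard error for the same number of draws.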


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417