Similar Literature

20 similar documents found.
1.
We describe the use of perfect sampling algorithms for Bayesian variable selection in a linear regression model. Starting with a basic case solved by Huang and Djurić (EURASIP J. Appl. Sig. Pr. 1 (2002) 38), where the model coefficients and noise variance are assumed to be known, we generalize the model step by step to allow for other sources of randomness. We specify perfect simulation algorithms that solve these cases by incorporating various techniques including Gibbs sampling, the perfect independent Metropolis–Hastings (IMH) algorithm, and recently developed “slice coupling” algorithms. Applications to simulated data sets suggest that our algorithms perform well in identifying relevant predictor variables.
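
As a point of reference for the basic case above, the sketch below implements a plain (non-perfect) Gibbs sampler over the inclusion indicators when the coefficients and noise variance are known; the perfect-sampling machinery (IMH coupling, slice coupling) is beyond a short example, and every name and prior choice here is an illustrative assumption, not the authors' code.

```python
import numpy as np

def gibbs_variable_selection(y, X, beta, sigma2, p_incl=0.5,
                             n_iter=2000, rng=None):
    """Plain Gibbs sampler over inclusion indicators gamma for the basic
    case with known coefficients beta and noise variance sigma2.
    Illustrative sketch only; not the perfect-sampling algorithm."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    gamma = rng.integers(0, 2, size=p)          # random initial model
    draws = np.zeros((n_iter, p))
    for it in range(n_iter):
        for j in range(p):
            ll = {}
            for val in (0, 1):                  # log-likelihood with gamma_j = 0, 1
                gamma[j] = val
                resid = y - X @ (gamma * beta)
                ll[val] = -0.5 * resid @ resid / sigma2
            m = max(ll[0], ll[1])               # stabilize the exponentials
            w1 = p_incl * np.exp(ll[1] - m)
            w0 = (1.0 - p_incl) * np.exp(ll[0] - m)
            gamma[j] = int(rng.random() < w1 / (w0 + w1))
        draws[it] = gamma
    return draws.mean(axis=0)                   # posterior inclusion probabilities
```

Posterior inclusion probabilities near one flag the relevant predictors.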

2.
In the calibration of a near-infrared (NIR) instrument, we regress chemical compositions of interest as a function of their NIR spectra. In this process, we face two immediate challenges: first, the number of variables exceeds the number of observations and, second, the multicollinearity between variables is extremely high. To deal with these challenges, prediction models that produce sparse solutions have recently been proposed. The term ‘sparse’ means that some model parameters are estimated to be exactly zero while the other parameters are estimated naturally away from zero. In effect, a variable selection is embedded in the model to potentially achieve better prediction. Many studies have investigated sparse solutions for latent variable models, such as partial least squares and principal component regression, and for direct regression models such as ridge regression (RR); the latter, however, mainly rely on adding an L1-norm penalty to the objective function, as in lasso regression. In this study, we investigate new sparse alternatives to RR within a random effects model framework, where we consider Cauchy and mixture-of-normals distributions on the random effects. The results indicate that the mixture-of-normals model produces a sparse solution with good prediction and better interpretation. We illustrate the methods using NIR spectra datasets from milk and corn specimens.
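
For context, the ridge objective and its random-effects reading alluded to in this abstract can be written compactly (notation assumed):

```latex
\[
\hat{\beta}_{\mathrm{RR}}
  = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
  \quad\Longleftrightarrow\quad
  \beta_j \sim N(0, \tau^2), \quad \lambda = \sigma^2/\tau^2 ,
\]
\[
\text{sparse alternative:}\qquad
\beta_j \sim \pi\, N(0, \tau_1^2) + (1 - \pi)\, N(0, \tau_2^2),
\qquad \tau_1^2 \ll \tau_2^2 .
\]
```

Coefficients assigned to the tight first component of the mixture are shrunk essentially to zero, which produces the embedded variable selection the abstract describes.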

3.
The fused lasso penalizes a loss function by the L1 norm of both the regression coefficients and their successive differences, to encourage sparsity in both. In this paper, we propose Bayesian generalized fused lasso modeling based on a normal-exponential-gamma (NEG) prior distribution. The NEG prior is placed on the differences of successive regression coefficients. The proposed method enables us to construct a more versatile sparse model than the ordinary fused lasso by using a flexible regularization term. Simulation studies and real data analyses show that the proposed method has superior performance to the ordinary fused lasso.
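
In symbols (notation assumed), the ordinary fused lasso solves

```latex
\[
\hat{\beta} = \arg\min_{\beta}\ \|y - X\beta\|_2^2
  + \lambda_1 \sum_{j=1}^{p} |\beta_j|
  + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| ,
\]
```

and the Bayesian generalization above replaces the second L1 term with an NEG prior on the successive differences, which supplies the more flexible regularization term mentioned in the abstract.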

4.
Additive varying coefficient models are a natural extension of multiple linear regression models, allowing the regression coefficients to be functions of other variables. These models are therefore flexible enough to capture more complex dependencies in data structures. In this paper we consider the problem of automatically selecting the significant variables among a large set of variables, when interest centers on a given response variable. Several grouped regularization methods have been proposed in recent years, and we present them under one unified framework in the varying coefficient model context. For each of the discussed grouped regularization methods we investigate the optimization problem to be solved, possible algorithms for doing so, and the variable and estimation consistency of the methods. We investigate the finite-sample performance of these methods in a comparative study, and illustrate them on real data examples.
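
As a minimal sketch of the model class (notation assumed), the additive varying coefficient model takes the form

```latex
\[
Y = \beta_0(Z) + \sum_{k=1}^{p} \beta_k(Z)\, X_k + \varepsilon ,
\]
```

where each coefficient function is expanded in a finite basis; the grouped regularization methods surveyed above then penalize the basis coefficients of each function as a unit (e.g. a group lasso sum of the Euclidean norms of the per-function coefficient blocks), so that an entire coefficient function, and hence its variable, can be removed from the model.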

5.
Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector with a few nonzero components corresponding to the most important covariates. This article extends the “global-local” shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. Here, we develop a variable selection method for a K-outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for each regression coefficient is a mean-zero normal with a coefficient-specific variance that is the product of a predictor-specific factor (shared local shrinkage parameter) and a model-specific factor (global shrinkage term) that differs in each model. The performance of our modeling approach is evaluated through simulation studies and a data example.
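
In a notation consistent with the abstract (assumed here), the prior on the coefficient of predictor j in outcome model k is

```latex
\[
\beta_{jk} \mid \lambda_j, \tau_k \ \sim\ N\!\left(0,\ \lambda_j^{2}\, \tau_k^{2}\right),
\qquad j = 1, \dots, p, \quad k = 1, \dots, K,
\]
```

with the local parameter shared across all K outcomes, so that a predictor is shrunk jointly in every model, while the global parameter is specific to each outcome model.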

6.
Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response; selecting a small number of relevant variables for classification has therefore received a great deal of attention. Generally, when the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are called for on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method for the multinomial probit regression model for multi-class classification. The performance of our proposed method is demonstrated on one simulated data set and three well-known gene expression profiling data sets: breast cancer, leukemia, and small round blue-cell tumors. The results show that, compared with other methods, our method is able to select the relevant variables and obtains competitive classification accuracy with a small subset of relevant genes.

7.
Sparsity-inducing penalties are useful tools for variable selection and are also effective for regression problems where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in multiclass logistic regression models for functional data, using sparse regularization. The parameters of the functional logistic regression model are estimated in the framework of the penalized likelihood method with a sparse group lasso-type penalty, and tuning parameters for the model are then selected using a model selection criterion. The effectiveness of the proposed method is investigated through simulation studies and the analysis of a gene expression data set.
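
The penalty in question can be sketched as follows (notation assumed), with a group g formed by the basis coefficients attached to one functional variable:

```latex
\[
-\,\ell(\beta) \;+\; \lambda \sum_{g=1}^{G}
  \left[ (1-\alpha)\, \sqrt{p_g}\, \|\beta_g\|_2 \;+\; \alpha\, \|\beta_g\|_1 \right],
\]
```

where the first term is the negative multiclass logistic log-likelihood, p_g is the size of group g, and the mixing weight in [0, 1] balances group-level sparsity (dropping a variable entirely) against within-group sparsity (zeroing individual decision boundaries).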

8.
In this paper we study some problems associated with count data from a bivariate Poisson distribution, in which the marginal means are functions of explanatory variables. The estimates of these regression coefficients are developed under a variety of conditions: unrestricted linear model; parallelism of the regression planes; the coincidence of the regression planes. Tests are also developed for the validity of hypotheses involved in these models. The techniques are illustrated using simulated data.

9.
We consider a linear regression model in which some independent variables are unobservable but proxy variables are available in their place. We derive the distribution and density functions of a pre-test estimator of the error variance after a pre-test of the null hypothesis that the coefficients of the unobservable variables are zero. Based on the density function, we show that when the critical value of the pre-test is unity, the coverage probability in the interval estimation of the error variance is maximized.

10.
L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling, and various algorithms have been proposed to solve the resulting optimization problems. The coordinate descent algorithm in particular has been shown to be effective in sparse regression modeling. Although the algorithm shows remarkable performance in solving optimization problems for L1-type regularization, it is sensitive to outliers, since the procedure is based on inner products of the predictor variables and partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing especially on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
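
For readers unfamiliar with the baseline algorithm, the sketch below is the standard (non-robust) coordinate descent for the lasso; the robust variant proposed above would replace the inner products of predictors and partial residuals with robust counterparts. Names and the standardization assumption are ours, not the paper's.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100, tol=1e-8):
    """Plain coordinate descent for the lasso. Assumes each column of X is
    standardized so that (1/n) * sum(x_ij^2) = 1, which gives the simple
    closed-form update below."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        beta_old = beta.copy()
        for j in range(p):
            # partial residual with variable j removed from the current fit
            r_j = y - X @ beta + X[:, j] * beta[j]
            # non-robust inner-product step that the robust algorithm modifies
            beta[j] = soft_threshold(X[:, j] @ r_j / n, lam)
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```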

11.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
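
A hedged sketch of the first two stages (wavelet expansion, then correlation screening) is given below; the genetic-algorithm model search that follows is omitted, and the wavelet and threshold choices are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_screen(X, y, wavelet="db4", level=3, min_abs_corr=0.2):
    """Expand each spectrum (row of X) in a discrete wavelet basis, then
    drop coefficients weakly correlated with the response y."""
    # stack the wavelet coefficients of every row into a feature matrix
    feats = np.array([np.concatenate(pywt.wavedec(row, wavelet, level=level))
                      for row in X])
    # correlation of each wavelet coefficient with the response
    fc = feats - feats.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((fc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = fc.T @ yc / denom
    keep = np.abs(corr) >= min_abs_corr
    return feats[:, keep], keep
```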

12.
This paper considers the estimation of coefficients in a linear regression model with missing observations in the independent variables, and introduces a modification of the standard first order regression method for the imputation of missing values. The modification provides stochastic values for imputation and, as an extension, makes use of the principle of weighted mixed regression. The proposed procedures are compared with two popular procedures: one that utilizes only the complete observations, and one that employs the standard first order regression imputation method for missing values. A simulation experiment is conducted to evaluate the gain in efficiency and to examine interesting issues such as the impact of varying degrees of multicollinearity in the explanatory variables. Some work on the case of discrete regressor variables is in progress and will be reported in a future article.
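
A hedged sketch of the underlying idea, stochastic first order regression imputation, is given below: each missing entry is predicted by OLS from the fully observed regressors and a random residual is added. This illustrates the principle only, not the paper's weighted mixed regression extension, and all names are ours.

```python
import numpy as np

def stochastic_regression_impute(X, rng=None):
    """Impute missing entries (np.nan) column by column: regress each
    incomplete column on the fully observed columns, then fill in the
    fitted value plus a draw at the estimated residual scale."""
    rng = np.random.default_rng(rng)
    X = X.copy()
    miss = np.isnan(X)
    Z = X[:, ~miss.any(axis=0)]                   # fully observed regressors
    for j in np.where(miss.any(axis=0))[0]:
        obs = ~miss[:, j]
        A = np.column_stack([np.ones(obs.sum()), Z[obs]])
        coef, *_ = np.linalg.lstsq(A, X[obs, j], rcond=None)
        resid = X[obs, j] - A @ coef
        scale = resid.std(ddof=A.shape[1])        # residual standard deviation
        B = np.column_stack([np.ones((~obs).sum()), Z[~obs]])
        # stochastic imputation: fitted value plus random noise
        X[~obs, j] = B @ coef + rng.normal(0.0, scale, size=(~obs).sum())
    return X
```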

13.
Consider a linear regression model in which some relevant regressors are unobservable. In such a situation, we estimate the model by using proxy variables as regressors or by simply omitting the relevant regressors. In this paper, we derive an explicit formula for the predictive mean squared error (PMSE) of a general family of shrinkage estimators of the regression coefficients. It is shown analytically that the positive-part shrinkage estimator dominates the ordinary shrinkage estimator even when proxy variables are used in place of the unobserved variables. As an example, our result is applied to the double k-class estimator proposed by Ullah and Ullah (Double k-class estimators of coefficients in linear regression. Econometrica. 1978;46:705–722). Our numerical results show that the positive-part double k-class estimator with proxy variables has preferable PMSE performance.
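
For orientation, a generic member of the shrinkage family discussed here can be written in the usual Stein-rule form (notation assumed):

```latex
\[
\hat{\beta}_{\mathrm{S}} = \Bigl(1 - \frac{a}{u}\Bigr)\hat{\beta},
\qquad
\hat{\beta}_{\mathrm{S}}^{+} = \max\Bigl(0,\ 1 - \frac{a}{u}\Bigr)\hat{\beta},
\]
```

where the base estimator is least squares computed with the proxies, u is an F-type statistic, and a is a shrinkage constant; the positive-part version truncates the multiplier at zero, which is the source of the PMSE dominance stated above.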

14.
Varying coefficient models are a useful statistical tool for exploring dynamic patterns in a regression relationship, in which the variation features of the regression coefficients are taken as the main evidence of the dynamic relationship between the response and the explanatory variables. In this study, we propose a SiZer approach as a visual diagnostic device to uncover the statistically significant features of the coefficients. This method can highlight the significant structures of the coefficients at different scales and can therefore extract relatively full information from the data. Simulation studies and real-world data analysis show that the SiZer approach performs satisfactorily in mining the significant features of the coefficients.

15.
A challenging problem in the analysis of high-dimensional data is variable selection. In this study, we describe a bootstrap-based technique for selecting predictors in partial least-squares regression (PLSR) and principal component regression (PCR) in high-dimensional data. Using a bootstrap-based significance test of the regression coefficients, a subset of the original variables can be selected for inclusion in the regression, yielding a more parsimonious model with smaller prediction errors. We compare the bootstrap approach with several variable selection approaches (jack-knife and sparse formulation-based methods) for PCR and PLSR in simulations and on real data.
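
A minimal sketch of the bootstrap selection idea, under assumptions of ours (percentile intervals and scikit-learn's PLSRegression; the authors' exact test may differ):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def bootstrap_pls_select(X, y, n_components=5, n_boot=500, alpha=0.05, rng=None):
    """Select variable j if the bootstrap percentile interval for its PLSR
    coefficient excludes zero."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    coefs = np.zeros((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        pls = PLSRegression(n_components=n_components).fit(X[idx], y[idx])
        coefs[b] = pls.coef_.ravel()
    lo, hi = np.percentile(coefs, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return (lo > 0) | (hi < 0)                    # True = keep the variable
```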

16.
We propose a robust regression method called regression with outlier shrinkage (ROS) for the traditional n > p case. It improves on other robust regression methods such as least trimmed squares (LTS) in the sense that it can achieve the maximum breakdown value and full asymptotic efficiency simultaneously. Moreover, its computational complexity is no more than that of LTS. We also propose a sparse estimator, called sparse regression with outlier shrinkage (SROS), for robust variable selection and estimation. It is proven that SROS not only gives consistent selection but also estimates the nonzero coefficients with full asymptotic efficiency under the normal model. In addition, we introduce the concept of a nearly regression equivariant estimator for understanding the breakdown properties of sparse estimators, and prove that SROS achieves the maximum breakdown value of nearly regression equivariant estimators. Numerical examples are presented to illustrate our methods.

17.
In this article we evaluate the performance of a randomization test for a subset of regression coefficients in a linear model. This randomization test is based on random permutations of the independent variables. It is shown that the method maintains its level of significance, except in extreme situations, and has power that approximates the power of another randomization test based on permuting residuals from the reduced model. We also show, via an example, that the method of permuting independent variables is more valuable than other randomization methods because it can be used in connection with the downweighting of outliers.
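
A sketch of the permutation scheme (the reduction in residual sum of squares is our choice of test statistic; the article's statistic may differ):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return r @ r

def perm_test_subset(X_keep, X_test, y, n_perm=2000, rng=None):
    """Randomization test for the coefficients of X_test by permuting the
    rows of those independent variables while X_keep stays fixed."""
    rng = np.random.default_rng(rng)
    n = len(y)
    ones = np.ones((n, 1))
    rss_reduced = rss(np.hstack([ones, X_keep]), y)
    stat_obs = rss_reduced - rss(np.hstack([ones, X_keep, X_test]), y)
    exceed = 0
    for _ in range(n_perm):
        Xp = X_test[rng.permutation(n)]           # permute the tested regressors
        exceed += (rss_reduced - rss(np.hstack([ones, X_keep, Xp]), y)) >= stat_obs
    return (exceed + 1) / (n_perm + 1)            # permutation p-value
```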

18.
张波, 刘晓倩. 《统计研究》 (Statistical Research), 2019, 36(4): 119-128
This paper studies sparse principal component analysis based on a fused penalty, suited to data in which neighboring variables are highly correlated or even identical. First, starting from a regression-analysis viewpoint, we propose a simple route to computing sparse principal components, give a generalized sparse principal component model (the GSPCA model) together with its solution algorithm, and prove that when the penalty function is the L1 norm, the model yields the same solutions as the existing sparse principal component model (the SPC model). Second, we combine the fused penalty with principal component analysis to obtain a fused sparse principal component analysis method, formulated in two ways: from the penalized matrix decomposition perspective and from the regression-analysis perspective. We prove theoretically that the two formulations have identical solutions and therefore refer to both as the FSPCA model. Simulation experiments show that the FSPCA model performs well on data sets in which neighboring variables are highly correlated or even identical. Finally, we apply the FSPCA model to handwritten digit recognition and find that, compared with the SPC model, the principal components extracted by FSPCA are more interpretable, which makes the model more useful in practice.
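
One way to write the penalized-matrix-decomposition form of the FSPCA model (our reconstruction from the abstract; the paper's exact constraints may differ):

```latex
\[
\max_{u, v}\ u^{\top} X v
\qquad \text{s.t.} \qquad
\|u\|_2 \le 1, \quad \|v\|_2 \le 1, \quad
\lambda_1 \sum_{j=1}^{p} |v_j| + \lambda_2 \sum_{j=2}^{p} |v_j - v_{j-1}| \le c ,
\]
```

so the loading vector v is encouraged to be sparse and piecewise constant, matching neighboring variables that are highly correlated or identical; with the second penalty weight set to zero this reduces to an SPC-type constraint.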

19.
The problem of constructing nonlinear regression models is investigated with a view to analyzing data with complex structure. We introduce radial basis functions with a hyperparameter that adjusts the amount of overlap among the basis functions and incorporates information from both the input and response variables. Using these radial basis functions, we construct nonlinear regression models with the help of regularization. Crucial issues in the model building process are the choices of the hyperparameter, the number of basis functions, and a smoothing parameter. We present information-theoretic criteria for evaluating statistical models under model misspecification, for both distributional and structural assumptions. We use real data examples and Monte Carlo simulations to investigate the properties of the proposed nonlinear regression modeling techniques. The simulation results show that our nonlinear modeling performs well in various situations, and clear improvements are obtained from the use of the hyperparameter in the basis functions.
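
A minimal sketch of the modeling strategy, Gaussian radial basis functions with an overlap hyperparameter plus a regularized (ridge) fit; the width construction and all names are our assumptions:

```python
import numpy as np

def rbf_design(X, centers, nu=1.0):
    """Gaussian RBF design matrix. The hyperparameter nu widens or narrows
    all bases and so controls how much neighboring bases overlap."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    width = np.median(d2) + 1e-12                 # data-driven base width
    return np.exp(-d2 / (2.0 * nu * width))

def fit_rbf_ridge(X, y, centers, nu=1.0, lam=1e-3):
    """Regularized least-squares fit of the RBF regression model; lam is
    the smoothing parameter, to be chosen e.g. by an information criterion."""
    Phi = rbf_design(X, centers, nu)
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
    return w
```

Centers might be chosen as a subset of the inputs or by k-means; the information-theoretic criteria in the abstract would then guide the choice of nu, the number of bases, and lam.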

20.
In statistical practice, inferences on standardized regression coefficients are often required, but they are complicated by the fact that the standardized coefficients are nonlinear functions of the parameters, so standard textbook results are simply wrong. Within the frequentist domain, asymptotic delta methods can be used to construct confidence intervals for the standardized coefficients with proper coverage probabilities. Alternatively, Bayesian methods solve these and other inferential problems by simulating data from the posterior distribution of the coefficients. In this paper, we present Bayesian procedures that provide comprehensive solutions for inference on the standardized coefficients. Simple computing algorithms are developed to generate posterior samples with no autocorrelation, based on both noninformative improper and informative proper prior distributions. Simulation studies show that Bayesian credible intervals constructed by our approaches have comparable or even better statistical properties than their frequentist counterparts, particularly in the presence of collinearity. In addition, our approaches solve some meaningful inferential problems that are difficult if not impossible from the frequentist standpoint, including identifying joint rankings of multiple standardized coefficients and making optimal decisions concerning their sizes and comparisons. We illustrate applications of our approaches through examples and make sample R functions available for implementing our proposed methods.
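
A sketch of direct posterior simulation for standardized coefficients under the standard noninformative prior proportional to 1/sigma^2 (our choice; the paper also covers proper priors). Draws are i.i.d., hence free of autocorrelation:

```python
import numpy as np

def posterior_standardized_coefs(X, y, n_draws=5000, rng=None):
    """i.i.d. posterior draws of standardized slopes under the noninformative
    prior: sigma^2 | y is scaled inverse chi-square and
    beta | sigma^2, y ~ N(beta_hat, sigma^2 (X'X)^{-1})."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])         # add intercept
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta_hat = XtX_inv @ Xd.T @ y
    resid = y - Xd @ beta_hat
    s2 = resid @ resid
    sigma2 = s2 / rng.chisquare(n - p - 1, size=n_draws)
    L = np.linalg.cholesky(XtX_inv)
    z = rng.standard_normal((n_draws, p + 1))
    betas = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    scale = X.std(axis=0, ddof=1) / y.std(ddof=1)
    return betas[:, 1:] * scale                   # standardized slope draws
```

Percentiles of the returned draws give credible intervals, and joint functions of the draws (e.g. rankings of several standardized coefficients) can be evaluated directly.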
