Similar literature
 Found 20 similar documents; search took 15 ms
1.
Summary. We propose covariance-regularized regression, a family of methods for prediction in high-dimensional settings that uses a shrunken estimate of the inverse covariance matrix of the features to achieve superior prediction. An estimate of the inverse covariance matrix is obtained by maximizing the log-likelihood of the data, under a multivariate normal model, subject to a penalty; it is then used to estimate coefficients for the regression of the response onto the features. We show that ridge regression, the lasso and the elastic net are special cases of covariance-regularized regression, and we demonstrate that certain previously unexplored forms of covariance-regularized regression can outperform existing methods in a range of situations. The covariance-regularized regression framework is extended to generalized linear models and linear discriminant analysis, and is used to analyse gene expression data sets with multiple class and survival outcomes.

2.
In calibrating a near-infrared (NIR) instrument, we regress chemical compositions of interest on their NIR spectra. This process poses two immediate challenges: first, the number of variables exceeds the number of observations and, second, the multicollinearity between variables is extremely high. To deal with these challenges, prediction models that produce sparse solutions have recently been proposed. The term 'sparse' means that some model parameters are estimated to be exactly zero while the others are estimated away from zero. In effect, variable selection is embedded in the model to potentially achieve better prediction. Many studies have investigated sparse solutions for latent variable models, such as partial least squares and principal component regression, and for direct regression models such as ridge regression (RR). However, in the latter case, sparsity mainly comes from adding an L1 norm penalty to the objective function, as in lasso regression. In this study, we investigate new sparse alternative models for RR within a random effects model framework, where we consider Cauchy and mixture-of-normals distributions on the random effects. The results indicate that the mixture-of-normals model produces a sparse solution with good prediction and better interpretation. We illustrate the methods using NIR spectra datasets from milk and corn specimens.

3.
Identifying influential genes and clinical covariates for patient survival is crucial because it can lead to a better understanding of the underlying mechanisms of diseases and to better prediction models. Most variable selection methods in penalized Cox models cannot deal properly with categorical variables such as gender and family history. The group lasso penalty can combine clinical and genomic covariates effectively. In this article, we introduce an optimization algorithm for Cox regression with the group lasso penalty. We compare our method with other methods on simulated and real microarray data sets.
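
The group lasso keeps or drops all coefficients of a group together, which is why it suits a categorical covariate coded as several dummy variables. As a minimal sketch of that behaviour, the block soft-thresholding operator below shrinks a whole coefficient group at once. This is a generic building block and not the optimization algorithm of the article; the function name `group_soft_threshold` is hypothetical.

```python
import math


def group_soft_threshold(z, lam):
    """Shrink the vector z toward zero by lam, measured in Euclidean norm.

    If the norm of z is at most lam, every entry becomes zero, so the
    whole group leaves the model together.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]
```

A group with a large norm is shrunk proportionally, while a weak group is set to zero as a unit; no single dummy variable of a categorical covariate is dropped on its own.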

4.
In high-dimensional regression problems, regularization methods have been a popular choice to address variable selection and multicollinearity. In this paper we study bridge regression that adaptively selects the penalty order from data and produces flexible solutions in various settings. We implement bridge regression based on the local linear and quadratic approximations to circumvent the nonconvex optimization problem. Our numerical study shows that the proposed bridge estimators are a robust choice in various circumstances compared to other penalized regression methods such as the ridge, lasso, and elastic net. In addition, we propose group bridge estimators that select grouped variables and study their asymptotic properties when the number of covariates increases along with the sample size. These estimators are also applied to varying-coefficient models. Numerical examples show superior performance of the proposed group bridge estimators in comparison with other existing methods.
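
The "penalty order" in bridge regression is the exponent q in the penalty, lambda times the sum of |beta_j|^q. A small sketch of that penalty follows; it shows that q = 1 gives the lasso penalty and q = 2 the ridge penalty. The function name and argument names are illustrative assumptions, and the sketch does not implement the adaptive selection of q or the local approximations described in the abstract.

```python
def bridge_penalty(beta, lam, q):
    """Bridge penalty lam * sum(|beta_j| ** q) for a penalty order q > 0."""
    if q <= 0:
        raise ValueError("penalty order q must be positive")
    return lam * sum(abs(b) ** q for b in beta)
```

For 0 < q < 1 the penalty is nonconvex, which is the case the abstract's approximations are meant to handle.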

5.
Sparsity-inducing penalties are useful tools for variable selection and are also effective for regression problems where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in multiclass logistic regression models for functional data, using sparse regularization. The parameters of the functional logistic regression model are estimated in the framework of the penalized likelihood method with the sparse group lasso-type penalty, and then tuning parameters for the model are selected using a model selection criterion. The effectiveness of the proposed method is investigated through simulation studies and the analysis of a gene expression data set.

6.
This article presents a class of novel penalties that are defined under a unified framework, which includes lasso, SCAD and ridge as special cases, and novel functions, such as the asymmetric quantile check function. The proposed class of penalties is capable of producing alternative differentiable penalties to lasso. We mainly focus on this case and show its desirable properties, propose an efficient algorithm for the parameter estimation and prove the theoretical properties of the resulting estimators. Moreover, we exploit the differentiability of the penalty function by deriving a novel Generalized Information Criterion (GIC) for model selection. The method is implemented in the R package DLASSO freely available from CRAN, http://CRAN.R-project.org/package=DLASSO.

7.
8.
We propose the marginalized lasso, a new nonconvex penalization for variable selection in regression problems. The marginalized lasso penalty is motivated by integrating out the penalty parameter in the original lasso penalty with a gamma prior distribution. This study provides a thresholding rule and a lasso-based iterative algorithm for parameter estimation in the marginalized lasso. We also provide a coordinate descent algorithm to efficiently optimize the marginalized lasso penalized regression. Numerical comparison studies are provided to demonstrate its competitiveness with existing sparsity-inducing penalizations and to suggest some guidelines for tuning parameter selection.

9.
We consider the problem of constructing nonlinear regression models with Gaussian basis functions, using lasso regularization. Regularization with a lasso penalty is advantageous in that it estimates some coefficients in linear regression models to be exactly zero. We propose imposing a weighted lasso penalty on a nonlinear regression model and thereby selecting the number of basis functions effectively. In order to select tuning parameters in the regularization method, we use a deviance information criterion proposed by Spiegelhalter et al. (2002), calculating the effective number of parameters by Gibbs sampling. Simulation results demonstrate that our methodology performs well in various situations.

10.
The problem of modeling the relationship between a set of covariates and a multivariate response with correlated components arises in many areas of research, such as genetics, psychometrics and signal processing. In the linear regression framework, such a task can be addressed using a number of existing methods. In the high-dimensional sparse setting, most of these methods rely on the idea of penalization in order to efficiently estimate the regression matrix. Examples of such methods include the lasso, the group lasso, the adaptive group lasso and the simultaneous variable selection (SVS) method. Crucially, a suitably chosen penalty also allows for an efficient exploitation of the correlation structure within the multivariate response. In this paper we introduce a novel variant of such methods called the adaptive SVS, which is closely linked with the adaptive group lasso. Via a simulation study we investigate its performance in the high-dimensional sparse regression setting. We provide a comparison with a number of other popular methods under different scenarios and show that the adaptive SVS is a powerful tool for efficient recovery of signal in such a setting. The methods are applied to genetic data.

11.
In high-dimensional data settings, sparse model fits are desired, which can be obtained through shrinkage or boosting techniques. We investigate classical shrinkage techniques such as the lasso, which is theoretically known to be biased; newer techniques that address this problem, such as the elastic net and SCAD; and the boosting technique CoxBoost and its extensions, which allow additional structure to be incorporated. To examine whether these methods, which are designed for or frequently used in high-dimensional survival data analysis, also provide sensible results in low-dimensional data settings, we consider the well-known GBSG breast cancer data. In detail, we study the bias, stability and sparseness of these model fitting techniques via comparison to the maximum likelihood estimate and resampling, and their prediction performance via prediction error curve estimates.

12.
To perform regression analysis in high dimensions, lasso or ridge estimation is a common choice. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean-squared error (MSE). In this paper we compute the influence function, the asymptotic variance and the MSE for penalized M-estimators and the sparse LTS estimator. The asymptotic bias of the estimators makes the calculations non-standard. We show that only M-estimators with a loss function with a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function.
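
The role of a bounded loss derivative can be illustrated by comparing the squared loss with the Huber loss. The sketch below is an illustration only, not the influence-function calculation of the paper: the function names are hypothetical, and the tuning constant c = 1.345 is an assumed conventional default that does not come from the abstract.

```python
def squared_loss_derivative(r):
    """Derivative of the squared loss r**2 / 2; grows without bound in r."""
    return r


def huber_loss_derivative(r, c=1.345):
    """Derivative of the Huber loss: equal to r on [-c, c], clipped outside."""
    return max(-c, min(c, r))
```

A residual of 100 contributes 100 to the squared-loss estimating equation but at most c to the Huber one, so one gross outlier can pull a least-squares-based fit such as the lasso arbitrarily far, while its pull on a Huber-type M-estimator stays bounded.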

13.
This paper adopts a Bayesian strategy for generalized ridge estimation for high-dimensional regression. We also consider significance testing based on the proposed estimator, which is useful for selecting regressors. Both theoretical and simulation studies show that the proposed estimator can simultaneously outperform the ordinary ridge estimator and the LSE in terms of the mean square error (MSE) criterion. The simulation study also demonstrates that our proposal has MSE performance competitive with the lasso under sparse models. We demonstrate the method using lung cancer data involving high-dimensional microarrays.

14.
The fused lasso penalizes a loss function by the L1 norm of both the regression coefficients and their successive differences to encourage sparsity of both. In this paper, we propose a Bayesian generalized fused lasso model based on a normal-exponential-gamma (NEG) prior distribution. The NEG prior is placed on the differences of successive regression coefficients. The proposed method enables us to construct a more versatile sparse model than the ordinary fused lasso using a flexible regularization term. Simulation studies and real data analyses show that the proposed method has superior performance to the ordinary fused lasso.

15.
This study considers the binary classification of functional data collected in the form of curves. In particular, we assume a situation in which the curves are highly mixed over the entire domain, so that the global discriminant analysis based on the entire domain is not effective. This study proposes an interval-based classification method for functional data: the informative intervals for classification are selected and used for separating the curves into two classes. The proposed method, called functional logistic regression with fused lasso penalty, combines the functional logistic regression as a classifier and the fused lasso for selecting discriminant segments. The proposed method automatically selects the most informative segments of functional data for classification by employing the fused lasso penalty and simultaneously classifies the data based on the selected segments using the functional logistic regression. The effectiveness of the proposed method is demonstrated with simulated and real data examples.

16.
Recent studies have demonstrated the theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP). The computation of the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to the existing algorithms that use local quadratic or local linear approximation to the penalty function, the MMCD seeks to majorize the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy makes it possible to avoid the computation of a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish the theoretical convergence property of the MMCD. We implement this algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for the penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size.
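
For reference, the two concave penalties named above can be written down directly. The sketch below uses the standard textbook forms of SCAD and MCP for a single coefficient; it does not implement the MMCD algorithm. The function names are hypothetical, and the default shape parameters (a = 3.7 for SCAD, gamma = 3.0 for MCP) are assumed conventional choices, not values taken from the abstract.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty for one coefficient t, with lam > 0 and a > 2."""
    x = abs(t)
    if x <= lam:
        return lam * x  # behaves like the lasso near zero
    if x <= a * lam:
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    return 0.5 * (a + 1.0) * lam * lam  # constant: no extra shrinkage


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty for one coefficient t, with lam > 0 and gamma > 1."""
    x = abs(t)
    if x <= gamma * lam:
        return lam * x - x * x / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # constant beyond gamma * lam
```

Both penalties level off for large coefficients, which is what removes the bias of the lasso on strong signals and also what makes the optimization nonconvex.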

17.
Much attention has been paid to high-dimensional linear regression models, in which the number of observations is much smaller than the number of covariates. Because high dimensionality often induces collinearity, in this article we study penalized quantile regression with the elastic net (EnetQR), which combines the strengths of quadratic regularization and lasso shrinkage. We investigate the weak oracle property of the EnetQR under mild conditions in the high-dimensional setting. Moreover, we propose a two-step procedure, called adaptive elastic net quantile regression (AEnetQR), in which the weight vector in the second step is constructed from the EnetQR estimate in the first step. This two-step procedure is justified theoretically to possess the weak oracle property. The finite sample properties are examined through Monte Carlo simulation and a real-data analysis.
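
An elastic net quantile regression objective combines the quantile check loss with an L1 term and a quadratic term. The sketch below evaluates such an objective for a given coefficient vector. The exact scaling (averaging the loss over observations, and the names `lam1` and `lam2` for the two tuning parameters) is an assumption made for illustration and may differ from the parametrization used in the article.

```python
def check_loss(u, tau):
    """Quantile check function: u * (tau - 1 if u < 0 else tau)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def enet_qr_objective(X, y, beta, tau, lam1, lam2):
    """Mean check loss plus lam1 * L1 norm plus lam2 * squared L2 norm.

    X is a list of rows, y a list of responses, beta a list of coefficients.
    """
    n = len(y)
    fit = 0.0
    for row, yi in zip(X, y):
        pred = sum(x * b for x, b in zip(row, beta))
        fit += check_loss(yi - pred, tau)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return fit / n + lam1 * l1 + lam2 * l2
```

With tau = 0.5 the check loss is half the absolute error, so the objective reduces to a penalized median regression.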

18.
Summary. The lasso penalizes a least squares regression by the sum of the absolute values (L1-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to 0). We propose the 'fused lasso', a generalization that is designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the L1-norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their differences, i.e. local constancy of the coefficient profile. The fused lasso is especially useful when the number of features p is much greater than N, the sample size. The technique is also extended to the 'hinge' loss function that underlies the support vector classifier. We illustrate the methods on examples from protein mass spectroscopy and gene expression data.
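
The two parts of the fused lasso penalty can be computed in a few lines. The following sketch evaluates the penalty for an ordered coefficient vector; the function name and the tuning parameter names `lam1` and `lam2` are illustrative assumptions, and the sketch does not fit a model.

```python
def fused_lasso_penalty(beta, lam1, lam2):
    """lam1 * sum(|beta_j|) + lam2 * sum(|beta_j - beta_{j-1}|)."""
    sparsity = sum(abs(b) for b in beta)
    # Successive differences are taken in the given feature order.
    fusion = sum(abs(b - a) for a, b in zip(beta, beta[1:]))
    return lam1 * sparsity + lam2 * fusion
```

A piecewise-constant profile such as `[0, 0, 2, 2, 2, 0]` pays the fusion term only at its two jumps, which is how the penalty encourages local constancy.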

19.
Regularization and variable selection via the elastic net   Total citations: 2 (self-citations: 0, citations by others: 2)
Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
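
The grouping effect can be seen in a small numerical experiment. The sketch below fits an elastic net by cyclic coordinate descent, which is a different algorithm from the LARS-EN method proposed in the paper and is used here only because it is short. The objective parametrization, (1/(2n)) * RSS + lam * (alpha * L1 + (1 - alpha) / 2 * squared L2), and all function and parameter names are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: move z toward zero by t, stopping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for an elastic net; X is a list of rows."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            if denom == 0.0:
                continue  # all-zero column with no ridge term: leave at zero
            # Correlation of feature j with the partial residual.
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    return beta
```

With two identical predictors, for example `X = [[1.0, 1.0], [-1.0, -1.0]]` and `y = [2.0, -2.0]`, a mixed penalty (`alpha = 0.5`) spreads the weight evenly across both columns, whereas the pure lasso (`alpha = 1.0`) has no unique solution in this case and coordinate descent simply keeps whichever column it visits first.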

20.
This article develops the adaptive elastic net generalized method of moments (GMM) estimator in large-dimensional models with potentially (locally) invalid moment conditions, where both the number of structural parameters and the number of moment conditions may increase with the sample size. The basic idea is to conduct the standard GMM estimation combined with two penalty terms: the adaptively weighted lasso shrinkage and the quadratic regularization. It is a one-step procedure of valid moment condition selection, nonzero structural parameter selection (i.e., model selection), and consistent estimation of the nonzero parameters. The procedure achieves the standard GMM efficiency bound as if we know the valid moment conditions ex ante, for which the quadratic regularization is important. We also study the tuning parameter choice, with which we show that selection consistency still holds without assuming Gaussianity. We apply the new estimation procedure to dynamic panel data models, where both the time and cross-section dimensions are large. The new estimator is robust to possible serial correlations in the regression error terms.
