Similar Articles
20 similar articles found
1.
The Lasso achieves variance reduction and variable selection by solving an ℓ1-regularized least squares problem. Huang (2003) claims that 'there always exists an interval of regularization parameter values such that the corresponding mean squared prediction error for the Lasso estimator is smaller than for the ordinary least squares estimator'. This result is correct; however, its proof in Huang (2003) is not. This paper presents a corrected proof of the claim, which exposes and uses some interesting fundamental properties of the Lasso.
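A minimal numerical illustration of the claim (not the paper's proof; the data and all parameter values below are arbitrary choices) compares the mean squared prediction error of the Lasso over a grid of regularization values with that of OLS:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.r_[np.ones(3), np.zeros(p - 3)]          # sparse truth
y = X @ beta + rng.standard_normal(n)
X_new = rng.standard_normal((10_000, p))           # fresh design for prediction
mu_new = X_new @ beta                              # noiseless conditional mean

ols = LinearRegression(fit_intercept=False).fit(X, y)
ols_mspe = np.mean((ols.predict(X_new) - mu_new) ** 2)
for lam in [0.01, 0.05, 0.1, 0.2]:                 # small lambdas near the OLS end
    lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y)
    mspe = np.mean((lasso.predict(X_new) - mu_new) ** 2)
    print(f"lambda={lam:.2f}  Lasso MSPE={mspe:.4f}  OLS MSPE={ols_mspe:.4f}")
```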

2.
Lasso has been widely used for variable selection because of its sparsity, and a number of its extensions have been developed. In this article, we propose a robust variant of the Lasso for time-course multivariate responses and develop an algorithm that transforms the optimization into a sequence of ridge regressions. The proposed method handles multivariate responses effectively and employs a basis representation of the regression parameters to reduce dimensionality. We assess the proposed method through simulation and apply it to microarray data.

3.
Regularization and variable selection via the elastic net
Summary. We propose the elastic net, a new regularization and variable selection method. Real-world data and a simulation study show that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, in which strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n); by contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
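A small sketch of the grouping effect, using scikit-learn's ElasticNet (its alpha/l1_ratio parametrization of the penalty) on a deliberately correlated design; the data are illustrative only:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(1)
n = 100
z = rng.standard_normal(n)
X = np.column_stack([z + 0.01 * rng.standard_normal(n),   # strongly correlated pair
                     z + 0.01 * rng.standard_normal(n),
                     rng.standard_normal(n)])              # an unrelated predictor
y = 2 * z + 0.5 * rng.standard_normal(n)

print(Lasso(alpha=0.1).fit(X, y).coef_)                    # tends to pick one of the pair
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_) # tends to keep both together
```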

4.
Penalization has been extensively adopted for variable selection in regression. In some applications, covariates have natural grouping structures, where those in the same group have correlated measurements or related functions. Under such settings, variable selection should be conducted at both the group level and the within-group level, that is, a bi-level selection. In this study, we propose the adaptive sparse group Lasso (adSGL) method, which combines the adaptive Lasso and the adaptive group Lasso (GL) to achieve bi-level selection. It can be viewed as an improved version of the sparse group Lasso (SGL) and uses data-dependent weights to improve selection performance. For computation, a block coordinate descent algorithm is adopted. Simulations show that adSGL performs well in identifying both individual variables and groups and achieves a lower false discovery rate and mean squared error than SGL and GL. We apply the proposed method to the analysis of a household healthcare expenditure data set.
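For intuition, here is one proximal step of a sparse group Lasso penalty with data-dependent weights, of the kind applied within block coordinate descent; this is a generic sketch, not the authors' implementation, and the function and weight names are our own:

```python
import numpy as np

def sgl_prox(b, lam1, lam2, w1, w2):
    """Prox of lam1*sum(w1*|b_j|) + lam2*w2*||b||_2 for one group b (hypothetical names)."""
    u = np.sign(b) * np.maximum(np.abs(b) - lam1 * w1, 0.0)  # coefficient-wise soft-threshold
    norm = np.linalg.norm(u)
    if norm <= lam2 * w2:
        return np.zeros_like(u)          # whole group dropped (group-level selection)
    return (1.0 - lam2 * w2 / norm) * u  # group-level shrinkage (within-group sparsity kept)

print(sgl_prox(np.array([3.0, -0.2, 1.5]), lam1=0.5, lam2=1.0,
               w1=np.ones(3), w2=1.0))
```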

5.
Realized volatility computed from high-frequency data is an important measure for many applications in finance, and its dynamics have been widely investigated. Notable recent advances include the heterogeneous autoregressive (HAR) model, which can approximate long memory, is very parsimonious, is easy to estimate, and features good out-of-sample performance. We prove that the least absolute shrinkage and selection operator (Lasso) asymptotically recovers the lag structure of the HAR model if it is the true model, and we present Monte Carlo evidence for finite samples. The HAR model's lag structure is not fully in agreement with the one the Lasso finds on real data. Moreover, we provide empirical evidence of two clear structural breaks for most of the assets we consider. These results call into question the appropriateness of the HAR model for realized volatility. Finally, an out-of-sample analysis shows equal performance of the HAR model and the Lasso approach.
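The two specifications under comparison can be sketched as follows (illustrative code with a placeholder volatility series, not the paper's data or estimation pipeline): the HAR regression on daily, weekly, and monthly averages versus a Lasso over all 22 raw lags:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV

rv = np.abs(np.random.default_rng(2).standard_normal(1000))   # placeholder RV series

lags = np.column_stack([rv[22 - k:-k] for k in range(1, 23)]) # rv_{t-1}, ..., rv_{t-22}
y = rv[22:]
har_X = np.column_stack([lags[:, 0],                # daily lag
                         lags[:, :5].mean(axis=1),  # weekly average
                         lags[:, :22].mean(axis=1)])# monthly average
har = LinearRegression().fit(har_X, y)              # the parsimonious HAR fit
lasso = LassoCV(cv=5).fit(lags, y)                  # unrestricted lag selection
print(np.nonzero(lasso.coef_)[0] + 1)               # which lags the Lasso keeps
```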

6.
A robust rank-based estimator for variable selection in linear models with grouped predictors is studied. The proposed estimation procedure extends the existing rank-based variable selection [Johnson, B.A., and Peng, L. (2008), 'Rank-based Variable Selection', Journal of Nonparametric Statistics, 20(3):241-252] and the WW-SCAD [Wang, L., and Li, R. (2009), 'Weighted Wilcoxon-type Smoothly Clipped Absolute Deviation Method', Biometrics, 65(2):564-571] to linear regression models with grouped variables. The resulting estimator is robust to contamination or deviations in both the response and the design space. The oracle property and asymptotic normality of the estimator are established under some regularity conditions. Simulation studies reveal that the proposed method performs better than the existing rank-based methods [Johnson and Peng (2008); Wang and Li (2009)] for models with grouped variables. The procedure also outperforms the adaptive HLasso [Zhou, N., and Zhu, J. (2010), 'Group Variable Selection Via a Hierarchical Lasso and its Oracle Property', Statistics and Its Interface, 3(4):557-574] in the presence of local contamination in the design space or heavy-tailed error distributions.

7.
We consider the problem of variable selection and estimation in the linear regression model when the number of parameters diverges with the sample size. We propose the adaptive Generalized Ridge-Lasso (AdaGril), an extension of the adaptive Elastic Net. AdaGril incorporates information on redundancy among correlated variables for model selection and estimation. It combines the strengths of quadratic regularization and adaptively weighted Lasso shrinkage. In this article, we highlight the grouped selection property of the AdaCnet method (one type of AdaGril) in the equal correlation case. Under weak conditions, we establish the oracle property of AdaGril, which ensures optimal large-sample performance when the dimension is high. Consequently, AdaGril both handles the problem of collinearity in high dimensions and enjoys the oracle property. Moreover, we show that the AdaGril estimator achieves a sparsity inequality, that is, a bound in terms of the number of nonzero components of the 'true' regression coefficient. This bound is obtained under a weak restricted eigenvalue (RE) condition similar to that used for the Lasso. Simulation studies show that some particular cases of AdaGril outperform its competitors.

8.
We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. By penalizing the sum of the ℓ2 norms of the blocks of coefficients associated with each covariate across the different classification problems, similar sparsity patterns in all models are encouraged. To take computational advantage of the sparsity of solutions at high regularization levels, we propose a blockwise path-following scheme that approximately traces the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates a growing set of covariates that are simultaneously active for all problems. We also show how to use random projections to extend this approach to the problem of joint subspace selection, where multiple predictors are found in a common low-dimensional subspace. We present theoretical results showing that this random projection approach converges to the solution yielded by trace-norm regularization. Finally, we present a variety of experimental results on joint covariate selection and joint subspace selection, comparing the path-following approach to competing algorithms in terms of prediction accuracy and running time.
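A minimal analogue of the block penalty, assuming squared loss rather than the classification losses considered in the paper: scikit-learn's MultiTaskLasso penalizes the sum of the ℓ2 norms of each covariate's coefficients across tasks, so a covariate is kept or dropped for all tasks jointly:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 8))
W = np.zeros((8, 3))
W[:2] = rng.standard_normal((2, 3))       # only covariates 0 and 1 matter, in every task
Y = X @ W + 0.1 * rng.standard_normal((100, 3))

fit = MultiTaskLasso(alpha=0.1).fit(X, Y)
print(np.linalg.norm(fit.coef_, axis=0))  # per-covariate norms: zeroed together across tasks
```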

9.
Abstract. Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of the shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model selection criterion is proposed to select the best one among this preselected set. The approach leads to a fast and efficient procedure for variable selection, especially in high-dimensional settings. Model selection consistency of the suggested criterion is proven when the number of covariates d is fixed. Simulation studies suggest that the criterion still enjoys model selection consistency when d is much larger than the sample size. The simulations also show that our approach to variable selection works surprisingly well in comparison with existing competitors. The method is also applied to a real data set.
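The two-stage idea can be sketched generically (the BIC-type refit below is our own stand-in, not necessarily the criterion proposed in the paper): collect the candidate supports along the Lasso path, refit each by least squares, and choose the support with the smallest score:

```python
import numpy as np
from sklearn.linear_model import lasso_path, LinearRegression

rng = np.random.default_rng(4)
n, p = 200, 30
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)

_, coefs, _ = lasso_path(X, y)                               # coefs: (p, n_alphas)
supports = {tuple(np.nonzero(c)[0]) for c in coefs.T if c.any()}

def bic(S):
    """BIC-type score for the OLS refit on support S (illustrative criterion)."""
    r = y - LinearRegression().fit(X[:, S], y).predict(X[:, S])
    return n * np.log(np.mean(r ** 2)) + len(S) * np.log(n)

best = min(supports, key=lambda S: bic(list(S)))
print(best)                                                  # selected variable indices
```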

10.
With quantile regression methods successfully applied in various fields, we often need to tackle big data sets with thousands of variables and millions of observations. In this article, we focus on the variable selection aspect of penalized quantile regression and propose a new method, Sampling Lasso Quantile Regression (SLQR), which selects a small but informative subset of the data for fitting quantile regression models. Unlike ordinary regularization methods, SLQR performs a sampling step to reduce the number of observations before applying the Lasso. Through numerical simulation studies and a real application to the Greenhouse Gas Observing Network, we illustrate the efficacy of the SLQR method. The numerical results show that SLQR achieves high-precision quantile regression on large-scale data for both prediction and interpretation.
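A generic sketch of the sample-then-penalize idea, with plain uniform subsampling standing in for the paper's sampling scheme, followed by an L1-penalized quantile regression (scikit-learn's QuantileRegressor); the data are a small stand-in for a large data set:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(5)
p = 20
X_big = rng.standard_normal((50_000, p))          # stand-in for the full data
y_big = X_big[:, 0] + rng.standard_normal(50_000)

idx = rng.choice(len(y_big), size=2_000, replace=False)      # sampling step first
fit = QuantileRegressor(quantile=0.9, alpha=0.01).fit(X_big[idx], y_big[idx])
print(np.nonzero(fit.coef_)[0])                   # variables selected on the subsample
```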

11.
We compare the performance of recently developed regularized covariance matrix estimators for Markowitz's portfolio optimization, and for the minimum variance portfolio (MVP) problem in particular. We focus on seven estimators that have been applied to the MVP problem in the literature: three regularize the eigenvalues of the sample covariance matrix, and the other four assume sparsity of the true covariance matrix or its inverse. Comparisons are made on two sets of long-term S&P 500 stock return data that represent two extreme scenarios of active and passive management. The results show that MVPs with sparse covariance estimators have high Sharpe ratios, but that naive diversification (also known as the 'uniform (on market share) portfolio') still performs well in terms of wealth growth.
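For concreteness, a sketch of the MVP weights computed from one regularized covariance estimator of the eigenvalue-shrinkage kind (Ledoit-Wolf, via scikit-learn); the return matrix here is simulated, not the S&P 500 data:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(6)
R = 0.01 * rng.standard_normal((250, 100))   # 250 days, 100 assets (placeholder returns)

sigma = LedoitWolf().fit(R).covariance_      # shrunk covariance estimate
ones = np.ones(sigma.shape[0])
w = np.linalg.solve(sigma, ones)
w /= w.sum()                                 # MVP weights: S^{-1}1 / (1'S^{-1}1)
print(w[:5])
```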

12.
Quantile regression has gained increasing popularity because it provides richer information than regular mean regression, and variable selection plays an important role in building quantile regression models, improving prediction accuracy by choosing an appropriate subset of predictors. Unlike traditional quantile regression, we treat the quantile as an unknown parameter and estimate it jointly with the other regression coefficients. In particular, we adopt the Bayesian adaptive Lasso for maximum entropy quantile regression. A flat prior is chosen for the quantile parameter owing to the lack of information about it. The proposed method not only identifies the most probable quantile among all candidates but also reflects the inner structure of the data through the estimated quantile. We develop an efficient Gibbs sampler and show through simulation studies and a real data analysis that the proposed method outperforms the Bayesian adaptive Lasso and the Bayesian Lasso.

13.
王小燕等 (Wang Xiaoyan et al.), 《统计研究》 (Statistical Research), 2014, 31(9): 107-112
Variable selection is a key step in statistical modeling: choosing suitable variables yields a structurally simple, predictively accurate, and robust model. This paper proposes a new bi-level variable selection penalty for logistic regression, the adaptive Sparse Group Lasso (adSGL). Its distinctive feature is that screening is based on the grouping structure of the variables, achieving selection at both the within-group and between-group levels. The method's advantage is that it penalizes individual coefficients and group coefficients to different degrees, avoiding over-penalization of large coefficients and thereby improving the model's estimation and prediction accuracy. The computational difficulty is that the penalized likelihood is not strictly convex, so the model is solved by group coordinate descent, and a criterion for choosing the tuning parameter is established. Simulations show that, compared with the representative existing methods Sparse Group Lasso, Group Lasso, and Lasso, adSGL not only improves bi-level selection accuracy but also reduces model error. Finally, adSGL is applied to credit scoring for credit cards, where it achieves higher classification accuracy and robustness than logistic regression.

14.
In cancer diagnosis studies, high-throughput gene profiling has been extensively conducted in search of genes whose expression may serve as markers. Data generated from such studies have the 'large d, small n' feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single data sets can be unsatisfactory. A cost-effective remedy is integrative analysis of multiple heterogeneous data sets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty; under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulations and analyses of practical cancer data sets, show satisfactory performance of the proposed methods.
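The outer penalty can be written out directly (the standard MCP formula, not code from the paper): p(t) = λ|t| - t²/(2γ) for |t| ≤ γλ, and γλ²/2 beyond, so small coefficients are penalized like the Lasso while large ones escape shrinkage:

```python
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty evaluated elementwise."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2 * gamma),  # concave Lasso-like region
                    0.5 * gamma * lam ** 2)          # flat region: no extra shrinkage

print(mcp(np.array([0.1, 1.0, 10.0]), lam=1.0, gamma=3.0))
```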

15.
史兴杰等 (Shi Xingjie et al.), 《统计研究》 (Statistical Research), 2020, 37(9): 95-105
For the binary classification problems frequently encountered in empirical research, featuring high-dimensional variables and outliers, robust high-dimensional classification methods are especially important. This paper proposes a binary classification method based on a smoothed 0-1 loss with a Lasso penalty and uses the Fabs algorithm to solve the variable selection and parameter estimation problem efficiently. Numerical simulations show that the method remains robust under different proportions of outliers. Using data from the 2013 wave of CHIP, the method is applied to an empirical study of the factors influencing the high-school enrollment decisions of migrant workers' children. The analysis finds that the parents' education level, the interaction between education level and household economic status, the child's gender, and the interaction between gender and ethnicity all have important effects on the enrollment decision.

16.
This paper studies sparsity selection and estimation in nonparametric additive models. The sparsity is of two types: across variables and within variables. Sparsity across variables corresponds to irrelevant components of the model; sparsity within variables corresponds to zero function values over sub-domains of the relevant components. To select and estimate the sparsity, I approximate each component by B-splines and propose a group bridge penalized method that simultaneously identifies the zero functions and the zero structures of the nonzero functions. Simulation studies demonstrate the effectiveness of the proposed method in sparsity selection and estimation both across and within variables.

17.
L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve the optimization problems arising from L1-type regularization; the coordinate descent algorithm in particular has been shown to be effective in sparse regression modeling. Although the algorithm performs remarkably well on these optimization problems, it suffers from outliers, since the procedure is based on inner products of the predictor variables with partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
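The standard (non-robust) coordinate descent update the abstract refers to can be sketched as follows: each coefficient is refit from the inner product of its predictor with the current partial residual and then soft-thresholded, which is exactly where an outlier in y enters the computation; this is the textbook Lasso update, not the authors' robust variant:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    r = y.copy()                                   # full residual
    for _ in range(n_iter):
        for j in range(p):
            r_j = r + X[:, j] * beta[j]            # partial residual excluding j
            z = X[:, j] @ r_j                      # non-robust inner product step
            beta[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_ss[j]
            r = r_j - X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(9)
X = rng.standard_normal((100, 20))
y = X[:, 0] + rng.standard_normal(100)
print(np.nonzero(lasso_cd(X, y, lam=0.1))[0])
```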

18.
Multi-index models have attracted much attention recently as an approach to circumventing the curse of dimensionality when modeling high-dimensional data. This paper proposes a novel regularization method, MAVE-glasso, for simultaneous parameter estimation and variable selection in multi-index models. The advantages of the proposed method include transformation invariance, automatic variable selection, automatic removal of noninformative observations, and row-wise shrinkage. An efficient row-wise coordinate descent algorithm is proposed to compute the estimates. Simulated and real examples demonstrate the excellent performance of MAVE-glasso.

19.
Structured sparsity has recently become a very popular technique for dealing with high-dimensional data. In this paper, we focus on theoretical problems for the overlapping group structure of generalized linear models (GLMs). Although the overlapping group Lasso method for GLMs has been widely applied, its theoretical properties remain unknown. Under some general conditions, we present oracle inequalities for the estimation and prediction error of the overlapping group Lasso method in the generalized linear model setting. We then apply these results to logistic and Poisson regression models. It is shown that the results of the Lasso and group Lasso procedures for GLMs can be recovered by specifying the group structures in our proposed method. The effect of overlap and the variable selection performance of our proposed method are both studied by numerical simulations. Finally, we apply our proposed method to two gene expression data sets: the p53 data and the lung cancer data.

20.
Comparison of different estimation techniques for portfolio selection
The main problem in applying mean-variance portfolio selection is that the first two moments of the asset returns are unknown, so in practice the optimal portfolio weights have to be estimated. This is usually done by replacing the moments with the classical unbiased sample estimators. We provide a comparison of the exact and asymptotic distributions of the estimated portfolio weights, as well as a sensitivity analysis to shifts in the moments of the asset returns. Furthermore, we consider several types of shrinkage estimators for the moments. The corresponding estimators of the portfolio weights are compared with each other and with the weights based on the sample moment estimators. We show how uncertainty about the portfolio weights can be introduced into the performance measurement of trading strategies. The methodology explains the poor out-of-sample performance of classical Markowitz procedures.
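A minimal sketch of the comparison described above, with simulated returns and our own illustrative shrinkage choices (grand-mean shrinkage for the means, Ledoit-Wolf for the covariance), rather than the specific estimators studied in the paper:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(8)
R = 0.0005 + 0.01 * rng.standard_normal((500, 30))   # 500 periods, 30 assets (placeholder)

def weights(mu, sigma):
    """Weights proportional to Sigma^{-1} mu, normalized to sum to one."""
    w = np.linalg.solve(sigma, mu)
    return w / w.sum()

mu_s, sig_s = R.mean(axis=0), np.cov(R, rowvar=False)    # classical sample estimators
mu_shrunk = 0.5 * mu_s + 0.5 * mu_s.mean()               # shrink means toward grand mean
sig_shrunk = LedoitWolf().fit(R).covariance_             # shrunk covariance
print(weights(mu_s, sig_s)[:3], weights(mu_shrunk, sig_shrunk)[:3])
```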
