Similar Literature
1.
Lasso has been widely used for variable selection because of its sparsity, and a number of its extensions have been developed. In this article, we propose a robust variant of the Lasso for time-course multivariate responses and develop an algorithm that transforms the optimization into a sequence of ridge regressions. The proposed method handles multivariate responses effectively and employs a basis representation of the regression parameters to reduce the dimensionality. We assess the proposed method through simulation and apply it to microarray data.
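
A minimal sketch of the general idea of solving a Lasso-type problem through a sequence of ridge regressions (a local quadratic approximation of the l1 penalty), for a univariate response. The function name, data, and tuning values below are hypothetical; the authors' actual algorithm additionally handles multivariate time-course responses and a basis representation of the parameters.

```python
import numpy as np

def lasso_via_ridge(X, y, lam, n_iter=100, eps=1e-8):
    """Approximate a Lasso fit by iteratively reweighted ridge regressions
    (local quadratic approximation of the l1 penalty).  Illustrative only."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + eps * np.eye(p), X.T @ y)   # ridge warm start
    for _ in range(n_iter):
        # |b_j| is replaced by b_j**2 / (|b_j_old| + eps), so each iteration
        # is a ridge regression with coefficient-specific penalty weights.
        w = 1.0 / (np.abs(beta) + eps)
        beta = np.linalg.solve(X.T @ X + lam * np.diag(w), X.T @ y)
    beta[np.abs(beta) < 1e-6] = 0.0                               # zero out tiny values
    return beta

# toy usage with a hypothetical sparse signal
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(50)
print(lasso_via_ridge(X, y, lam=5.0))
```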

2.
Variable selection is a fundamental challenge in statistical learning when one works with data sets containing a huge number of predictors. In this article we consider two procedures popular in model selection: the Lasso and the adaptive Lasso. Our goal is to investigate the properties of estimators based on minimizing a Lasso-type penalized empirical risk with a convex, possibly nondifferentiable, loss function. We obtain theorems concerning the rate of convergence in estimation, consistency in model selection, and oracle properties of Lasso estimators when the number of predictors is fixed, i.e., it does not depend on the sample size. Moreover, we study the properties of Lasso and adaptive Lasso estimators on simulated and real data sets.

3.
We consider a semi-parametric approach to the joint segmentation of multiple series sharing a common functional part. We propose an iterative procedure based on dynamic programming for the segmentation part and on Lasso estimators for the functional part. Our Lasso procedure, based on a dictionary approach, allows us to estimate both smooth functions and functions with local irregularities, which permits more flexibility than previously proposed methods. This yields better estimation of the functional part and improvements in the segmentation. The performance of our method is assessed using simulated data and real data from agricultural and geodetic studies. Our estimation procedure proves to be a reliable tool for detecting changes and for obtaining an interpretable estimate of the functional part of the model in terms of known functions.

4.
When employing model selection methods with oracle properties, such as the smoothly clipped absolute deviation (SCAD) penalty and the adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example with m = 10. In problems where the true regression function is sparse and the signals are large, such cross-validation typically works well. However, in regression modeling of genomic studies involving single nucleotide polymorphisms (SNPs), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of variables selected by SCAD and the adaptive Lasso with 10-fold cross-validation is a random variable with considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, not just SCAD and the adaptive Lasso.
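
A small experiment, on assumed synthetic data, illustrating the reported phenomenon: repeated 10-fold cross-validation runs with different fold splits can select quite different numbers of variables. Plain `LassoCV` from scikit-learn is used as a stand-in, since SCAD and the adaptive Lasso are not available there.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

# Sparse truth with weak signals: repeated 10-fold CV runs can pick rather
# different penalty levels and hence rather different model sizes.
X, y = make_regression(n_samples=150, n_features=500, n_informative=10,
                       noise=10.0, random_state=0)
counts = []
for seed in range(10):                                  # 10 independent CV runs
    folds = KFold(n_splits=10, shuffle=True, random_state=seed)
    fit = LassoCV(cv=folds, max_iter=5000).fit(X, y)
    counts.append(int(np.count_nonzero(fit.coef_)))     # number of selected variables
print("selected-variable counts across runs:", counts)
```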

5.
Yan Maobo and Tian Maozai. 《统计研究》 (Statistical Research), 2021, 38(1): 147-160
The number of variables that penalized selection methods such as the Lasso can include in a model is limited by the sample size, and existing methods for testing the significance of variable coefficients discard the information contained in variables not selected into the model. In the high-dimensional setting where the number of variables exceeds the sample size (p > n), we use a randomized bootstrap to obtain variable weights, construct the conditional distribution given the selection event when computing the adaptive Lasso, and remove variables with non-significant coefficients to obtain the final estimate. The novelty of the proposed method is that it overcomes the limit on the number of variables the adaptive Lasso can select, and that it can effectively distinguish true variables from noise variables when the observed data contain a large number of noise variables. Compared with existing penalized variable selection methods, simulation studies under multiple settings demonstrate the superiority of the proposed method on both of these problems. In an empirical study, the NCI-60 cancer cell line data are analyzed, and the results show a clear improvement over previous literature.

6.
The adaptive least absolute shrinkage and selection operator (Lasso) and the least absolute deviation (LAD) Lasso are two attractive shrinkage methods for simultaneous variable selection and regression parameter estimation. While the adaptive Lasso is efficient for small-magnitude errors, the LAD-Lasso is robust against heavy-tailed errors and severe outliers. In this article, we consider a data-driven convex combination of these two modern procedures to produce a robust adaptive Lasso, which not only enjoys the oracle properties but also synthesizes the advantages of the adaptive Lasso and the LAD-Lasso. It fully adapts to different error structures, including the infinite-variance case, and automatically chooses the optimal weight to achieve both robustness and high efficiency. Extensive simulation studies demonstrate good finite-sample performance of the robust adaptive Lasso. Two data sets are analyzed to illustrate the practical use of the procedure.
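
A sketch of the kind of objective described above, written as a convex program in cvxpy: a convex combination of LAD and least-squares losses with an adaptively weighted l1 penalty. The mixing weight `gamma`, the penalty level `lam`, and the pilot least-squares weights are fixed by hand here purely for illustration, whereas the paper chooses the combination weight in a data-driven way.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p); beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_t(df=2, size=n)        # heavy-tailed errors

beta_init = np.linalg.lstsq(X, y, rcond=None)[0]        # pilot fit for adaptive weights
w = 1.0 / (np.abs(beta_init) + 1e-6)

gamma, lam = 0.5, 2.0                                    # fixed for illustration only
b = cp.Variable(p)
loss = gamma * cp.norm1(y - X @ b) + (1 - gamma) * cp.sum_squares(y - X @ b)
penalty = lam * cp.norm1(cp.multiply(w, b))              # adaptively weighted l1 penalty
cp.Problem(cp.Minimize(loss + penalty)).solve()
print(np.round(b.value, 3))
```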

7.
Quantile regression has gained increasing popularity as it provides richer information than ordinary mean regression, and variable selection plays an important role in quantile regression model building, as it improves prediction accuracy by choosing an appropriate subset of regression predictors. Unlike traditional quantile regression, we consider the quantile as an unknown parameter and estimate it jointly with the other regression coefficients. In particular, we adopt the Bayesian adaptive Lasso for maximum-entropy quantile regression. A flat prior is chosen for the quantile parameter owing to the lack of information about it. The proposed method not only addresses the question of which quantile is the most probable among all candidates, but also reflects the inner relationship of the data through the estimated quantile. We develop an efficient Gibbs sampler algorithm and show that the performance of our proposed method is superior to the Bayesian adaptive Lasso and the Bayesian Lasso through simulation studies and a real data analysis.
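
For orientation only, the check loss and the asymmetric Laplace log-density that underlie Bayesian quantile regression at level tau; this is just the working likelihood, not the authors' Gibbs sampler or their joint treatment of tau as a parameter.

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0).astype(float))

def ald_loglik(y, mu, tau, sigma=1.0):
    # Log-density of the asymmetric Laplace distribution, the working
    # likelihood behind Bayesian quantile regression at level tau.
    u = (y - mu) / sigma
    return np.sum(np.log(tau * (1.0 - tau) / sigma) - check_loss(u, tau))

y = np.array([0.2, -1.0, 3.5])
print(ald_loglik(y, mu=np.zeros(3), tau=0.5))
```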

9.
We consider the problem of variable screening in ultra-high-dimensional generalized linear models (GLMs), where the dimension may grow at a non-polynomial rate with the sample size. Since the popular sure independence screening (SIS) approach is extremely unstable in the presence of contamination and noise, we discuss a new robust screening procedure based on the minimum density power divergence estimator (MDPDE) of the marginal regression coefficients. Our proposed screening procedure performs well under both pure and contaminated data scenarios. We provide theoretical motivation for the use of marginal MDPDEs for variable screening from both the population and the sample perspectives; in particular, we prove that the marginal MDPDEs are uniformly consistent, leading to the sure screening property of our proposed algorithm. Finally, we propose an appropriate MDPDE-based extension for robust conditional screening in GLMs, along with a derivation of its sure screening property. Our proposed methods are illustrated through extensive numerical studies and an interesting real data application.

10.
Penalised likelihood methods, such as the least absolute shrinkage and selection operator (Lasso) and the smoothly clipped absolute deviation (SCAD) penalty, have become widely used for variable selection in recent years. These methods impose penalties on the regression coefficients to shrink a subset of them towards zero, achieving parameter estimation and model selection simultaneously. The amount of shrinkage is controlled by the regularisation parameter. Popular approaches for choosing the regularisation parameter include cross-validation, various information criteria, and bootstrapping methods based on mean squared error. In this paper, a new data-driven method for choosing the regularisation parameter is proposed and its consistency is established, not only for the usual fixed-dimensional case but also for the setting with a diverging number of parameters. Simulation results show that the new method outperforms other popular approaches. An application of the proposed method to motif discovery in gene expression analysis is included.
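
A generic illustration of choosing the regularisation parameter along a Lasso path with a BIC-type criterion (counting nonzero coefficients as degrees of freedom). This is a standard textbook recipe on assumed synthetic data, not the specific data-driven method proposed in the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)
n = X.shape[0]
alphas, coefs, _ = lasso_path(X, y, n_alphas=100)        # coefs: (n_features, n_alphas)

bic = []
for k in range(coefs.shape[1]):
    beta = coefs[:, k]
    rss = np.sum((y - X @ beta) ** 2)
    df = np.count_nonzero(beta)                           # nonzeros as degrees of freedom
    bic.append(n * np.log(rss / n) + df * np.log(n))      # BIC-type criterion
best = int(np.argmin(bic))
print("chosen alpha:", alphas[best],
      "selected variables:", np.count_nonzero(coefs[:, best]))
```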

11.
Realized volatility computed from high-frequency data is an important measure for many applications in finance, and its dynamics have been widely investigated. Recent notable advances that perform well include the heterogeneous autoregressive (HAR) model, which can approximate long memory, is very parsimonious, is easy to estimate, and features good out-of-sample performance. We prove that the least absolute shrinkage and selection operator (Lasso) recovers the lag structure of the HAR model asymptotically if it is the true model, and we present Monte Carlo evidence for finite samples. The HAR model's lag structure is not fully in agreement with the one found by applying the Lasso to real data. Moreover, we provide empirical evidence of two clear structural breaks for most of the assets we consider. These results call into question the appropriateness of the HAR model for realized volatility. Finally, in an out-of-sample analysis, we show that the HAR model and the Lasso approach perform equally well.
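
An illustrative simulation, under an assumed HAR-type data-generating process with arbitrary coefficients, of letting the Lasso pick the lag structure of realized volatility from 22 individual daily lags; it is a stylized stand-in for the paper's Monte Carlo and empirical analyses.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
T = 3000
rv = np.zeros(T)
# Stylized HAR-type recursion: RV_t driven by daily, weekly, monthly averages.
for t in range(22, T):
    daily, weekly, monthly = rv[t - 1], rv[t - 5:t].mean(), rv[t - 22:t].mean()
    rv[t] = 0.1 + 0.4 * daily + 0.3 * weekly + 0.2 * monthly + 0.1 * rng.standard_normal()

# Regress RV_t on its 22 individual lags and let the Lasso pick the lag structure.
X = np.column_stack([rv[22 - k:T - k] for k in range(1, 23)])
y = rv[22:]
fit = LassoCV(cv=5).fit(X, y)
print("lags with nonzero coefficients:", np.flatnonzero(fit.coef_) + 1)
```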

12.
In this article, we study a nonparametric approach based on a general nonlinear reduced-form equation to better approximate the optimal instrument. Accordingly, we propose the nonparametric additive instrumental variable estimator (NAIVE) with the adaptive group Lasso. We theoretically demonstrate that the proposed estimator is root-n consistent and asymptotically normal. The adaptive group Lasso helps us select the valid instruments while the dimensionality of the potential instrumental variables is allowed to be greater than the sample size. In practice, the degree and knots of the B-spline series are selected by minimizing the BIC or EBIC criterion for each nonparametric additive component in the reduced-form equation. In Monte Carlo simulations, we show that the NAIVE performs as well as the linear instrumental variable (IV) estimator when the reduced-form equation is truly linear. On the other hand, the NAIVE performs much better in terms of bias and mean squared error than alternative estimators under a high-dimensional nonlinear reduced-form equation. We further illustrate our method in an empirical study of international trade and growth. Our findings provide stronger evidence that international trade has a significant positive effect on economic growth.

13.
We consider the problem of constructing confidence intervals for nonparametric functional data analysis using empirical likelihood. In this doubly infinite-dimensional context, we demonstrate the Wilks phenomenon and propose a bias-corrected construction that requires neither undersmoothing nor direct bias estimation. We also extend our results to partially linear regression models involving functional data. Our numerical results demonstrate improved performance of the empirical likelihood methods over normal-approximation-based methods.

14.
Empirical Bayes is a versatile approach to “learn from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes–full Bayes ridge regression approach for estimation of the posterior predictive interval.
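
A minimal "formal" empirical Bayes example on assumed synthetic data: scikit-learn's `BayesianRidge` estimates the noise and prior precisions by maximising the marginal likelihood. This is the single-prior-parameter case; the co-data settings discussed in the review involve multiple prior parameters.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

X, y = make_regression(n_samples=100, n_features=20, noise=2.0, random_state=0)

# BayesianRidge estimates the noise precision (alpha_) and the common prior
# precision of the coefficients (lambda_) by maximising the marginal likelihood,
# i.e. "formal" empirical Bayes for ridge regression with one prior parameter.
fit = BayesianRidge().fit(X, y)
print("noise precision:", fit.alpha_, "prior precision:", fit.lambda_)
print("implied ridge penalty (lambda_/alpha_):", fit.lambda_ / fit.alpha_)
```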

15.
We propose a new estimator, the thresholded scaled Lasso, for high-dimensional threshold regressions. First, we establish an upper bound on the ℓ∞ (sup-norm) estimation error of the scaled Lasso estimator of Lee, Seo, and Shin. This is a nontrivial task, as the literature on high-dimensional models has focused almost exclusively on ℓ1 and ℓ2 estimation errors. We show that this sup-norm bound can be used to distinguish between zero and nonzero coefficients at a much finer scale than would be possible using classical oracle inequalities. Thus, our sup-norm bound is tailored to consistent variable selection via thresholding. Our simulations show that thresholding the scaled Lasso yields substantial improvements in variable selection. Finally, we use our estimator to shed further empirical light on the long-running debate about the relationship between the level of debt (public and private) and GDP growth. Supplementary materials for this article are available online.
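
A rough sketch of thresholding a Lasso fit for variable selection, using `LassoCV` as a stand-in for the scaled Lasso and a heuristic threshold of order sqrt(log p / n); the paper instead derives the appropriate level from its sup-norm error bound.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=300, n_informative=5,
                       noise=1.0, random_state=0)
fit = LassoCV(cv=5, max_iter=5000).fit(X, y)

# Heuristic threshold of order sqrt(log(p) / n); the proper level would come
# from a sup-norm bound on the estimation error.
n, p = X.shape
tau = np.sqrt(np.log(p) / n)
beta_thr = np.where(np.abs(fit.coef_) > tau, fit.coef_, 0.0)
print("nonzero before thresholding:", np.count_nonzero(fit.coef_),
      "after:", np.count_nonzero(beta_thr))
```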

16.
With quantile regression methods successfully applied in various fields, we often need to tackle big data sets with thousands of variables and millions of observations. In this article, we focus on the variable selection aspect of penalized quantile regression and propose a new method, sampling Lasso quantile regression (SLQR), which selects a small but informative subset of the data for fitting quantile regression models. Unlike ordinary regularization methods, SLQR applies a sampling technique to reduce the number of observations before applying the Lasso. Through numerical simulation studies and a real application to the Greenhouse Gas Observing Network, we illustrate the efficacy of the SLQR method. The numerical results show that SLQR achieves high-precision quantile regression on large-scale data for both prediction and interpretation.
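
A simplified sketch of the sampling-then-penalized-quantile-regression idea: draw a small subsample and fit an l1-penalized median regression with scikit-learn's `QuantileRegressor` (available in recent scikit-learn versions). A plain uniform subsample and an arbitrary penalty level are used here, whereas SLQR employs a more informative sampling scheme.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import QuantileRegressor

# Large synthetic data set; fit an l1-penalized median regression on a small
# random subsample instead of on all observations.
X, y = make_regression(n_samples=100_000, n_features=50, n_informative=8,
                       noise=5.0, random_state=0)
rng = np.random.default_rng(1)
idx = rng.choice(len(y), size=2_000, replace=False)      # plain uniform subsample

fit = QuantileRegressor(quantile=0.5, alpha=0.1).fit(X[idx], y[idx])
print("selected variables:", np.count_nonzero(fit.coef_))
```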

17.
We consider statistical inference for longitudinal partially linear models when the response variable is sometimes missing, with the missingness probability depending on a covariate that is measured with error. The block empirical likelihood procedure is used to estimate the regression coefficients, and a residual-adjusted block empirical likelihood is employed for the baseline function. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, our proposed method does not require consistent estimators of the asymptotic variance and bias. An application to a longitudinal study is used to illustrate the procedure developed here. A simulation study is also reported.

18.
Econometric Reviews, 2012, 31(1): 27-53
Transformed diffusions (TDs) have become increasingly popular in financial modeling for their flexibility and tractability. While existing TD models are predominantly one-factor models, empirical evidence often favors models with multiple factors. We propose a novel distribution-driven nonlinear multifactor TD model with latent components. Our model is a transformation of an underlying multivariate Ornstein–Uhlenbeck (MVOU) process, where the transformation function is endogenously specified by a flexible parametric stationary distribution of the observed variable. Computationally efficient exact likelihood inference can be implemented for our model using a modified Kalman filter algorithm, and the transformed affine structure also allows us to price derivatives in semi-closed form. We compare the proposed multifactor model with existing TD models for modeling the VIX and pricing VIX futures. Our results show that the proposed model outperforms all existing TD models both in sample and out of sample, consistently across all categories and scenarios of our comparison.

19.
Semiparametric regression models with multiple covariates are commonly encountered. When there are covariates not associated with the response variable, variable selection may lead to sparser models, more lucid interpretation, and more accurate estimation. In this study, we adopt a sieve approach for the estimation of nonparametric covariate effects in semiparametric regression models and a two-step iterated penalization approach for variable selection. In the first step, a mixture of the Lasso and group Lasso penalties is employed to conduct the first round of variable selection and to obtain an initial estimate. In the second step, a mixture of the weighted Lasso and weighted group Lasso penalties, with weights constructed from the initial estimate, is employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when the number of unknown parameters diverges with the sample size. Numerical studies, including simulations and the analysis of a diabetes data set, show satisfactory performance of the proposed approach.
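
A sketch of the two-step idea with the group structure dropped for brevity: an initial Lasso fit provides weights, and the second-step weighted l1 penalty is implemented by rescaling the columns, fitting an ordinary Lasso, and scaling the coefficients back. All data and tuning choices below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=200, n_features=100, n_informative=6,
                       noise=1.0, random_state=0)

# Step 1: initial Lasso fit, giving adaptive weights 1 / |beta_init|.
init = LassoCV(cv=5).fit(X, y)
w = 1.0 / (np.abs(init.coef_) + 1e-4)

# Step 2: weighted l1 penalty via column rescaling -- fit an ordinary Lasso on
# X_j / w_j and scale the coefficients back by 1 / w_j.
Xw = X / w
fit2 = Lasso(alpha=init.alpha_).fit(Xw, y)
beta = fit2.coef_ / w
print("step-1 selected:", np.count_nonzero(init.coef_),
      "step-2 selected:", np.count_nonzero(beta))
```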

20.
In this paper, we propose robust randomized quantile regression estimators for the mean and (conditional) variance functions of the popular heteroskedastic nonparametric regression model. Unlike classical approaches, which consider the quantile as a fixed quantity, our method treats the quantile as a uniformly distributed random variable. The proposed method can also be employed to estimate the error distribution, which can significantly improve prediction results. An automatic bandwidth selection scheme is discussed. Asymptotic properties and relative efficiencies of the proposed estimators are investigated. Our empirical results show that the proposed estimators work well even for random errors with infinite variance. Various numerical simulations and two real data examples demonstrate our methodologies.
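
A toy illustration of treating the quantile level as a uniformly distributed random variable: since the conditional mean equals the integral of the conditional quantile function over (0, 1), averaging quantile-regression fits over randomly drawn levels gives a robust mean estimate. The linear model, the trimming to (0.05, 0.95), and the use of statsmodels' `QuantReg` are assumptions for illustration; the paper works nonparametrically with kernel smoothing.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + rng.standard_t(df=2, size=n)         # heavy-tailed errors
X = sm.add_constant(x)

# E[Y|X] = integral of the conditional quantile function over (0, 1), so
# averaging quantile-regression fits over random levels gives a robust
# (here slightly trimmed) estimate of the mean function.
taus = rng.uniform(0.05, 0.95, size=50)                   # avoid extreme levels
coefs = np.array([sm.QuantReg(y, X).fit(q=t).params for t in taus])
print("averaged coefficients (intercept, slope):", coefs.mean(axis=0))
```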
