Similar literature
 Found 20 similar articles (search time: 625 ms)
1.
A new regularization method for regression models is proposed. The criterion to be minimized contains a penalty term which explicitly links strength of penalization to the correlation between predictors. Like the elastic net, the method encourages a grouping effect where strongly correlated predictors tend to be in or out of the model together. A boosted version of the penalized estimator, which is based on a new boosting method, allows variables to be selected. Real world data and simulations show that the method compares well to competing regularization techniques. In settings where the number of predictors is smaller than the number of observations it frequently performs better than competitors; in high dimensional settings prediction measures favor the elastic net, while accuracy of estimation and stability of variable selection favor the newly proposed method.

2.
In panel data analysis, predictors may affect the response in substantially different ways. Some predictors have homogeneous effects across all individuals, while others have heterogeneous effects. Differentiating effectively between these two kinds of predictors is crucial, particularly in high-dimensional panel data, since the homogeneity assumption greatly reduces the number of parameters and hence improves interpretability. In this article, based on a hierarchical Bayesian panel regression model, we propose a novel yet effective Markov chain Monte Carlo (MCMC) algorithm together with a simple maximum ratio criterion to detect the predictors with homogeneous effects in high-dimensional panel data. Extensive Monte Carlo simulations show that this MCMC algorithm performs well. The usefulness of the proposed method is further demonstrated by a real example from the Chinese financial market.

3.
Regularization and variable selection via the elastic net   (total citations: 2; self-citations: 0; citations by others: 2)
Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
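The grouping effect described in this abstract can be illustrated with a small sketch. This is not the LARS-EN algorithm from the paper; it is plain coordinate descent on the (naive) elastic net criterion 0.5*||y - Xb||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2, and the function names, penalty values, and simulated data are all assumptions made for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_cd(X, y, lam1, lam2, n_sweeps=500):
    """Coordinate descent for 0.5*||y - Xb||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # inner product of column j with the residual that excludes beta_j
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Hypothetical data: columns 0 and 1 are identical copies, column 2 is noise.
random.seed(0)
n = 40
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
X = [[x[i], x[i], z[i]] for i in range(n)]
y = [3.0 * x[i] + 0.1 * random.gauss(0, 1) for i in range(n)]

# With lam2 > 0 the criterion is strictly convex, so the two copies share the weight.
beta_en = elastic_net_cd(X, y, lam1=1.0, lam2=5.0)
# With lam2 = 0 (lasso) the split between the two copies is not unique.
beta_lasso = elastic_net_cd(X, y, lam1=1.0, lam2=0.0)
print("elastic net:", beta_en)
print("lasso:      ", beta_lasso)
```

In this toy setting the two duplicated predictors receive equal elastic net coefficients, which is the extreme case of strongly correlated predictors entering the model together.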

4.
Abstract. Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model selection criterion is proposed to select the best one among this preselected set. The approach leads to a fast and efficient procedure for variable selection, especially in high-dimensional settings. Model selection consistency of the suggested criterion is proven when the number of covariates d is fixed. Simulation studies suggest that the criterion still enjoys model selection consistency when d is much larger than the sample size. The simulations also show that our approach for variable selection works surprisingly well in comparison with existing competitors. The method is also applied to a real data set.

5.
We consider the problem of selecting variables in factor analysis models. The $L_1$ regularization procedure is introduced to perform an automatic variable selection. In the factor analysis model, each variable is controlled by multiple factors when there is more than one underlying factor. We treat parameters corresponding to the multiple factors as grouped parameters, and then apply the group lasso. Furthermore, the weight of the group lasso penalty is modified to obtain appropriate estimates and improve the performance of variable selection. Crucial issues in this modeling procedure include the selection of the number of factors and a regularization parameter. Choosing these parameters can be viewed as a model selection and evaluation problem. We derive a model selection criterion for evaluating the factor analysis model via the weighted group lasso. Monte Carlo simulations are conducted to investigate the effectiveness of the proposed procedure. A real data example is also given to illustrate our procedure. The Canadian Journal of Statistics 40: 345–361; 2012 © 2012 Statistical Society of Canada

6.
Varying-coefficient models (VCMs) are useful tools for analysing longitudinal data. They can effectively describe the relationship between predictors and repeatedly measured responses. VCMs estimated by regularization methods are strongly affected by the values of the regularization parameters, and therefore selecting these values is a crucial issue. In order to choose these parameters objectively, we derive model selection criteria for evaluating VCMs from information-theoretic and Bayesian viewpoints. Models are estimated by the method of regularization with basis expansions, and then they are evaluated by the model selection criteria. We demonstrate the effectiveness of the proposed criteria through Monte Carlo simulations and real data analysis.

7.
In a nonlinear regression model based on a regularization method, selection of appropriate regularization parameters is crucial. Information criteria such as the generalized information criterion (GIC) and the generalized Bayesian information criterion (GBIC) are useful for selecting the optimal regularization parameters. However, the optimal parameter is often determined by calculating the information criterion for all candidate regularization parameters, and so the computational cost is high. One simple way to avoid this exhaustive search is to regard GIC or GBIC as a function of the regularization parameters and to find a value minimizing GIC or GBIC. However, it is unclear how to solve the optimization problem. In the present article, we propose an efficient Newton–Raphson type iterative method for selecting optimal regularization parameters with respect to GIC or GBIC in a nonlinear regression model based on basis expansions. This method reduces the computational time remarkably compared to the grid search and can select more suitable regularization parameters. The effectiveness of the method is illustrated through real data examples.

8.
This article makes the following contributions. First, it develops a new criterion for identifying whether or not a particular time series variable is a common factor in the conventional approximate factor model. Second, by modeling observed factors as a set of potential factors to be identified, it reveals how to easily pin down the factor without performing a large number of estimations. This allows the researcher to check whether or not each individual in the panel is the underlying common factor and, from there, identify which individuals best represent the factor space by using a new clustering mechanism. Asymptotically, the developed procedure correctly identifies the factor when N and T jointly approach infinity. The procedure is shown to be quite effective in finite samples by means of Monte Carlo simulation. The procedure is then applied to an empirical example, demonstrating that the newly developed method identifies the unknown common factors accurately.

9.
Model selection is the most pervasive problem in generalized linear models. A model selection criterion based on deviance, called the deviance-based criterion (DBC), is proposed. The DBC is obtained by penalizing the difference between the deviance of the fitted model and that of the full model. Under certain weak conditions, DBC is shown to be a consistent model selection criterion in the sense that, with probability approaching one, the selected model asymptotically equals the optimal model relating response and predictors. Further, the use of DBC in link function selection is also discussed. We compare the proposed model selection criterion with existing methods. The small-sample efficiency of the proposed criterion is evaluated in a simulation study.

10.
We consider bridge regression models, which can produce a sparse or non-sparse model by controlling a tuning parameter in the penalty term. A crucial part of a model building strategy is the selection of the values for adjusted parameters, such as regularization and tuning parameters. Indeed, this can be viewed as a problem in selecting and evaluating the model. We propose a Bayesian selection criterion for evaluating bridge regression models. This criterion enables us to objectively select the values of the adjusted parameters. We investigate the effectiveness of our proposed modeling strategy with some numerical examples.
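For orientation, the bridge criterion referred to in this abstract is commonly written in the following form. The notation (y_i, x_i, λ, q) is an assumption here and may differ from the paper's own, for example if basis expansions are used.

```latex
\hat{\beta} = \arg\min_{\beta} \; \sum_{i=1}^{n} \bigl( y_i - x_i^{\top} \beta \bigr)^2
  + \lambda \sum_{j=1}^{p} |\beta_j|^{q},
  \qquad \lambda \ge 0, \; q > 0 .
```

Here λ is the regularization parameter and q is the tuning parameter in the penalty term: q = 1 gives the lasso and q = 2 gives ridge regression. Values q ≤ 1 can set coefficients exactly to zero (a sparse model), whereas q > 1 shrinks coefficients without producing exact zeros.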

11.
ABSTRACT

The functional linear model is of great practical importance, as exemplified by applications in high-throughput studies such as meteorological and biomedical research. In this paper, we propose a new functional variable selection procedure, called functional variable selection via Gram–Schmidt (FGS) orthogonalization, for a functional linear model with a scalar response and multiple functional predictors. Instead of using regularization methods, FGS takes into account the similarity between the functional predictors in a data-driven way and uses Gram–Schmidt orthogonalization to remove the irrelevant predictors. FGS can successfully discriminate between the relevant and the irrelevant functional predictors, achieving a high true positive ratio without including many irrelevant predictors, and yields explainable models, which offers a new perspective on variable selection in the functional linear model. Simulation studies are carried out to evaluate the finite sample performance of the proposed method, and a weather data set is also analysed.

12.
The problem of constructing classification methods based on both labeled and unlabeled data sets is considered for analyzing data with complex structures. We introduce a semi-supervised logistic discriminant model with Gaussian basis expansions. Unknown parameters included in the logistic model are estimated by the regularization method along with the EM algorithm. For selection of adjusted parameters, we derive a model selection criterion from a Bayesian viewpoint. Numerical studies are conducted to investigate the effectiveness of our proposed modeling procedures.

13.
Polynomial autoregressions are usually considered to be unrealistic models for time series. However, this paper shows that they can successfully be used when the purpose of the time series study is to provide forecasts. A projection scheme inspired by projection pursuit regression and feedforward artificial neural networks is used in order to avoid an explosion of the number of parameters when considering a large number of lags. The estimation of the parameters of the projected polynomial autoregressions is a non-linear least-squares problem. A consistency result is proved. A simulation study shows that the naive use of the common final prediction error criterion is inappropriate to identify the best projected polynomial autoregression. An explanation of this phenomenon is given and a correction to the criterion is proposed. An important feature of the polynomial predictors introduced in this paper is their simple implementation, which allows for automatic use. This is illustrated with real data for the three-month US Treasury Bill.

14.
Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion which involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. stepwise method). Second, the algorithms are often computationally expensive because they need multiple calls to EM algorithms. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require the maximum likelihood estimate and its maximization appears to be simple and computationally efficient. The original contribution of our approach is to perform the model selection without requiring any parameter estimation. Then, parameter inference is needed only for the unique selected model. This approach is used for the variable selection of a Gaussian mixture model with conditional independence assumed. The numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches for variable selection. The proposed approach is implemented in the R package VarSelLCM available on CRAN.

15.
This paper studies the Bridge estimator for a high-dimensional panel data model with heterogeneous varying coefficients, where the random errors are assumed to be serially correlated and cross-sectionally dependent. We establish oracle efficiency and the asymptotic distribution of the Bridge estimator, when the number of covariates increases to infinity with the sample size in both dimensions. A BIC-type criterion is also provided for tuning parameter selection. We further generalise the marginal Bridge estimator for our model to asymptotically correctly identify the covariates with zero coefficients even when the number of covariates is greater than the sample size under a partial orthogonality condition. The finite sample performance of the proposed estimator is demonstrated by simulated data examples, and an empirical application with the US stock dataset is also provided.

16.
Sparsity-inducing penalties are useful tools for variable selection and are also effective for regression problems where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in multiclass logistic regression models for functional data, using sparse regularization. The parameters of the functional logistic regression model are estimated in the framework of the penalized likelihood method with the sparse group lasso-type penalty, and then tuning parameters for the model are selected using the model selection criterion. The effectiveness of the proposed method is investigated through simulation studies and the analysis of a gene expression data set.

17.
We study the problem of selecting a regularization parameter in penalized Gaussian graphical models. When the goal is to obtain a model with good predictive power, cross-validation is the gold standard. We present a new estimator of Kullback–Leibler loss in Gaussian graphical models which provides a computationally fast alternative to cross-validation. The estimator is obtained by approximating leave-one-out cross-validation. Our approach is demonstrated on simulated data sets for various types of graphs. The proposed formula exhibits superior performance, especially in the typical small sample size scenario, compared to other available alternatives to cross-validation, such as Akaike's information criterion and generalized approximate cross-validation. We also show that the estimator can be used to improve the performance of the Bayesian information criterion when the sample size is small.

18.
This article develops the adaptive elastic net generalized method of moments (GMM) estimator in large-dimensional models with potentially (locally) invalid moment conditions, where both the number of structural parameters and the number of moment conditions may increase with the sample size. The basic idea is to conduct the standard GMM estimation combined with two penalty terms: the adaptively weighted lasso shrinkage and the quadratic regularization. It is a one-step procedure of valid moment condition selection, nonzero structural parameter selection (i.e., model selection), and consistent estimation of the nonzero parameters. The procedure achieves the standard GMM efficiency bound as if we knew the valid moment conditions ex ante, for which the quadratic regularization is important. We also study the tuning parameter choice, with which we show that selection consistency still holds without assuming Gaussianity. We apply the new estimation procedure to dynamic panel data models, where both the time and cross-section dimensions are large. The new estimator is robust to possible serial correlations in the regression error terms.

19.
We consider the problem of constructing nonlinear regression models with Gaussian basis functions, using lasso regularization. Regularization with a lasso penalty is advantageous in that it estimates some coefficients in linear regression models to be exactly zero. We propose imposing a weighted lasso penalty on a nonlinear regression model and thereby selecting the number of basis functions effectively. In order to select tuning parameters in the regularization method, we use a deviance information criterion proposed by Spiegelhalter et al. (2002), calculating the effective number of parameters by Gibbs sampling. Simulation results demonstrate that our methodology performs well in various situations.

20.
We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. By penalizing the sum of ℓ2 norms of the blocks of coefficients associated with each covariate across different classification problems, similar sparsity patterns in all models are encouraged. To take computational advantage of the sparsity of solutions at high regularization levels, we propose a blockwise path-following scheme that approximately traces the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates concurrently a growing set of covariates that are simultaneously active for all problems. We also show how to use random projections to extend this approach to the problem of joint subspace selection, where multiple predictors are found in a common low-dimensional subspace. We present theoretical results showing that this random projection approach converges to the solution yielded by trace-norm regularization. Finally, we present a variety of experimental results exploring joint covariate selection and joint subspace selection, comparing the path-following approach to competing algorithms in terms of prediction accuracy and running time.
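The way a sum of block norms produces a shared sparsity pattern can be seen from its proximal operator, sketched below. This is not the blockwise path-following scheme of the paper; it is the standard block soft-thresholding step, and the function name, coefficient values, and threshold are hypothetical.

```python
import math

def block_soft_threshold(W, t):
    """Proximal operator of t * sum_j ||W[j]||_2, applied row by row.

    W[j] holds the coefficients of covariate j across all problems.
    A row whose l2 norm is at most t is set to zero as a whole;
    any other row is shrunk toward zero by the factor (1 - t / norm).
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
        out.append([scale * w for w in row])
    return out

# Hypothetical coefficients: 3 covariates (rows) x 2 classification problems (columns).
W = [[3.0, 4.0],    # norm 5.0: kept and shrunk
     [0.3, -0.4],   # norm about 0.5: removed from both problems at once
     [0.0, 2.0]]    # norm 2.0: kept and shrunk
W_new = block_soft_threshold(W, t=1.0)

# A covariate is active if it has a nonzero coefficient in at least one problem.
active = [j for j, row in enumerate(W_new) if any(w != 0.0 for w in row)]
print(W_new)
print("active covariates:", active)
```

Because the threshold acts on the norm of a whole row, a covariate is either dropped from every problem or retained in the shared active set, which is the joint selection behaviour the abstract describes.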
