Similar Articles
20 similar articles found (search time: 0 ms)
1.
A Bayesian elastic net approach is presented for variable selection and coefficient estimation in linear regression models. A simple Gibbs sampling algorithm is developed for posterior inference, using a location-scale mixture representation of the Bayesian elastic net prior for the regression coefficients. The penalty parameters are chosen through an empirical method that maximizes the marginal likelihood of the data. Both simulated and real data examples show that the proposed method performs well in comparison with other approaches.
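The location-scale mixture idea can be sketched in code. Below is a minimal Gibbs sampler for the closely related Bayesian lasso (the Park-Casella scale-mixture representation); the Bayesian elastic net adds a ridge term to the conditional precision, but the update structure is analogous. The fixed penalty `lam` stands in for the empirically chosen penalty parameters, and all names are illustrative.

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=1500, burn=500, seed=0):
    """Gibbs sampler for the Bayesian lasso via its location-scale mixture.

    The Bayesian elastic net replaces the pure Laplace prior with a
    Laplace-plus-Gaussian mixture; its conditional updates are analogous.
    Returns the posterior mean of the regression coefficients.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta, tau2, sigma2 = np.zeros(p), np.ones(p), 1.0
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}), A = X'X + diag(1/tau2)
        A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)
        # 1/tau_j^2 | rest ~ Inverse-Gaussian(sqrt(lam^2 sigma2 / beta_j^2), lam^2)
        mu_ig = np.sqrt(lam**2 * sigma2 / beta**2)
        tau2 = 1.0 / rng.wald(mu_ig, lam**2)
        # sigma2 | rest ~ Inverse-Gamma((n-1+p)/2, rate)
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + beta @ (beta / tau2))
        sigma2 = 1.0 / rng.gamma((n - 1 + p) / 2.0, 1.0 / rate)
        if it >= burn:
            draws.append(beta)
    return np.mean(draws, axis=0)
```

The posterior mean shrinks small coefficients toward zero while leaving strong signals nearly untouched.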

2.
Abstract

In this article, we study variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based method under normal random errors, and is empirically shown to be much more efficient and robust with respect to heavy-tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously; the procedure is shown to possess the oracle property. To handle the missing covariates, we consider inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator using the estimated selection probability has a smaller asymptotic variance than that using the true selection probability, and is thus more efficient. The important Horvitz-Thompson property is thereby verified for the penalized rank estimator with missing covariates in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.
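As an illustration of the inverse probability weighting step, here is a minimal sketch in which the selection probability is estimated by a working logistic model of the missingness indicator on the (always observed) response. The paper's estimator is rank-based and penalized; this sketch uses plain weighted least squares, and the function name and working model are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_least_squares(X, y, observed):
    """Inverse-probability-weighted least squares for covariates missing at random.

    `observed` flags the complete cases. The selection probability is estimated
    by a working logistic regression of the missingness indicator on the
    response, which is always observed here (an illustrative MAR assumption).
    """
    sel = LogisticRegression().fit(y.reshape(-1, 1), observed.astype(int))
    p_hat = sel.predict_proba(y.reshape(-1, 1))[:, 1]
    # Weight each complete case by the inverse of its estimated selection probability.
    Xc, yc = X[observed], y[observed]
    w = 1.0 / np.clip(p_hat[observed], 1e-3, None)
    XtWX = Xc.T @ (Xc * w[:, None])
    XtWy = Xc.T @ (w * yc)
    return np.linalg.solve(XtWX, XtWy)
```

When missingness depends on the response, the unweighted complete-case fit is biased; the weights restore consistency.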

3.
We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and then perform sparse estimation, such as the Lasso on cluster representatives or the group Lasso based on the structure of the clusters. For the first step, we present a novel bottom-up agglomerative clustering algorithm based on canonical correlations, and we show that it finds an optimal solution and is statistically consistent. We also present theoretical arguments that canonical-correlation-based clustering leads to a better-posed compatibility constant for the design matrix, which ensures identifiability and an oracle inequality for the group Lasso. Furthermore, we discuss circumstances in which using cluster representatives with the Lasso as the subsequent estimator leads to improved results for prediction and variable detection. We complement the theoretical analysis with various empirical results.
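A rough sketch of the cluster-then-estimate idea: group variables by correlation-based hierarchical clustering, then run the Lasso on one representative per cluster. Note the paper clusters via canonical correlations; ordinary absolute correlation with average linkage is used here as a simple stand-in, and all names and tuning values are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import Lasso

def cluster_representative_lasso(X, y, n_clusters, alpha=0.05):
    """Cluster correlated predictors, then run the Lasso on one representative
    per cluster.  The paper clusters via canonical correlations; plain absolute
    correlation with average linkage is a simple stand-in here."""
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)   # dissimilarity between variables
    iu = np.triu_indices_from(dist, k=1)
    Z = linkage(dist[iu], method="average")         # condensed distance vector
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # The first member of each cluster acts as its representative.
    reps = [int(np.where(labels == k)[0][0]) for k in np.unique(labels)]
    fit = Lasso(alpha=alpha).fit(X[:, reps], y)
    return reps, fit.coef_
```

With near-duplicate predictors, each pair collapses into one cluster, so the Lasso sees a better-conditioned design.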

4.
Oracle Inequalities for Convex Loss Functions with Nonlinear Targets  (total citations: 1; self-citations: 1; citations by others: 0)
This article considers penalized empirical loss minimization of convex loss functions with unknown target functions. Using the elastic net penalty, of which the Least Absolute Shrinkage and Selection Operator (Lasso) is a special case, we establish a finite-sample oracle inequality that bounds the loss of our estimator from above with high probability. If the unknown target is linear, this inequality also provides an upper bound on the estimation error of the estimated parameter vector. Next, we use the non-asymptotic results to show that the excess loss of our estimator is asymptotically of the same order as that of the oracle. If the target is linear, we give sufficient conditions for consistency of the estimated parameter vector. We briefly discuss how a thresholded version of our estimator can be used to perform consistent variable selection. We give two examples of loss functions covered by our framework.

5.
Based on the inverse probability weighting method, we construct in this article the empirical likelihood (EL) and penalized empirical likelihood (PEL) ratios of the parameter in the linear quantile regression model when the covariates are missing at random, in the presence and absence of auxiliary information, respectively. It is proved that the EL ratio admits a limiting chi-square distribution. The asymptotic normality of the maximum EL and PEL estimators of the parameter is also established, and variable selection for the model in the presence and absence of auxiliary information is discussed. A simulation study and a real data analysis are conducted to evaluate the performance of the proposed methods.

6.
Regularization and variable selection via the elastic net  (total citations: 2; self-citations: 0; citations by others: 2)
Summary.  We propose the elastic net, a new regularization and variable selection method. Real-world data and a simulation study show that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
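The grouping effect can be seen in a small sketch using scikit-learn's coordinate-descent implementations (not the LARS-EN algorithm of the paper): with two nearly identical predictors, the elastic net spreads the coefficient across the pair, while the lasso concentrates it on one column. The data and tuning values are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Two highly correlated predictors plus one noise column.
rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)
X = np.column_stack([z, z + 0.01 * rng.standard_normal(n),
                     rng.standard_normal(n)])
y = 3.0 * z + 0.1 * rng.standard_normal(n)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Grouping effect: the elastic net splits the weight across the correlated
# pair, while the lasso tends to keep only one of them.
print(np.round(enet.coef_, 2), np.round(lasso.coef_, 2))
```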

7.
We propose a penalized empirical likelihood method via the bridge estimator in Cox's proportional hazards model for parameter estimation and variable selection. Under reasonable conditions, we show that penalized empirical likelihood in Cox's proportional hazards model has the oracle property. A penalized empirical likelihood ratio for the vector of regression coefficients is defined, and its limiting distribution is a chi-square distribution. The advantage of penalized empirical likelihood as a nonparametric likelihood approach is illustrated in testing hypotheses and constructing confidence sets. The method is illustrated by extensive simulation studies and a real example.

8.
In this paper, we extend the modified lasso of Wang et al. (2007) to the linear regression model with autoregressive moving average (ARMA) errors. Such an extension is far from trivial because new devices need to be called for to establish the asymptotics due to the existence of the moving average component. A shrinkage procedure is proposed to simultaneously estimate the parameters and select the informative variables in the regression, autoregressive, and moving average components. We show that the resulting estimator is consistent in both parameter estimation and variable selection, and enjoys the oracle properties. To overcome the complexity in numerical computation caused by the existence of the moving average component, we propose a procedure based on a least squares approximation to implement estimation. The ordinary least squares formulation with the use of the modified lasso makes the computation very efficient. Simulation studies are conducted to evaluate the finite sample performance of the procedure. An empirical example of ground-level ozone is also provided.

9.
Although the t-type estimator is a kind of M-estimator with scale optimization, it has some advantages over the M-estimator. In this article, we first propose a t-type joint generalized linear model as a robust extension to the classical joint generalized linear models for modeling data containing extreme or outlying observations. Next, we develop a t-type pseudo-likelihood (TPL) approach, which can be viewed as a robust version of the existing pseudo-likelihood (PL) approach. To determine which variables significantly affect the variance of the response variable, we then propose a unified penalized maximum TPL method to simultaneously select significant variables for the mean and dispersion models in t-type joint generalized linear models. Thus, the proposed variable selection method can simultaneously perform parameter estimation and variable selection in the mean and dispersion models. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. Simulation studies are conducted to illustrate the proposed methods.

10.
Penalized regression methods have recently gained enormous attention in statistics and machine learning because of their ability to reduce prediction error and identify important variables at the same time. Numerous studies have been conducted on penalized regression, but most are limited to the case where the data are independently observed. In this paper, we study a variable selection problem in penalized regression models with autoregressive (AR) error terms. We consider three estimators, the adaptive least absolute shrinkage and selection operator (lasso), the bridge, and the smoothly clipped absolute deviation (SCAD), and propose a computational algorithm that enables us to select a relevant set of variables and the order of the AR error terms simultaneously. In addition, we provide their asymptotic properties, such as consistency, selection consistency, and asymptotic normality. The performances of the three estimators are compared with one another using simulated and real examples.
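A minimal sketch of the idea behind penalized regression with AR errors, using a plain lasso and a fixed AR(1) structure rather than the paper's three penalties and data-driven AR order: alternate between a penalized fit on quasi-differenced data and re-estimating the AR coefficient from the residuals (Cochrane-Orcutt style). The function name and tuning values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_ar1(X, y, alpha=0.05, n_rounds=5):
    """Alternate between a lasso fit and AR(1) error estimation
    via Cochrane-Orcutt-style quasi-differencing."""
    rho = 0.0
    beta = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        # Quasi-difference to (approximately) whiten the AR(1) errors.
        Xs = X[1:] - rho * X[:-1]
        ys = y[1:] - rho * y[:-1]
        beta = Lasso(alpha=alpha).fit(Xs, ys).coef_
        # Re-estimate rho from the lag-1 autocorrelation of the residuals.
        resid = y - X @ beta
        rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
    return beta, rho
```

Ignoring the error dependence inflates the variance of the penalized fit; the quasi-differencing step restores near-independent errors before each refit.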

11.
In the high-dimensional setting, componentwise L2 boosting has been used to construct sparse models that perform well, but it tends to select many ineffective variables. Several sparse boosting methods, such as SparseL2Boosting and Twin Boosting, have been proposed to improve the variable selection of the L2 boosting algorithm. In this article, we propose a new general sparse boosting method (GSBoosting). Relations are established between GSBoosting and other well-known regularized variable selection methods in the orthogonal linear model, such as the adaptive Lasso and hard thresholding. Simulation results show that GSBoosting performs well in both prediction and variable selection.
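For reference, componentwise L2 boosting itself is only a few lines: repeatedly fit the current residuals on the single best predictor and take a small step in that direction. This is the base algorithm that GSBoosting and the other sparse variants modify; the sketch below assumes roughly centered predictors and handles the intercept by centering the response.

```python
import numpy as np

def l2_boost(X, y, step=0.1, n_steps=200):
    """Componentwise L2 boosting: at each step, regress the current residuals
    on each single predictor and move a small step along the best one."""
    p = X.shape[1]
    beta = np.zeros(p)
    resid = y - y.mean()                 # intercept handled by centering
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        # Least-squares coefficient of the residuals on each column.
        b = X.T @ resid / col_ss
        # Pick the component that reduces the residual sum of squares most.
        j = int(np.argmax(b ** 2 * col_ss))
        beta[j] += step * b[j]
        resid -= step * b[j] * X[:, j]
    return beta
```

Early stopping (the choice of `n_steps`) acts as the regularizer: relevant coefficients converge quickly while irrelevant ones stay near zero.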

12.
In this article, we study a nonparametric approach based on a general nonlinear reduced-form equation to achieve a better approximation of the optimal instrument. Accordingly, we propose the nonparametric additive instrumental variable estimator (NAIVE) with the adaptive group Lasso. We theoretically demonstrate that the proposed estimator is root-n consistent and asymptotically normal. The adaptive group Lasso helps us select the valid instruments, while the dimensionality of the potential instrumental variables is allowed to be greater than the sample size. In practice, the degree and knots of the B-spline series are selected by minimizing the BIC or EBIC criterion for each nonparametric additive component in the reduced-form equation. In Monte Carlo simulations, we show that the NAIVE has the same performance as the linear instrumental variable (IV) estimator when the reduced-form equation is truly linear. On the other hand, the NAIVE performs much better in terms of bias and mean squared error than alternative estimators under a high-dimensional nonlinear reduced-form equation. We further illustrate our method in an empirical study of international trade and growth. Our findings provide stronger evidence that international trade has a significant positive effect on economic growth.

13.
14.
In this paper, we propose hard thresholding regression (HTR) for estimating high-dimensional sparse linear regression models. HTR uses a two-stage convex algorithm to approximate the ℓ0-penalized regression: the first stage calculates a coarse initial estimator, and the second stage identifies the oracle estimator by borrowing information from the first. Theoretically, the HTR estimator achieves the strong oracle property over a wide range of regularization parameters. Numerical examples and a real data example lend further support to the proposed methodology.
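A crude sketch of the two-stage scheme, with a lasso pilot standing in for the paper's first-stage estimator and a hard-threshold-then-refit second stage; the tuning constants and function name are illustrative, not the paper's algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def hard_threshold_regression(X, y, alpha=0.1, tau=0.5):
    """Two-stage sketch: a lasso pilot fit, then hard-threshold the pilot
    coefficients and refit ordinary least squares on the survivors."""
    pilot = Lasso(alpha=alpha).fit(X, y).coef_
    active = np.where(np.abs(pilot) > tau)[0]     # hard thresholding step
    beta = np.zeros(X.shape[1])
    if active.size:
        # Unpenalized refit on the selected support removes the lasso bias.
        beta[active], *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
    return beta, active
```

The refit mimics the oracle estimator whenever the thresholded support equals the true support.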

15.
Sparsity-inducing penalties are useful tools for variable selection and are also effective for regression problems where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in multiclass logistic regression models for functional data, using sparse regularization. The parameters of the functional logistic regression model are estimated in the framework of the penalized likelihood method with the sparse group lasso-type penalty, and tuning parameters for the model are then selected using a model selection criterion. The effectiveness of the proposed method is investigated through simulation studies and the analysis of a gene expression data set.

16.
The authors consider the problem of estimating the density g of independent and identically distributed variables Xi from a sample Z1, …, Zn such that Zi = Xi + σεi for i = 1, …, n, where ε is noise independent of X and σε has a known distribution. They present a model selection procedure allowing one to construct an adaptive estimator of g and to find nonasymptotic risk bounds. The estimator achieves the minimax rate of convergence in most cases where lower bounds are available. A simulation study illustrates the good practical performance of the method.

17.
Summary.  Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high-dimensional genomic data. We show that the known asymptotic consistency of the partial least squares estimator for a univariate response does not hold in the very large p, small n paradigm. We derive a similar result for multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genome-wide binding data.

18.
In this article, we develop a generalized penalized linear unbiased selection (GPLUS) algorithm. GPLUS is designed to compute the paths of penalized logistic regression based on the smoothly clipped absolute deviation (SCAD) and minimax concave (MCP) penalties. The main idea of GPLUS is to compute possibly multiple local minimizers at individual penalty levels by continuously tracing the minimizers across different penalty levels. We demonstrate the feasibility of the proposed algorithm in logistic and linear regression. The simulation results favor the selection accuracy of the SCAD and MCP over a suitable range of penalty levels.
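For orientation, the univariate SCAD and MCP thresholding rules, which are the closed-form solutions such path algorithms trace in the orthonormal case, can be written directly; the default constants a = 3.7 and gamma = 3 are the conventional choices.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """Univariate SCAD thresholding rule (orthonormal-design solution)."""
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)          # soft-thresholding zone
    if az <= a * lam:
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return z                                            # no shrinkage for large |z|

def mcp_threshold(z, lam, gamma=3.0):
    """Univariate MCP thresholding rule (orthonormal-design solution)."""
    if abs(z) <= gamma * lam:
        return np.sign(z) * max(abs(z) - lam, 0.0) * gamma / (gamma - 1)
    return z
```

Both rules leave large coefficients unshrunk, which is what gives SCAD- and MCP-penalized estimators their nearly unbiased, oracle-like behavior.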

19.
20.
Nonparametric regression techniques such as spline smoothing and local fitting depend implicitly on a parametric model. For instance, the cubic smoothing spline estimate of a regression function μ based on observations ti, Yi is the minimizer of Σ{Yi − μ(ti)}² + λ∫(μ″)². Since ∫(μ″)² is zero when μ is a line, the cubic smoothing spline estimate favors the parametric model μ(t) = α0 + α1t. Here the authors consider replacing ∫(μ″)² with the more general expression ∫(Lμ)², where L is a linear differential operator with possibly nonconstant coefficients. The resulting estimate of μ performs well, particularly if Lμ is small. They present an O(n) algorithm for the computation of μ that is applicable to a wide class of L's. They also suggest a method for the estimation of L. They study their estimates via simulation and apply them to several data sets.
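The special case L = d²/dt², the ordinary cubic smoothing spline, can be fitted directly with SciPy (version 1.10 or later); the general L-spline with a nonconstant-coefficient operator requires the authors' O(n) algorithm and is not sketched here. The simulated data below are illustrative.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(200)

# Minimizes sum_i {Y_i - mu(t_i)}^2 + lam * int (mu'')^2, i.e. L = d^2/dt^2;
# lam=None selects the smoothing parameter by generalized cross-validation.
spline = make_smoothing_spline(t, y, lam=None)
mu_hat = spline(t)
```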


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号