Similar Literature
20 related records found.
1.
The rapid development of computing has made data acquisition and storage far easier; many enterprises have accumulated large amounts of data whose dimensionality keeps growing, along with the number of noise variables. A key problem in modeling is therefore to screen a small number of important variables out of a high-dimensional set. For proportion data whose response takes values in the interval (0, 1), this paper proposes regularized Beta regression and studies penalized maximum likelihood estimation and its asymptotic properties under the LASSO, SCAD, and MCP penalties. Simulations indicate that MCP outperforms SCAD and LASSO, and that SCAD also overtakes LASSO as the sample size grows. Finally, the method is applied to a study of the factors influencing the dividend yield of Chinese listed companies.
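For orientation, a regularized Beta regression of the kind described above is typically built from the Beta likelihood with a logit mean link plus a penalty on the coefficients. The display below is a generic sketch of such an objective, not the paper's exact notation; the mean/precision parametrization and the scaling of the penalty term are assumptions.

```latex
% Generic penalized Beta regression (illustrative sketch, not the paper's exact notation).
% y_i in (0,1); mean mu_i linked to covariates by a logit link; precision phi; penalty p_lambda.
\log\frac{\mu_i}{1-\mu_i} = x_i^{\top}\beta, \qquad
y_i \sim \mathrm{Beta}\big(\mu_i\phi,\ (1-\mu_i)\phi\big), \qquad
(\hat\beta,\hat\phi) = \arg\max_{\beta,\phi}\ \sum_{i=1}^{n}\log f(y_i;\mu_i,\phi) \;-\; n\sum_{j=1}^{p} p_{\lambda}(|\beta_j|),
```

where p_λ is the LASSO, SCAD, or MCP penalty.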

2.
High-dimensional models, which involve very many parameters relative to a moderate amount of data, are receiving attention from diverse research fields. Model selection is an important issue in such high-dimensional data analysis. The recent literature on the theoretical understanding of high-dimensional models covers a wide range of penalized methods, including LASSO and SCAD. This paper presents a systematic overview of recent developments in high-dimensional statistical models. We provide a brief review of recent theory, methods, and guidelines for the application of several penalized methods, including the settings in which each reviewed method is appropriate, its limitations, and potential remedies. In particular, we give a systematic review of the statistical theory of high-dimensional methods through a unified high-dimensional modeling framework together with high-level conditions; this framework includes (generalized) linear regression and quantile regression as special cases. We hope this review helps researchers in the field to better understand the area and provides useful information for future studies.
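As a point of reference, the unified framework such reviews work with is usually a penalized M-estimation problem of the generic shape below (a sketch only; the loss L stands for least squares, a GLM negative log-likelihood, or the quantile check loss, and p_λ for LASSO, SCAD, or another folded-concave penalty):

```latex
% Generic penalized M-estimation objective covering the models discussed (illustrative).
\hat\beta \;=\; \arg\min_{\beta\in\mathbb{R}^{p}}\ \frac{1}{n}\sum_{i=1}^{n} L\big(y_i,\,x_i^{\top}\beta\big) \;+\; \sum_{j=1}^{p} p_{\lambda}\big(|\beta_j|\big).
```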

3.
This article proposes a variable selection procedure for partially linear models with right-censored data via penalized least squares. We apply the SCAD penalty to select significant variables and estimate the unknown parameters simultaneously. The sampling properties of the proposed procedure are investigated: the rate of convergence and the asymptotic normality of the proposed estimators are established, and the SCAD-penalized estimators of the nonzero coefficients are shown to possess the asymptotic oracle property. In addition, an iterative algorithm is proposed to solve the penalized least squares problem. Simulation studies are conducted to examine the finite sample performance of the proposed method.
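For reference, the SCAD penalty used here is the folded-concave penalty of Fan and Li (2001), most compactly defined through its derivative (with a > 2, commonly a = 3.7):

```latex
% SCAD penalty (Fan & Li, 2001), specified via its derivative for t > 0; a > 2, often a = 3.7.
p_{\lambda}'(t) \;=\; \lambda\left\{ \mathbf{1}(t\le\lambda) \;+\; \frac{(a\lambda - t)_{+}}{(a-1)\lambda}\,\mathbf{1}(t>\lambda) \right\}, \qquad p_{\lambda}(0)=0 .
```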

4.
In this paper, we focus on feature extraction and variable selection for massive data that are partitioned and stored on different, linked machines. Specifically, we study distributed model selection with the smoothly clipped absolute deviation (SCAD) penalty. Based on the alternating direction method of multipliers (ADMM), we propose a distributed SCAD algorithm and prove its convergence. The variable selection results of the distributed approach coincide with those of the non-distributed approach. Numerical studies show that our method is both effective and efficient and performs well in distributed data analysis.
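The abstract does not spell out the distributed formulation; consensus ADMM in the generic form below (Boyd et al., 2011) is the standard template for algorithms of this kind, with f_k the local loss on machine k and the SCAD penalty entering through the global update g. Treat this as a sketch of the template, not the authors' algorithm:

```latex
% Generic consensus ADMM for minimizing sum_k f_k(beta) + g(beta) over K machines (illustrative).
\beta_k^{t+1} = \arg\min_{\beta_k}\ f_k(\beta_k) + \tfrac{\rho}{2}\|\beta_k - z^{t} + u_k^{t}\|_2^{2}, \qquad
z^{t+1} = \arg\min_{z}\ g(z) + \tfrac{K\rho}{2}\|z - \bar\beta^{t+1} - \bar u^{t}\|_2^{2}, \qquad
u_k^{t+1} = u_k^{t} + \beta_k^{t+1} - z^{t+1}.
```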

5.
In this article, we focus on variable selection for the semiparametric varying-coefficient partially linear model with responses missing at random. Variable selection is based on modal regression, with the nonparametric functions approximated by a B-spline basis. The proposed procedure uses the SCAD penalty to achieve variable selection of the parametric and nonparametric components simultaneously. Furthermore, we establish the consistency, sparsity, and asymptotic normality of the resulting estimators. The penalized estimates are computed with an EM-type algorithm. Simulation studies are carried out to assess the finite sample performance of the proposed variable selection procedure.
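The abstract does not give the computational details; the snippet below is a simplified modal-EM iteration for plain modal linear regression with a Gaussian kernel (no B-splines, no missing data), included only to illustrate the kind of EM update typically used in modal regression. The function name and the fixed bandwidth are illustrative assumptions.

```python
import numpy as np

def modal_em_linear(X, y, h=0.5, n_iter=50):
    """Simplified modal-EM for modal linear regression (illustrative sketch).

    E-step: weight each observation by a Gaussian kernel of its current residual.
    M-step: refit by weighted least squares with those weights.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # initialize at OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        w = np.exp(-0.5 * (resid / h) ** 2)            # E-step: kernel weights
        w /= w.sum()
        XtW = X.T * w                                  # same as X' diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)       # M-step: weighted least squares
    return beta
```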

6.
In this paper, we propose a variable selection method for quantile regression models with ultra-high-dimensional longitudinal data, called the weighted adaptive robust lasso (WAR-Lasso), which enjoys a double-robustness property. We derive the consistency and the model selection oracle property of WAR-Lasso. Simulation studies demonstrate its double robustness under both heavy-tailed error distributions and heavy contamination of the covariates, where WAR-Lasso outperforms alternatives such as SCAD. A real data analysis shows that WAR-Lasso tends to select fewer variables and that the estimated coefficients are in line with their economic significance.

7.
We propose the Laplace Error Penalty (LEP) function for variable selection in high-dimensional regression. Unlike penalty functions built from piecewise splines, the LEP is constructed as an exponential function with two tuning parameters and is infinitely differentiable everywhere except at the origin. With this construction, the LEP-based procedure acquires extra flexibility in variable selection, admits a unified derivative formula in optimization, and can approximate the L0 penalty arbitrarily closely. We show that the LEP procedure can identify the relevant predictors in exponentially high-dimensional regression with normal errors, and we establish the oracle property for the LEP estimator. Although it is not convex, the LEP yields a convex penalized least squares objective under mild conditions when p is no greater than n. A coordinate descent majorization-minimization algorithm is introduced to implement the LEP procedure. In simulations and a real data analysis, the LEP methodology compares favorably with competing procedures.

8.
In this paper, we propose a new efficient and robust penalized estimating procedure for varying-coefficient single-index models based on modal regression and basis function approximations. The proposed procedure simultaneously addresses two problems: separating varying from constant effects, and selecting variables with nonzero coefficients in both the nonparametric and index components, using three smoothly clipped absolute deviation (SCAD) penalties. With appropriate selection of the tuning parameters, the new method is consistent in variable selection and in separating varying from constant coefficients. In addition, the estimators of the varying coefficients attain the optimal convergence rate, and the estimators of the constant coefficients and index parameters have the oracle property. Finally, we investigate the finite sample performance of the proposed method through a simulation study and a real data analysis.

9.
In this article, we develop a generalized penalized linear unbiased selection (GPLUS) algorithm. GPLUS is designed to compute the solution paths of penalized logistic regression under the smoothly clipped absolute deviation (SCAD) and minimax concave (MCP) penalties. The main idea of GPLUS is to compute possibly multiple local minimizers at individual penalty levels by continuously tracing the minimizers across penalty levels. We demonstrate the feasibility of the proposed algorithm in logistic and linear regression. The simulation results favor the selection accuracy of SCAD and MCP over a suitable range of penalty levels.
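For reference, the minimax concave penalty (MCP) referred to here is that of Zhang (2010), with regularization parameter λ ≥ 0 and concavity parameter γ > 1:

```latex
% Minimax concave penalty (Zhang, 2010); lambda >= 0, gamma > 1, t >= 0.
p_{\lambda,\gamma}(t) \;=\;
\begin{cases}
\lambda t - \dfrac{t^{2}}{2\gamma}, & 0 \le t \le \gamma\lambda,\\[4pt]
\tfrac{1}{2}\gamma\lambda^{2}, & t > \gamma\lambda.
\end{cases}
```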

10.
Based on B-spline basis functions and the smoothly clipped absolute deviation (SCAD) penalty, we present a new estimation and variable selection procedure for partially linear additive models based on modal regression. The outstanding merit of the new method is that it is robust against outliers and heavy-tailed error distributions while performing no worse than least-squares-based estimation in the normal error case. The main difference is that the standard quadratic loss is replaced by a kernel function whose bandwidth can be selected automatically from the observed data. With appropriate selection of the regularization parameters, the new method possesses consistency in variable selection and the oracle property in estimation. Finally, both a simulation study and a real data analysis are performed to examine the performance of our approach.

11.
In many scientific investigations with high- or ultra-high-dimensional feature spaces, the relevant features, though sparse, are large in number compared with classical statistical problems, and the magnitude of their effects tapers off. It is therefore reasonable to model the number of relevant features as a sequence that diverges as the sample size increases. In this paper, we investigate the properties of the extended Bayesian information criterion (EBIC) (Chen and Chen, 2008) for feature selection in linear regression models with a diverging number of relevant features in high- or ultra-high-dimensional feature spaces. The selection consistency of the EBIC in this setting is established. The application of the EBIC to feature selection is considered in a SCAD-cum-EBIC procedure. Simulation studies are conducted to demonstrate the performance of the SCAD-cum-EBIC procedure in finite samples.
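For reference, the EBIC of Chen and Chen (2008) augments the usual BIC with a term that accounts for the number of models of a given size; for a candidate model s with ν(s) selected covariates out of p and γ ∈ [0, 1],

```latex
% Extended Bayesian information criterion (Chen & Chen, 2008).
\mathrm{EBIC}_{\gamma}(s) \;=\; -2\log L\big(\hat\theta_{s}\big) \;+\; \nu(s)\log n \;+\; 2\gamma\,\log\binom{p}{\nu(s)} .
```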

12.
A number of variable selection methods have been proposed involving nonconvex penalty functions. These methods, which include the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP), have been demonstrated to have attractive theoretical properties, but model fitting is not a straightforward task, and the resulting solutions may be unstable. Here, we demonstrate the potential of coordinate descent algorithms for fitting these models, establishing theoretical convergence properties and demonstrating that they are significantly faster than competing approaches. In addition, we demonstrate the utility of convexity diagnostics to determine regions of the parameter space in which the objective function is locally convex, even though the penalty is not. Our simulation study and data examples indicate that nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso in many applications. In particular, our numerical results suggest that MCP is the preferred approach among the three methods.
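The speed of such algorithms comes from the closed-form univariate updates available for MCP and SCAD when the covariates are standardized. The sketch below is an illustrative, naive implementation of that idea for penalized least squares; it is not the authors' code and omits pathwise warm starts, convergence checks, and the convexity diagnostics mentioned above.

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def mcp_update(z, lam, gamma=3.0):
    """Univariate MCP solution for standardized covariates (gamma > 1)."""
    return soft(z, lam) / (1.0 - 1.0 / gamma) if abs(z) <= gamma * lam else z

def scad_update(z, lam, gamma=3.7):
    """Univariate SCAD solution for standardized covariates (gamma > 2)."""
    if abs(z) <= 2.0 * lam:
        return soft(z, lam)
    if abs(z) <= gamma * lam:
        return soft(z, gamma * lam / (gamma - 1.0)) / (1.0 - 1.0 / (gamma - 1.0))
    return z

def coordinate_descent(X, y, lam, gamma=3.0, penalty="MCP", n_sweeps=200):
    """Naive coordinate descent for nonconvex penalized least squares.

    Assumes each column of X has mean zero and (1/n) * x_j' x_j = 1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - X @ beta
    update = mcp_update if penalty == "MCP" else scad_update
    for _ in range(n_sweeps):
        for j in range(p):
            z = X[:, j] @ resid / n + beta[j]        # partial-residual estimate
            b_new = update(z, lam, gamma)
            resid -= X[:, j] * (b_new - beta[j])     # keep residual synchronized
            beta[j] = b_new
    return beta
```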

13.
Liang H, Liu X, Li R, Tsai CL. Annals of Statistics, 2010, 38(6): 3811-3836.
In partially linear single-index models, we obtain the semiparametrically efficient profile least-squares estimators of regression coefficients. We also employ the smoothly clipped absolute deviation penalty (SCAD) approach to simultaneously select variables and estimate regression coefficients. We show that the resulting SCAD estimators are consistent and possess the oracle property. Subsequently, we demonstrate that a proposed tuning parameter selector, BIC, identifies the true model consistently. Finally, we develop a linear hypothesis test for the parametric coefficients and a goodness-of-fit test for the nonparametric component, respectively. Monte Carlo studies are also presented.

14.
In this article, we present a new efficient iterative estimation approach based on local modal regression for single-index varying-coefficient models. The resulting estimators are shown to be robust to outliers and to the error distribution. The asymptotic properties of the estimators are established under regularity conditions, and a practical modified EM algorithm is proposed for the new method. Moreover, to obtain sparse estimators when irrelevant variables are present among the index parameters, a variable selection procedure based on the SCAD penalty is developed to select significant parametric covariates, and the well-known oracle properties are derived. Finally, numerical examples with various error distributions and a real data analysis are conducted to illustrate the validity and feasibility of the proposed method.

15.
We propose two new procedures based on multiple hypothesis testing for correct support estimation in high-dimensional sparse linear models. We prove that both procedures are powerful and do not require the sample size to be large. The first procedure tackles the atypical setting of ordered variable selection through an extension of a testing procedure previously developed in the context of a linear hypothesis. The second procedure is the main contribution of this paper: it enables data analysts to perform support estimation in the general high-dimensional framework of non-ordered variable selection. A thorough simulation study and applications to real datasets using the R package mht show that our non-ordered variable procedure produces excellent results in terms of correct support estimation, as well as in terms of mean squared error and false discovery rate, when compared with common methods such as the Lasso, the SCAD penalty, forward regression, and the false discovery rate (FDR) procedure.

16.
In survival studies, current status data are frequently encountered when some individuals in a study are not observed successively. This paper considers simultaneous variable selection and parameter estimation in the high-dimensional continuous generalized linear model with current status data. We apply a penalized likelihood procedure with the smoothly clipped absolute deviation penalty to select significant variables and estimate the corresponding regression coefficients. With a proper choice of tuning parameters, the resulting estimator is shown to be root-(n/p_n)-consistent under mild conditions. In addition, we show that the resulting estimator has the same asymptotic distribution as the estimator obtained when the true model is known. The finite sample behavior of the proposed estimator is evaluated through simulation studies and a real example.

17.
Partially linear models are widely used as a flexible way to model a linear component alongside a nonparametric one. Despite the presence of the nonparametric part, the linear, parametric part can under certain conditions be estimated at a parametric rate. In this paper, we consider a high-dimensional linear part and show that it can be estimated at oracle rates, using the least absolute shrinkage and selection operator (lasso) penalty for the linear part and a smoothness penalty for the nonparametric part.
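In generic form, an estimator of this kind minimizes a doubly penalized least-squares criterion of roughly the following shape (a sketch only; λ and μ are the two tuning parameters and J(f) denotes some smoothness measure for the nonparametric part, such as a Sobolev-type roughness penalty):

```latex
% Doubly penalized criterion for a high-dimensional partially linear model (illustrative sketch).
(\hat\beta,\hat f) \;=\; \arg\min_{\beta,\,f}\ \frac{1}{n}\sum_{i=1}^{n}\big(y_i - x_i^{\top}\beta - f(z_i)\big)^{2} \;+\; \lambda\,\|\beta\|_{1} \;+\; \mu\, J(f).
```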

18.
We study estimation and variable selection for a partially linear single-index model (PLSIM) in which some linear covariates are not observed but ancillary variables for them are available. We use a semiparametric profile least-squares estimation procedure to estimate the parameters of the PLSIM after calibrating the error-prone covariates. Asymptotic normality of the estimators is established. We also employ the smoothly clipped absolute deviation (SCAD) penalty to select the relevant variables in the PLSIM; the resulting SCAD estimators are shown to be asymptotically normal and to have the oracle property. The performance of our estimation procedure is illustrated through extensive simulations, and the approach is further applied to a real data example.

19.
In this paper, we study robust variable selection and identification of the parametric components simultaneously in varying-coefficient models. The proposed estimator is based on spline approximation and two smoothly clipped absolute deviation (SCAD) penalties applied through rank regression, which is robust to heavy-tailed errors and outliers in the response. Furthermore, when the tuning parameter is chosen by a modified BIC criterion, we show that the proposed procedure is consistent both in variable selection and in separating the varying and constant coefficients. In addition, the estimators of the varying coefficients attain the optimal convergence rate under some assumptions, and the estimators of the constant coefficients have the same asymptotic distribution as their counterparts obtained when the true model is known. Simulation studies and a real data example are presented to assess the finite sample performance of the proposed variable selection procedure.

20.
In this paper, we discuss the selection of random effects within the framework of generalized linear mixed models (GLMMs). Based on a reparametrization of the random-effects covariance matrix in terms of a modified Cholesky decomposition, we propose adding a shrinkage penalty term to the penalized quasi-likelihood (PQL) function of the variance components in order to select the effective random effects. The shrinkage penalty is taken as a function of the random-effect variances, motivated by the fact that if a variance is zero then the corresponding variable is no longer random (with probability one). The proposed method takes advantage of the convenient computation of PQL estimation and of the appealing properties of certain shrinkage penalty functions such as LASSO and SCAD. We propose a backfitting algorithm that estimates the fixed effects and variance components in GLMMs and simultaneously selects the effective random effects. Simulation studies show that the proposed approach performs well in selecting effective random effects in GLMMs, and a real data analysis using the proposed approach is also presented.
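One common version of the reparametrization mentioned above (in the spirit of Chen and Dunson, 2003) writes the random-effects covariance as below, so that shrinking a λ_l to exactly zero removes the l-th random effect; whether the paper uses this exact form is an assumption on our part.

```latex
% A modified Cholesky-type reparametrization of the random-effects covariance (illustrative).
\Sigma \;=\; \Lambda\,\Gamma\,\Gamma^{\top}\Lambda, \qquad
\Lambda=\mathrm{diag}(\lambda_1,\dots,\lambda_q),\ \ \lambda_l\ge 0, \qquad
\Gamma \text{ lower triangular with unit diagonal},
```

with a shrinkage penalty such as LASSO or SCAD applied to the λ_l in the PQL objective.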
