We propose a new adaptive L1 penalized quantile regression estimator for high-dimensional sparse regression models with heterogeneous error sequences. We show that, under weaker conditions than those required by alternative procedures, the adaptive L1 quantile regression selects the true underlying model with probability converging to one, and the unique estimates of the nonzero coefficients it provides have the same asymptotic normal distribution as the quantile estimator that uses only the covariates with nonzero impact on the response. Thus, the adaptive L1 quantile regression enjoys oracle properties. We propose a completely data-driven choice of the penalty level $\lambda_n$, which ensures good performance of the adaptive L1 quantile regression. Extensive Monte Carlo simulation studies demonstrate the finite-sample performance of the proposed method.
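The estimator combines the quantile check loss with a weighted L1 penalty. The Python sketch below only evaluates that objective, under stated assumptions: the weights are taken as w_j = 1/|b_init_j| from some initial estimate b_init (a common adaptive-lasso choice, assumed here rather than taken from the abstract), and all function names are hypothetical. Minimizing the objective would normally be done with a linear-programming solver, and the data-driven choice of the penalty level is not reproduced.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def adaptive_weights(beta_init, eps=1e-8):
    """Adaptive L1 weights w_j = 1 / |beta_init_j| (assumed choice).

    eps guards against division by zero when an initial coefficient is 0;
    such coefficients then receive a very large weight.
    """
    return [1.0 / max(abs(b), eps) for b in beta_init]


def adaptive_l1_qr_objective(beta, X, y, tau, lam, weights):
    """sum_i rho_tau(y_i - x_i' beta) + lam * sum_j w_j |beta_j|."""
    loss = 0.0
    for xi, yi in zip(X, y):
        fitted = sum(xj * bj for xj, bj in zip(xi, beta))
        loss += check_loss(yi - fitted, tau)
    penalty = lam * sum(w * abs(b) for w, b in zip(weights, beta))
    return loss + penalty
```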

In many regression problems, predictors are naturally grouped. For example, when a set of dummy variables represents a categorical variable, or a set of basis functions of a continuous variable is included in the predictor set, it is important to carry out feature selection simultaneously at the group level and at the level of individual variables within each group. To incorporate group and within-group information into regularized model fitting, several regularization methods have been developed, for example for Cox regression and conditional mean regression. Complementing this earlier work, we examine simultaneous group and within-group variable selection in quantile regression. We propose a hierarchically penalized quantile regression and show that the hierarchical penalty possesses the oracle property in quantile regression as well as in Cox regression. The proposed method is evaluated through simulation studies and a real data application.
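One common way to build a hierarchical penalty, assumed here because the abstract does not give the exact form, writes each coefficient as beta_gj = d_g * alpha_gj, with a group-level factor d_g >= 0 and within-group factors alpha_gj, and penalizes lam_d * sum_g d_g + lam_a * sum |alpha_gj|. Setting d_g = 0 removes a whole group, while a zero alpha_gj removes a single variable. The sketch checks that minimizing over the decomposition gives the induced penalty 2 * sqrt(lam_d * lam_a * ||beta_g||_1) per group (an AM-GM argument). All names are hypothetical and no quantile loss is fitted.

```python
import math


def hierarchical_penalty(d, alpha, lam_d, lam_a):
    """lam_d * sum_g d_g + lam_a * sum_{g,j} |alpha_gj| (assumed form)."""
    return lam_d * sum(d) + lam_a * sum(sum(abs(a) for a in grp) for grp in alpha)


def optimal_decomposition(beta_groups, lam_d, lam_a):
    """Split beta_gj = d_g * alpha_gj so that the penalty above is minimal.

    For a group with L1 norm s, lam_d * d + lam_a * s / d is minimized
    over d > 0 at d = sqrt(lam_a * s / lam_d).
    """
    d, alpha = [], []
    for grp in beta_groups:
        s = sum(abs(b) for b in grp)
        dg = math.sqrt(lam_a * s / lam_d)
        d.append(dg)
        alpha.append([b / dg if dg > 0 else 0.0 for b in grp])
    return d, alpha


def profiled_penalty(beta_groups, lam_d, lam_a):
    """Induced penalty on beta: sum_g 2 * sqrt(lam_d * lam_a * ||beta_g||_1)."""
    return sum(2.0 * math.sqrt(lam_d * lam_a * sum(abs(b) for b in grp))
               for grp in beta_groups)
```

Because the induced penalty is a square root of each group's L1 norm, it is nonconvex and can zero out whole groups as well as single coefficients.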

In this article, a new efficient iterative procedure based on quantile regression is developed for single-index varying-coefficient models. The proposed estimation scheme extends the full iteration procedure of Carroll et al. and differs from the method of Wu et al. for single-index models, which uses a double-weighted summation. This difference is not only the reason why undersmoothing is a necessary condition in our procedure, but it may also reduce the computational burden, especially for large sample sizes. The resulting estimators are shown to be robust to outliers as well as to varying errors. Moreover, to achieve sparsity when there are irrelevant variables in the index parameters, a variable selection procedure based on the adaptive LASSO penalty is developed to simultaneously select and estimate significant parameters. Theoretical properties of the resulting estimators are established under some regularity conditions, and simulation studies with various error distributions are conducted to assess the finite-sample performance of the proposed method.

Group folded concave penalization problems have been shown theoretically to possess the desirable oracle property. However, it remains unknown whether an optimization algorithm for the resulting nonconvex problem can find this oracle solution among multiple local solutions. In this paper, we extend the well-known local linear approximation (LLA) algorithm to solve the group folded concave penalization problem for linear models. We prove that, with the group LASSO estimator as the initial value, the two-step LLA solution converges to the oracle estimator with overwhelming probability, thus closing the theoretical gap. The results are high-dimensional: they allow the number of groups to grow exponentially, and the number of truly relevant groups and the maximum group size to grow polynomially. Numerical studies are also conducted to show the merits of the LLA procedure.
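The LLA step replaces the folded concave penalty by a weighted group lasso whose weights are the penalty derivative evaluated at the current group norms. The sketch below assumes SCAD as the folded concave penalty, with the conventional constant a = 3.7 (the paper may cover other penalties such as MCP), and computes only the weights; function names are hypothetical. In a two-step procedure the weights are first computed from the group LASSO initial estimate, a weighted group lasso is solved, the weights are recomputed from that solution, and the weighted problem is solved once more. The group lasso solver itself is omitted.

```python
import math


def scad_derivative(t, lam, a=3.7):
    """Derivative p'_lam(t) of the SCAD penalty for t >= 0."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def lla_group_weights(beta_groups, lam, a=3.7):
    """One LLA step: weight p'_lam(||beta_g||_2) for every group g.

    Groups with small norm keep the full lasso weight lam; groups with
    norm above a * lam get weight 0 and are left unpenalized.
    """
    weights = []
    for grp in beta_groups:
        norm = math.sqrt(sum(b * b for b in grp))
        weights.append(scad_derivative(norm, lam, a))
    return weights
```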

In this paper, we consider the prediction problem in the multiple linear regression model in which the number of predictor variables, $p$, is extremely large compared to the number of available observations, $n$. The least-squares predictor based on a generalized inverse is not efficient. We propose six empirical Bayes estimators of the regression parameters. Three of them are shown to have uniformly lower prediction error than the least-squares predictor when the vector of regressor variables is assumed to be random with mean vector zero and covariance matrix $(1/n)X^{t}X$, where $X^{t}=(x_1,\ldots,x_n)$ is the $p\times n$ matrix of observations on the regressor vector, centered at their sample means. For the other three estimators, we use simulation to show their superiority over the least-squares predictor.

We propose a penalized quantile regression for the partially linear varying coefficient (VC) model with longitudinal data, which selects relevant nonparametric and parametric components simultaneously. Selection consistency and the oracle property are established. Furthermore, when it is unknown which covariates belong to the linear part and which to the VC part, we propose a new unified method that carries out three types of selection, including the separation of varying and constant effects and the selection of relevant variables, conveniently in one step. Consistency of the three types of selection and the oracle property in estimation are established as well. Simulation studies and a real data analysis support the proposed method.

Based on the inverse probability weighting method, in this article we construct empirical likelihood (EL) and penalized empirical likelihood (PEL) ratios for the parameter in the linear quantile regression model when the covariates are missing at random, both in the presence and in the absence of auxiliary information. We prove that the EL ratio has a limiting chi-square distribution, and we establish the asymptotic normality of the maximum EL and PEL estimators of the parameter. Variable selection for the model, with and without auxiliary information, is also discussed. A simulation study and a real data analysis are conducted to evaluate the performance of the proposed methods.

In this article, a robust variable selection procedure based on weighted composite quantile regression (WCQR) is proposed. Compared with composite quantile regression (CQR), WCQR is robust to heavy-tailed errors and to outliers in the explanatory variables. For the choice of the weights in the WCQR, we employ a weighting scheme based on the principal component method. To select variables with a grouping effect, we consider WCQR with SCAD-L2 penalization. Furthermore, under some suitable assumptions, the theoretical properties, including the consistency and oracle property of the estimator, are established with a diverging number of parameters. In addition, we study the numerical performance of the proposed method in the case of ultrahigh-dimensional data. Simulation studies and real examples are provided to demonstrate the superiority of our method over the CQR method when there are outliers in the explanatory variables and/or the random error is from a heavy-tailed distribution.
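The composite quantile loss sums check losses over several quantile levels tau_k, each with its own intercept b_k and a shared slope vector. The sketch below evaluates a weighted version in which each observation i carries a weight w_i, one way to downweight outlying covariates. Treating the weights as per-observation, and leaving out both their principal-component-based construction and the SCAD-L2 penalty, are assumptions of this illustration; the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def wcqr_loss(beta, intercepts, X, y, taus, obs_weights):
    """Weighted composite quantile loss (assumed form):

        sum_k sum_i w_i * rho_{tau_k}(y_i - b_k - x_i' beta)

    One intercept b_k per quantile level tau_k; the slopes are shared.
    """
    total = 0.0
    for b_k, tau in zip(intercepts, taus):
        for xi, yi, wi in zip(X, y, obs_weights):
            r = yi - b_k - sum(xj * bj for xj, bj in zip(xi, beta))
            total += wi * check_loss(r, tau)
    return total
```

Setting every w_i = 1 recovers the unweighted CQR loss, so the weighting only changes how much each observation can pull on the fit.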

This paper considers variable selection in quantile regression with autoregressive errors. Recently, Wu and Liu (2009) investigated the oracle properties of the SCAD and adaptive-LASSO penalized quantile regressions under the assumption of independent but non-identically distributed errors. We further relax the error assumptions so that the regression model can have autoregressive errors, and we investigate the theoretical properties of our proposed penalized quantile estimators under the relaxed assumption. Optimizing the objective function is often challenging because both the quantile loss and the penalty functions may be non-differentiable and/or non-concave. We adopt the pseudo-data concept of Oh et al. (2007) to implement a practical algorithm for the quantile estimate, and we discuss the convergence property of the proposed algorithm. The performance of the proposed method is compared with that of the majorization-minimization algorithm (Hunter and Li, 2005) and the difference convex algorithm (Wu and Liu, 2009) through numerical and real examples.

We consider variable selection in linear regression of geostatistical data, which often arise in environmental and ecological studies. A penalized least squares procedure is studied for simultaneous variable selection and parameter estimation. Various penalty functions are considered, including the smoothly clipped absolute deviation (SCAD) penalty. Asymptotic properties of the penalized least squares estimates, particularly the oracle properties, are established under suitable regularity conditions imposed on a random field model for the error process. Moreover, computationally feasible algorithms are proposed for estimating the regression coefficients and their standard errors. Finite-sample properties of the proposed methods are investigated in a simulation study, and different penalty functions are compared. The methods are illustrated with an ecological dataset of land cover in Wisconsin. The Canadian Journal of Statistics 37: 607–624; 2009 © 2009 Statistical Society of Canada
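To compare penalty functions of the kind mentioned above, the sketch evaluates the L1 (lasso), hard-thresholding and SCAD penalties at a coefficient t with tuning parameter lam, using the standard forms from Fan and Li (2001) and the conventional SCAD constant a = 3.7 (assumed, not stated in the abstract). Unlike L1, SCAD becomes flat beyond a * lam, which keeps large coefficients nearly unbiased. The spatial error model and the standard-error algorithm are not reproduced, and the function names are hypothetical.

```python
def lasso_penalty(t, lam):
    """L1 penalty lam * |t|."""
    return lam * abs(t)


def hard_penalty(t, lam):
    """Hard-thresholding penalty lam^2 - (|t| - lam)^2 * 1{|t| < lam}."""
    t = abs(t)
    if t < lam:
        return lam * lam - (t - lam) ** 2
    return lam * lam


def scad_penalty(t, lam, a=3.7):
    """SCAD: linear near 0, quadratic transition, constant beyond a * lam."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2
```

The three SCAD pieces join continuously: the middle formula equals lam^2 at t = lam and lam^2 (a + 1) / 2 at t = a * lam.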

Penalized logistic regression (PLR) is a powerful statistical tool for classification and is widely used in practice. Despite its success, because the PLR loss function is unbounded, the resulting classifiers can be sensitive to outliers. To build more robust classifiers, we propose the robust PLR (RPLR), which uses truncated logistic loss functions, and suggest three schemes to estimate conditional class probabilities. We also discuss connections between the RPLR and other existing work on robust logistic regression. Our theoretical results indicate that the RPLR is Fisher consistent and more robust to outliers. Moreover, we develop an estimated generalized approximate cross validation (EGACV) criterion for tuning parameter selection. Through numerical examples, we demonstrate that truncating the loss function indeed yields better performance in terms of classification accuracy and class probability estimation.
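The abstract does not give the exact truncation, so the sketch assumes the simplest one: the logistic loss l(u) = log(1 + exp(-u)) on the margin u = y * f(x) is capped at its value at a truncation point s <= 0, here s = -1 (an arbitrary illustrative choice). Observations with very negative margins, typically outliers, then contribute a bounded amount instead of a loss that grows linearly. The capped loss is nonconvex, so fitting it would need a method such as a difference-of-convex algorithm; only the loss is shown, and the function names are hypothetical.

```python
import math


def logistic_loss(u):
    """l(u) = log(1 + exp(-u)) for margin u = y * f(x), computed stably."""
    if u >= 0:
        return math.log1p(math.exp(-u))
    # For u < 0 use log(1 + e^{-u}) = -u + log(1 + e^{u}) to avoid overflow.
    return -u + math.log1p(math.exp(u))


def truncated_logistic_loss(u, s=-1.0):
    """Logistic loss capped at its value at s <= 0 (assumed truncation)."""
    return min(logistic_loss(u), logistic_loss(s))
```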

Bridge penalized regression has many desirable statistical properties, such as unbiasedness, sparsity and the oracle property. In the Bayesian framework, the bridge penalty can be implemented through a generalized Gaussian distribution (GGD) prior. In this paper, we incorporate the Bayesian bridge-randomized penalty and its adaptive version into quantile regression (QR) models with autoregressive perturbations to conduct Bayesian penalized estimation. Using the asymmetric Laplace distribution (ALD) as the working likelihood for the perturbations, we establish Bayesian joint hierarchical models. Based on mixture representations of the ALD and of the GGD priors on the coefficients, we provide hybrid algorithms combining the Gibbs sampler and the Metropolis-Hastings sampler for fully Bayesian posterior estimation. Finally, the proposed Bayesian procedures are illustrated with simulation examples and applied to real electricity consumption data.
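The two building blocks named above are the ALD working likelihood, f(y) = tau (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma)), and the GGD (bridge) prior, p(beta) = alpha * lam^(1/alpha) / (2 Gamma(1/alpha)) * exp(-lam |beta|^alpha). The sketch evaluates an unnormalized log posterior for a plain linear QR model with fixed sigma, lam and alpha. The autoregressive perturbations, the priors on sigma and lam, and the mixture representations used by the Gibbs and Metropolis-Hastings steps are all omitted, and the names are hypothetical.

```python
import math


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log density of the asymmetric Laplace distribution ALD(mu, sigma, tau)."""
    return math.log(tau * (1 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def ggd_logpdf(beta, lam, alpha):
    """Log density of the generalized Gaussian (bridge) prior."""
    const = (math.log(alpha) + math.log(lam) / alpha
             - math.log(2.0) - math.lgamma(1.0 / alpha))
    return const - lam * abs(beta) ** alpha


def log_posterior(beta, X, y, sigma, tau, lam, alpha):
    """GGD log prior on each coefficient plus ALD log likelihood (up to a constant)."""
    lp = sum(ggd_logpdf(b, lam, alpha) for b in beta)
    for xi, yi in zip(X, y):
        mu = sum(xj * bj for xj, bj in zip(xi, beta))
        lp += ald_logpdf(yi, mu, sigma, tau)
    return lp
```

With alpha = 1 the prior is Laplace (the Bayesian lasso) and with alpha = 2 it is Gaussian, so alpha interpolates between sparsity-inducing and ridge-like behavior.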

High-dimensional models, which involve very many parameters relative to a moderate sample size, are attracting much attention in diverse research fields. Model selection is an important issue in such high-dimensional data analysis. Recent literature on the theoretical understanding of high-dimensional models covers a wide range of penalized methods, including LASSO and SCAD. This paper presents a systematic overview of recent developments in high-dimensional statistical models. We briefly review recent developments in theory and methods, and give guidelines on applying several penalized methods. For each reviewed method, the review covers the settings in which it is appropriate, its limitations and potential solutions. In particular, we systematically review the statistical theory of high-dimensional methods by considering a unified high-dimensional modeling framework together with high-level conditions. This framework includes (generalized) linear regression and quantile regression as special cases. We hope our review helps researchers in this field better understand the area and provides useful information for future studies.

In this paper we consider the statistical analysis of multivariate multiple nonlinear regression models with correlated errors, using finite Fourier transforms. Consistency and asymptotic normality of the weighted least squares estimates are established under various conditions on the regressor variables. These conditions involve different types of scalings, and the scaling factors are obtained explicitly for various types of nonlinear regression models, including an interesting model that requires the estimation of unknown frequencies. The estimation of frequencies is a classical problem occurring in many areas, such as signal processing, environmental time series, astronomy and other physical sciences. We illustrate our methodology using two real data sets taken from geophysics and environmental sciences. The geophysical data are polar motion data (the motion now widely known as the “Chandler wobble”), for which one has to estimate the drift parameters, the offset parameters and the two periodicities associated with the elliptical motion. The data were first analyzed by Arato, Kolmogorov and Sinai, who treated them as a bivariate time series satisfying a finite-order time series model and estimated the periodicities from the coefficients of the fitted models. Our analysis shows that the two dominant frequencies are 12 h and 410 days. The second example we consider is the minimum/maximum monthly temperatures observed at the Antarctic Peninsula (Faraday/Vernadsky station). It is now widely believed that there has been steady warming in this region over the past 50 years; if this is true, the warming has serious consequences for ecology, marine life, etc., as it can result in the melting of ice shelves and glaciers. Our objective here is to estimate any existing temperature trend in the data, and we use the nonlinear regression methodology developed here to achieve that goal.
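A common starting point for estimating an unknown frequency, shown here only as a sketch and not as the paper's weighted least squares method, is the periodogram I(k) = |sum_t x_t exp(-2 pi i k t / n)|^2 / n evaluated at the Fourier frequencies k/n. Its largest ordinate gives a rough frequency estimate that can serve as a starting value for nonlinear least squares. The function names are hypothetical, and the direct O(n^2) sum is used for clarity rather than an FFT.

```python
import cmath
import math


def periodogram(x):
    """I(k) = |sum_t x_t exp(-2 pi i k t / n)|^2 / n for k = 0, ..., n // 2."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(s) ** 2 / n)
    return out


def dominant_frequency(x):
    """Return (k, k / n) for the largest periodogram ordinate with k >= 1.

    The sample mean is removed first so the zero frequency does not dominate.
    """
    mean = sum(x) / len(x)
    ordinates = periodogram([v - mean for v in x])
    k = max(range(1, len(ordinates)), key=lambda j: ordinates[j])
    return k, k / len(x)  # index and frequency in cycles per sample
```

For real data the true frequency rarely falls exactly on a Fourier frequency, which is why the periodogram peak is usually refined afterwards.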

High-dimensional data arise in diverse fields of science, engineering and the humanities. Variable selection plays an important role in high-dimensional statistical modelling. In this article, we study variable selection based on a quadratic approximation with the smoothly clipped absolute deviation (SCAD) penalty and a diverging number of parameters. We provide a unified method to select variables and estimate parameters for various high-dimensional models. Under appropriate conditions and with a proper regularization parameter, we show that the estimator is consistent and sparse, and that the estimators of the nonzero coefficients are asymptotically normal, as they would be if the zero coefficients were known in advance. In addition, under some mild conditions, we can obtain the global solution of the penalized objective function with the SCAD penalty. Numerical studies and a real data analysis are carried out to confirm the performance of the proposed method.
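In the simplest case, assumed here for illustration, the quadratic approximation has an identity Hessian, so the SCAD-penalized problem splits into one-dimensional problems min_b 0.5 (z - b)^2 + p_lam(|b|). Each has a closed-form global minimizer given by the SCAD thresholding rule of Fan and Li (2001), which the sketch implements and checks against a grid search. The constant a = 3.7 is assumed, and the names are hypothetical.

```python
import math


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lam(|t|)."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def scad_objective(b, z, lam, a=3.7):
    """One-dimensional penalized quadratic 0.5 * (z - b)^2 + p_lam(|b|)."""
    return 0.5 * (z - b) ** 2 + scad_penalty(b, lam, a)


def scad_threshold(z, lam, a=3.7):
    """Closed-form minimizer of scad_objective (requires a > 2)."""
    az = abs(z)
    sgn = math.copysign(1.0, z)
    if az <= 2 * lam:
        return sgn * max(az - lam, 0.0)          # soft thresholding
    if az <= a * lam:
        return ((a - 1) * z - sgn * a * lam) / (a - 2)
    return z                                     # large values are left unshrunk
```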

In high-dimensional regression problems, regularization methods have been a popular choice for addressing variable selection and multicollinearity. In this paper we study bridge regression that adaptively selects the penalty order from the data and produces flexible solutions in various settings. We implement bridge regression based on local linear and quadratic approximations to circumvent the nonconvex optimization problem. Our numerical study shows that the proposed bridge estimators are a robust choice in various circumstances compared with other penalized regression methods such as ridge, lasso and elastic net. In addition, we propose group bridge estimators that select grouped variables, and study their asymptotic properties when the number of covariates increases along with the sample size. These estimators are also applied to varying-coefficient models. Numerical examples show superior performance of the proposed group bridge estimators in comparison with other existing methods.
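The local linear approximation turns the nonconvex bridge penalty lam |b|^gamma (0 < gamma < 1) into a weighted L1 penalty with weight gamma |b0|^(gamma - 1) at the current value b0, which in one dimension has a soft-thresholding solution. The sketch iterates this update for a single coefficient under an orthonormal design, an assumption made to keep the example short. It does not select the penalty order gamma from data, the names are hypothetical, and the iteration finds a stationary point rather than a guaranteed global minimizer.

```python
def soft_threshold(z, t):
    """argmin_b 0.5 * (z - b)^2 + t * |b| for t >= 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def bridge_lla_1d(z, lam, gamma, n_iter=100, tol=1e-12):
    """Minimize 0.5 * (z - b)^2 + lam * |b|**gamma by repeated LLA steps.

    Each step linearizes |b|**gamma at the current b, giving a weighted
    lasso problem with weight gamma * |b|**(gamma - 1). Once b reaches zero
    the weight is infinite, so b stays at zero.
    """
    b = z
    for _ in range(n_iter):
        if b == 0.0:
            return 0.0
        weight = gamma * abs(b) ** (gamma - 1.0)
        b_new = soft_threshold(z, lam * weight)
        if abs(b_new - b) < tol:
            return b_new
        b = b_new
    return b
```

Small inputs are set exactly to zero, while large inputs are shrunk only slightly, which is the sparsity-plus-low-bias behavior the bridge penalty is chosen for.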

In this paper, a new estimation procedure based on composite quantile regression and functional principal component analysis (PCA) is proposed for partially functional linear regression models (PFLRMs). The proposed method can simultaneously estimate both the parametric regression coefficients and the functional coefficient components without specifying the error distribution. It is shown empirically to be more efficient for non-normal random errors, especially Cauchy errors, and almost as efficient for normal random errors. Furthermore, based on the proposed estimation procedure, we use penalized composite quantile regression to study variable selection for the parametric part of the PFLRMs. Under certain regularity conditions, consistency, asymptotic normality and the oracle property of the resulting estimators are derived. Simulation studies and a real data analysis are conducted to assess the finite-sample performance of the proposed methods.

In a regression model with a univariate response, the quantities derived from the least-absolute-deviations method need not be unique. In this note, we show that, contrary to the univariate case, in a regression model with a multivariate response the least-distances method typically yields quantities whose uniqueness properties are similar to those obtained by the least-squares method.
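The contrast above shows up already in a location-only example, a special case rather than the regression setting of the note. For an even univariate sample, every point between the two middle observations minimizes the sum of absolute deviations, whereas the bivariate least-distances criterion sum_i ||x_i - m|| has a single minimizer when the points are not collinear. The sketch checks the first fact and approximates the spatial median with the classical Weiszfeld iteration; the names are hypothetical.

```python
import math


def lad_objective(m, ys):
    """Sum of absolute deviations sum_i |y_i - m|."""
    return sum(abs(y - m) for y in ys)


def euclid(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def spatial_median(points, n_iter=5000, tol=1e-12):
    """Weiszfeld iteration for argmin_m sum_i ||x_i - m||, started at the centroid."""
    dim = len(points[0])
    m = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(n_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = euclid(p, m)
            if d == 0.0:
                # The update is undefined at a data point; this sketch simply
                # stops there (a full implementation would use a modified step).
                return m
            for j in range(dim):
                num[j] += p[j] / d
            den += 1.0 / d
        new = [v / den for v in num]
        if euclid(new, m) < tol:
            return new
        m = new
    return m
```

For four points forming a convex quadrilateral, the spatial median is the intersection of the diagonals, which gives an easy check: for the corners (0, 0), (4, 0), (5, 5), (0, 3) it is (12/7, 12/7).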

