Similar literature
 20 similar documents found (search took 31 ms)
1.
There has been much attention on high-dimensional linear regression models, in which the number of observations is much smaller than the number of covariates. Because high dimensionality often induces collinearity, in this article we study penalized quantile regression with the elastic net (EnetQR), which combines the strengths of quadratic regularization and lasso shrinkage. We investigate the weak oracle property of the EnetQR under mild conditions in the high-dimensional setting. Moreover, we propose a two-step procedure, called adaptive elastic net quantile regression (AEnetQR), in which the weight vector in the second step is constructed from the EnetQR estimate in the first step. This two-step procedure is justified theoretically to possess the weak oracle property. The finite sample properties are examined through Monte Carlo simulation and a real-data analysis.
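As a rough illustration of the kind of objective described in the abstract above, the sketch below evaluates a quantile (check) loss plus an elastic net penalty in plain Python. The scaling of the penalty terms, the names `lam1`, `lam2`, `gamma` and `eps`, and the inverse-magnitude weighting rule in `adaptive_weights` are assumptions made for illustration; the abstract does not give the paper's exact formulation.

```python
def check_loss(u, tau):
    """Quantile (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def enet_qr_objective(beta, X, y, tau, lam1, lam2, weights=None):
    """Check loss summed over observations plus an elastic net penalty.

    Assumed form: sum_i rho_tau(y_i - x_i . beta)
                  + lam1 * sum_j w_j * |beta_j| + lam2 * sum_j beta_j ** 2
    With weights=None every w_j is 1 (the non-adaptive case).
    """
    if weights is None:
        weights = [1.0] * len(beta)
    fit = 0.0
    for row, target in zip(X, y):
        residual = target - sum(b * x for b, x in zip(beta, row))
        fit += check_loss(residual, tau)
    l1 = sum(w * abs(b) for w, b in zip(weights, beta))
    l2 = sum(b * b for b in beta)
    return fit + lam1 * l1 + lam2 * l2


def adaptive_weights(beta_init, gamma=1.0, eps=1e-8):
    """Hypothetical second-step weights built from a first-step estimate."""
    return [1.0 / (abs(b) + eps) ** gamma for b in beta_init]
```

In a two-step scheme of this kind, a first-step estimate would be passed to `adaptive_weights`, and the resulting weights supplied to `enet_qr_objective` when minimizing in the second step. The minimization itself is not shown here.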

2.
For stepwise regression and discriminant analysis, the parameters F_in and F_out govern the inclusion and deletion of variables. The candidate variable with the largest F-ratio is included if this exceeds F_in; the included variable with the smallest F-ratio is deleted if this is less than F_out. If F_in ≥ F_out, then a return to a previous subset size implies improvement in the criterion measure. This result also holds for a generalization, stepwise multivariate analysis, which includes stepwise regression and discriminant analysis as special cases.

Eliminations do not occur if forward regression and backward elimination yield the same sequence of subsets. Conversely, there is a more liberal stepping rule which always eliminates if the two sequences differ.
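The stepping rule governed by the F-to-enter and F-to-remove thresholds (F in and F out in the abstract above) can be sketched as a single decision function. This is a minimal illustration, not the paper's algorithm: the function name, the dictionary inputs, and the choice to check deletion before inclusion are all assumptions, and the F-ratios are taken as already computed.

```python
def stepwise_action(candidate_f, included_f, f_in, f_out):
    """Decide one stepwise move from precomputed F-ratios.

    candidate_f: {variable: F-ratio} for variables not yet in the model
    included_f:  {variable: F-ratio} for variables currently in the model
    Returns ("delete", name), ("add", name) or ("stop", None).
    """
    # Assumed ordering: try to delete the weakest included variable first.
    if included_f:
        weakest = min(included_f, key=included_f.get)
        if included_f[weakest] < f_out:
            return ("delete", weakest)
    # Otherwise try to add the strongest candidate.
    if candidate_f:
        strongest = max(candidate_f, key=candidate_f.get)
        if candidate_f[strongest] > f_in:
            return ("add", strongest)
    return ("stop", None)
```

For example, with `f_in=4.0` and `f_out=3.9`, an included variable whose F-ratio is 1.1 would be deleted before any candidate is considered.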

3.
This paper is concerned with the selection of explanatory variables in generalized linear models (GLMs). The class of GLMs is quite large and contains, for example, ordinary linear regression, binary logistic regression, the probit model and Poisson regression with linear or log-linear parameter structure. We show that, through an approximation of the log likelihood and a certain data transformation, the variable selection problem in a GLM can be converted into variable selection in an ordinary (unweighted) linear regression model. As a consequence, no specific computer software for variable selection in GLMs is needed. Instead, a suitable variable selection program for linear regression can be used. We also present a simulation study which shows that the log likelihood approximation is very good in many practical situations. Finally, we briefly mention possible extensions to regression models outside the class of GLMs.

4.
In this paper, we propose a new full iteration estimation method for quantile regression (QR) of the single-index model (SIM). The asymptotic properties of the proposed estimator are derived. Furthermore, we propose a variable selection procedure for the QR of SIM by combining the estimation method with the adaptive LASSO penalized method to get sparse estimation of the index parameter. The oracle properties of the variable selection method are established. Simulations with various non-normal errors are conducted to demonstrate the finite sample performance of the estimation method and the variable selection procedure. Furthermore, we illustrate the proposed method by analyzing a real data set.

5.
This paper presents a Bayesian technique for the estimation of a logistic regression model including variable selection. As in Ou & Penman (1989), the model is used to predict the direction of company earnings, one year ahead, from a large set of accounting variables from financial statements. To estimate the model, the paper presents a Markov chain Monte Carlo sampling scheme that includes the variable selection technique of Smith & Kohn (1996) and the non-Gaussian estimation method of Mira & Tierney (2001). The technique is applied to data for companies in the United States and Australia. The results obtained compare favourably to the technique used by Ou & Penman (1989) for both regions.

6.
In this paper, we propose a new estimation method for binary quantile regression and variable selection which can be implemented by an iteratively reweighted least squares approach. In contrast to existing approaches, this method is computationally simple, guaranteed to converge to a unique solution, and can be implemented with standard software packages. We demonstrate our methods using Monte Carlo experiments and then apply the proposed method to the widely used work trip mode choice dataset. The results indicate that the proposed estimators work well in finite samples.

7.
In this article, we propose a new penalized-likelihood method to conduct model selection for finite mixtures of regression models. The penalties are imposed on the mixing proportions and regression coefficients, and hence order selection of the mixture and variable selection in each component can be conducted simultaneously. The consistency of order selection and the consistency of variable selection are investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations are conducted to demonstrate the finite sample performance of the estimation procedure. The proposed methodology is further illustrated via real data analysis.

8.
Our objective is to modify a robust coefficient of determination for minimum sum of absolute errors (MSAE) regression, proposed by McKean and Sievers (1987), so that it satisfies all the desirable properties. We also propose an adjusted coefficient of determination that is appropriate for comparing several models with different numbers of variables. Further, it has the property that if it decreases with the addition of predictor variables to the model, then the contribution of these variables is statistically non-significant. We illustrate the results with an example.

9.
This note presents an approximation to multivariate regression models which is obtained from a first-order series expansion of the multivariate link function. The proposed approach yields a variable-addition approximation of regression models that enables a multivariate generalization of the well-known goodness-of-link specification test, available for univariate generalized linear models. Application of this general methodology is illustrated with models of multinomial discrete choice and multivariate fractional data, in which context it is shown to lead to well-established approximation and testing procedures.

10.
The paper considers the problem of consistent variable selection with the use of stepdown procedures in the classical linear regression model and for the model with dependent errors. The stated results complete the results obtained by Bunea et al. [Consistent variable selection in high dimensional regression via multiple testing. J Stat Plann Inference. 2006;136(12):4349–4364].

11.
From the prediction viewpoint, mode regression is more attractive since it pays attention to the most probable value of the response variable given the regressors. On the other hand, high-dimensional data have become very prevalent with advances in the technology for collecting and storing data. Variable selection is an important strategy for dealing with high-dimensional regression problems. This paper proposes a variable selection procedure for high-dimensional mode regression by combining a nonparametric kernel estimation method with sparsity penalties. We also establish the asymptotic properties under certain technical conditions. The effectiveness and flexibility of the proposed methods are further illustrated by numerical studies and a real data application.

12.
The family of inverse regression estimators that was recently proposed by Cook and Ni has proven effective in dimension reduction by transforming the high dimensional predictor vector to its low dimensional projections. We propose a general shrinkage estimation strategy for the entire inverse regression estimation family that is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimators achieve consistency in variable selection without requiring any traditional model, while retaining the root-n estimation consistency of the dimension reduction basis. We also show the effectiveness of the new estimators through both simulation and real data analysis.

13.
The performance of data-driven bandwidth selection procedures in local polynomial regression is investigated by using asymptotic methods and simulation. The bandwidth selection procedures considered are based on minimizing 'prelimit' approximations to the (conditional) mean-squared error (MSE) when the MSE is considered as a function of the bandwidth h. We first consider approximations to the MSE that are based on Taylor expansions around h=0 of the bias part of the MSE. These approximations lead to estimators of the MSE that are accurate only for small bandwidths h. We also consider a bias estimator which, instead of using small-h approximations to the bias, naïvely estimates the bias as the difference of two local polynomial estimators of different order, and we show that this estimator performs well only for moderate to large h. We next define a hybrid bias estimator which equals the Taylor-expansion-based estimator for small h and the difference estimator for moderate to large h. We find that the MSE estimator based on this hybrid bias estimator leads to a bandwidth selection procedure with good asymptotic and, for our Monte Carlo examples, finite sample properties.

14.
The aim of this paper is to explore variable selection approaches in the partially linear proportional hazards model for multivariate failure time data. A new penalised pseudo-partial likelihood method is proposed to select important covariates. Under certain regularity conditions, we establish the rate of convergence and asymptotic normality of the resulting estimates. We further show that the proposed procedure can correctly select the true submodel, as if it were known in advance. Both simulated and real data examples are presented to illustrate the proposed methodology.

15.
16.
Variable selection methods have been widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of genomic data is that genes participating in a common metabolic pathway or sharing a similar biological function tend to be highly correlated. The collinearity naturally embedded in these data requires special handling, which cannot be provided by existing variable selection methods. In this paper, we propose a set of new methods to select variables in correlated data. The new methods follow the forward selection procedure of least angle regression (LARS) but conduct grouping and selection at the same time. The methods are particularly useful when no prior information on the group structure of the data is available. Simulations and real examples show that our proposed methods often outperform the existing variable selection methods, including LARS and the elastic net, in terms of both reducing prediction error and preserving sparsity of representation.

17.
18.
A growth curve analysis is often applied to estimate patterns of change in a given characteristic of different individuals. It is also used to find out whether the variations in growth rates among individuals are due to the effects of certain covariates. In this paper, a random coefficient linear regression model, as a special case of growth curve analysis, is generalized to accommodate the situation where the set of influential covariates is not known a priori. Two different approaches for selecting influential covariates (a weighted stepwise selection procedure and a modified version of Rao and Wu's selection criterion) for the random slope coefficient of a linear regression model with unbalanced data are proposed. The performance of these methods is evaluated by means of Monte Carlo simulation. In addition, several methods (Maximum Likelihood, Restricted Maximum Likelihood, Pseudo Maximum Likelihood and Method of Moments) for estimating the parameters of the selected model are compared. The proposed variable selection schemes and estimators are applied to the actual industrial problem which motivated this investigation.

19.
Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem. A subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: Partial Least Squares (PLS), Principal Component Regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is mainly done by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their capability of recovering the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights in situations with collinearity is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recoveries of regression weights. The picture is inconclusive, however, because, especially in the study with more realistic simulated data, PLS and PCR gave the poorest recoveries of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases it is better to use ordinary regression. Prediction, by contrast, suffers far less from collinearity than recovery of the regression weights does.

20.
We consider a regression analysis of a multivariate response on a vector of predictors. In this article, we develop a sliced inverse regression-based method for reducing the dimension of predictors without requiring a prespecified parametric model. Our proposed method preserves as much regression information as possible. We derive the asymptotic weighted chi-squared test for dimension. Simulation results are reported and comparisons are made with three methods: most predictable variates, k-means inverse regression and the canonical correlation approach.
