共查询到20条相似文献,搜索用时 17 毫秒
1.
《Journal of Statistical Computation and Simulation》2012,82(3-4):177-185
For stepwise regression and discriminant analysis the parameters F in and F out govern the inclusion and deletion of variables. The candidate variable with the biggest F—ratio is included if this exceeds F inthe included variable with the smallest F—ratio is deleted if this is less than F in If F in ≧F out; then return to a previous subset size implies improvement in the criterion measure. This result also holds for a generalization, stepwise multivariate analysis, which includes stepwise regression and discriminant analysis as special cases Eliminations do not occur if forward regression and backward elimination yield the same sequence of subsets. Conversely, there is a more liberal stepping rule which always eliminates if the two sequences differ. 相似文献
2.
Ailing Yan 《统计学通讯:理论与方法》2013,42(20):5106-5120
AbstractThere has been much attention on the high-dimensional linear regression models, which means the number of observations is much less than that of covariates. Considering the fact that the high dimensionality often induces the collinearity problem, in this article, we study the penalized quantile regression with the elastic net (EnetQR) that combines the strengths of the quadratic regularization and the lasso shrinkage. We investigate the weak oracle property of the EnetQR under mild conditions in the high dimensional setting. Moreover, we propose a two-step procedure, called adaptive elastic net quantile regression (AEnetQR), in which the weight vector in the second step is constructed from the EnetQR estimate in the first step. This two-step procedure is justified theoretically to possess the weak oracle property. The finite sample properties are performed through the Monte Carlo simulation and a real-data analysis. 相似文献
3.
Lennart Nordberg 《统计学通讯:理论与方法》2013,42(21):2427-2449
This paper is concerned with selection of explanatory variables in generalized linear models (GLM). The class of GLM's is quite large and contains e.g. the ordinary linear regression, the binary logistic regression, the probit model and Poisson regression with linear or log-linear parameter structure. We show that, through an approximation of the log likelihood and a certain data transformation, the variable selection problem in a GLM can be converted into variable selection in an ordinary (unweighted) linear regression model. As a consequence no specific computer software for variable selection in GLM's is needed. Instead, some suitable variable selection program for linear regression can be used. We also present a simulation study which shows that the log likelihood approximation is very good in many practical situations. Finally, we mention briefly possible extensions to regression models outside the class of GLM's. 相似文献
4.
In this paper, we propose a new full iteration estimation method for quantile regression (QR) of the single-index model (SIM). The asymptotic properties of the proposed estimator are derived. Furthermore, we propose a variable selection procedure for the QR of SIM by combining the estimation method with the adaptive LASSO penalized method to get sparse estimation of the index parameter. The oracle properties of the variable selection method are established. Simulations with various non-normal errors are conducted to demonstrate the finite sample performance of the estimation method and the variable selection procedure. Furthermore, we illustrate the proposed method by analyzing a real data set. 相似文献
5.
6.
Richard Gerlach Ron Bird & Anthony Hall 《Australian & New Zealand Journal of Statistics》2002,44(2):155-168
This paper presents a Bayesian technique for the estimation of a logistic regression model including variable selection. As in Ou & Penman (1989), the model is used to predict the direction of company earnings, one year ahead, from a large set of accounting variables from financial statements. To estimate the model, the paper presents a Markov chain Monte Carlo sampling scheme that includes the variable selection technique of Smith & Kohn (1996) and the non-Gaussian estimation method of Mira & Tierney (2001). The technique is applied to data for companies in the United States and Australia. The results obtained compare favourably to the technique used by Ou & Penman (1989) for both regions. 相似文献
7.
In this paper, we propose a new estimation method for binary quantile regression and variable selection which can be implemented by an iteratively reweighted least square approach. In contrast to existing approaches, this method is computationally simple, guaranteed to converge to a unique solution and implemented with standard software packages. We demonstrate our methods using Monte-Carlo experiments and then we apply the proposed method to the widely used work trip mode choice dataset. The results indicate that the proposed estimators work well in finite samples. 相似文献
8.
In this paper, we develop a new estimation procedure based on quantile regression for semiparametric partially linear varying-coefficient models. The proposed estimation approach is empirically shown to be much more efficient than the popular least squares estimation method for non-normal error distributions, and almost not lose any efficiency for normal errors. Asymptotic normalities of the proposed estimators for both the parametric and nonparametric parts are established. To achieve sparsity when there exist irrelevant variables in the model, two variable selection procedures based on adaptive penalty are developed to select important parametric covariates as well as significant nonparametric functions. Moreover, both these two variable selection procedures are demonstrated to enjoy the oracle property under some regularity conditions. Some Monte Carlo simulations are conducted to assess the finite sample performance of the proposed estimators, and a real-data example is used to illustrate the application of the proposed methods. 相似文献
9.
Structure identification and variable selection in geographically weighted regression models 总被引:1,自引:0,他引:1
Geographically weighted regression (GWR) is an important tool for exploring spatial non-stationarity of a regression relationship, in which whether a regression coefficient really varies over space is especially important in drawing valid conclusions on the spatial variation characteristics of the regression relationship. This paper proposes a so-called GWGlasso method for structure identification and variable selection in GWR models. This method penalizes the loss function of the local-linear estimation of the GWR model by the coefficients and their partial derivatives in the way of the adaptive group lasso and can simultaneously identify spatially varying coefficients, nonzero constant coefficients and zero coefficients. Simulation experiments are further conducted to assess the performance of the proposed method and the Dublin voter turnout data set is analysed to demonstrate its application. 相似文献
10.
Conglian Yu 《统计学通讯:理论与方法》2020,49(18):4347-4366
AbstractIn this article, we propose a new penalized-likelihood method to conduct model selection for finite mixture of regression models. The penalties are imposed on mixing proportions and regression coefficients, and hence order selection of the mixture and the variable selection in each component can be simultaneously conducted. The consistency of order selection and the consistency of variable selection are investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations are conducted to demonstrate the finite sample performance of the estimation procedure. The proposed methodology is further illustrated via real data analysis. 相似文献
11.
Chun-Xia Zhang Jiang-She Zhang Guan-Wei Wang Nan-Nan Ji 《Journal of applied statistics》2018,45(10):1734-1755
At present, ensemble learning has exhibited its great power in stabilizing and enhancing the performance of some traditional variable selection methods such as lasso and genetic algorithm. In this paper, a novel bagging ensemble method called BSSW is developed to implement variable ranking and selection in linear regression models. Its main idea is to execute stepwise search algorithm on multiple bootstrap samples. In each trial, a mixed importance measure is assigned to each variable according to the order that it is selected into final model as well as the improvement of model fitting resulted from its inclusion. Based on the importance measure averaged across some bootstrapping trials, all candidate variables are ranked and then decided to be important or not. To extend the scope of application, BSSW is extended to the situation of generalized linear models. Experiments carried out with some simulated and real data indicate that BSSW achieves better performance in most studied cases when compared with several other existing methods. 相似文献
12.
Carmen D.S. André Silvia N. Elian Subhash C. Narula Rodrigo A. Tavares 《统计学通讯:理论与方法》2013,42(3):623-642
Our objective is to modify a robust coefficient of determination for the minimum sum of absolute errors MSAE regression proposed by McKean and Sievers (1987) so that it satisfies all the desirable properties. We also propose an adjusted coefficient of determination that is appropriate for comparing several models with different number of variables. Further, it has the property that if it decreases with the addition of predictor variables to the model, then the contribution of these variables is statistically non-significant. We illustrate the results with an example. 相似文献
13.
José M. R. Murteira 《统计学通讯:理论与方法》2013,42(24):7367-7375
ABSTRACTThis note presents an approximation to multivariate regression models which is obtained from a first-order series expansion of the multivariate link function. The proposed approach yields a variable-addition approximation of regression models that enables a multivariate generalization of the well-known goodness-of-link specification test, available for univariate generalized linear models. Application of this general methodology is illustrated with models of multinomial discrete choice and multivariate fractional data, in which context it is shown to lead to well-established approximation and testing procedures. 相似文献
14.
Konrad Furmańczyk 《Statistics》2015,49(3):614-628
The paper considers the problem of consistent variable selection with the use of stepdown procedures in the classical linear regression model and for the model with dependent errors. The stated results complete the results obtained by Bunea et al. [Consistent variable selection in high dimensional regression via multiple testing. J Stat Plann Inference. 2006;136(12):4349–4364]. 相似文献
15.
From the prediction viewpoint, mode regression is more attractive since it pay attention to the most probable value of response variable given regressors. On the other hand, high-dimensional data are very prevalent as the advance of the technology of collecting and storing data. Variable selection is an important strategy to deal with high-dimensional regression problem. This paper aims to propose a variable selection procedure for high-dimensional mode regression via combining nonparametric kernel estimation method with sparsity penalty tactics. We also establish the asymptotic properties under certain technical conditions. The effectiveness and flexibility of the proposed methods are further illustrated by numerical studies and the real data application. 相似文献
16.
Kjell Doksum Derick Peterson & Alex Samarov 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2000,62(3):431-448
The performances of data-driven bandwidth selection procedures in local polynomial regression are investigated by using asymptotic methods and simulation. The bandwidth selection procedures considered are based on minimizing 'prelimit' approximations to the (conditional) mean-squared error (MSE) when the MSE is considered as a function of the bandwidth h . We first consider approximations to the MSE that are based on Taylor expansions around h=0 of the bias part of the MSE. These approximations lead to estimators of the MSE that are accurate only for small bandwidths h . We also consider a bias estimator which instead of using small h approximations to bias naïvely estimates bias as the difference of two local polynomial estimators of different order and we show that this estimator performs well only for moderate to large h . We next define a hybrid bias estimator which equals the Taylor-expansion-based estimator for small h and the difference estimator for moderate to large h . We find that the MSE estimator based on this hybrid bias estimator leads to a bandwidth selection procedure with good asymptotic and, for our Monte Carlo examples, finite sample properties. 相似文献
17.
Howard D. Bondell Lexin Li 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(1):287-299
Summary. The family of inverse regression estimators that was recently proposed by Cook and Ni has proven effective in dimension reduction by transforming the high dimensional predictor vector to its low dimensional projections. We propose a general shrinkage estimation strategy for the entire inverse regression estimation family that is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimators achieve consistency in variable selection without requiring any traditional model, meanwhile retaining the root n estimation consistency of the dimension reduction basis. We also show the effectiveness of the new estimators through both simulation and real data analysis. 相似文献
18.
The aim of this paper is to explore variable selection approaches in the partially linear proportional hazards model for multivariate failure time data. A new penalised pseudo-partial likelihood method is proposed to select important covariates. Under certain regularity conditions, we establish the rate of convergence and asymptotic normality of the resulting estimates. We further show that the proposed procedure can correctly select the true submodel, as if it was known in advance. Both simulated and real data examples are presented to illustrate the proposed methodology. 相似文献
19.
20.
《Journal of Statistical Computation and Simulation》2012,82(1):95-106
Variable selection methods have been widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of the genomic data is that genes participating in a common metabolic pathway or sharing a similar biological function tend to have high correlations. The collinearity naturally embedded in these data requires special handling, which cannot be provided by existing variable selection methods. In this paper, we propose a set of new methods to select variables in correlated data. The new methods follow the forward selection procedure of least angle regression (LARS) but conduct grouping and selecting at the same time. The methods specially work when no prior information on group structures of data is available. Simulations and real examples show that our proposed methods often outperform the existing variable selection methods, including LARS and elastic net, in terms of both reducing prediction error and preserving sparsity of representation. 相似文献