1.
We propose a data-driven method to select significant variables in an additive model via spline estimation. The additive structure of the regression model is imposed to overcome the ‘curse of dimensionality’, while the spline estimators provide a good approximation to the additive components of the model. The additive components are ordered according to their empirical strengths, and the significant variables are chosen at the first crossing of a predetermined threshold by the CUmulative Ratios of Empirical Strengths Total of the components. Consistency of the proposed method is established when the number of variables is allowed to diverge with sample size, while an extensive Monte Carlo study demonstrates superior performance of the proposed method and its advantages over the BIC method of Huang and Yang [(2004), ‘Identification of Nonlinear Additive Autoregressive Models’, Journal of the Royal Statistical Society Series B, 66, 463–477] in terms of speed and accuracy.
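The thresholding step described in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not define how the empirical strengths are computed from the spline fits, so the function name `select_by_cumulative_ratio`, the input list `strengths`, and the default threshold are all assumptions, not the authors' procedure.

```python
def select_by_cumulative_ratio(strengths, threshold=0.95):
    """Return indices of the components kept at the first threshold crossing.

    `strengths` holds one non-negative empirical strength per additive
    component (hypothetical input; the paper derives these from spline fits).
    """
    total = sum(strengths)
    if total <= 0:
        return []
    # Order components from strongest to weakest.
    order = sorted(range(len(strengths)), key=lambda j: strengths[j], reverse=True)
    running = 0.0
    selected = []
    for j in order:
        selected.append(j)
        running += strengths[j]
        # Stop at the first crossing of the threshold by the cumulative ratio.
        if running / total >= threshold:
            break
    return selected


# Illustrative values only: three strong components and two weak ones.
strengths = [4.0, 0.1, 3.0, 0.05, 2.5]
selected = select_by_cumulative_ratio(strengths, threshold=0.95)
print(sorted(selected))
```

With these made-up strengths the three strong components are retained and the two weak ones are dropped.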
2.
Variable selection in elliptical Linear Mixed Models (LMMs) with a shrinkage penalty function (SPF) is the main scope of this study. SPFs are applied for parameter estimation and variable selection simultaneously. The smoothly clipped absolute deviation penalty (SCAD) is one of the SPFs and it is adapted into the elliptical LMM in this study. The proposed idea is highly applicable to a variety of models which are set up with different distributions such as normal, Student-t, Pearson VII, power exponential and so on. Simulation studies and a real data example with one of the elliptical distributions show that if variable selection is also a concern, it is worthwhile to carry out variable selection and parameter estimation simultaneously in the elliptical LMM.
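For readers unfamiliar with the SCAD penalty mentioned above, a minimal sketch of the standard penalty function (Fan and Li's form) is given here. How the study embeds it in the elliptical LMM likelihood is not stated in the abstract; the function name `scad_penalty` and the conventional default `a = 3.7` are assumptions made for illustration.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for a single coefficient `theta` with tuning parameter `lam`.

    Assumes lam > 0 and a > 2. The penalty is linear near zero (like the
    lasso), quadratic in a middle zone, and constant for large coefficients.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


# Small coefficients are penalised in proportion to their size,
# while large ones all receive the same flat penalty.
small = scad_penalty(0.5, lam=1.0)
large = scad_penalty(10.0, lam=1.0)
larger = scad_penalty(100.0, lam=1.0)
```

The flat penalty for large coefficients is what lets SCAD shrink small effects to zero without heavily biasing the large ones.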
3.
Monte Carlo simulation is used to evaluate the actual confidence levels of five different approximations for confidence intervals for the probability of success in Markov dependent trials. The approximations involve the conditional probability of success as a nuisance parameter, and the effects of substituting Klotz's (1973), Price's (1976), and a new estimator are also evaluated. The new estimator is less biased and tends to increase the confidence level. A program for calculating the estimator and the confidence interval approximations is available.
4.
Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response. Thus selecting a small number of relevant variables for classification has received a great deal of attention. Generally when the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are called for on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method in the multinomial probit regression model for multi-class classification. The performance of our proposed method is demonstrated with one simulated data set and three well-known gene expression profiling data sets: breast cancer data, leukemia data, and small round blue-cell tumors. The results show that compared with other methods, our method is able to select the relevant variables and can obtain competitive classification accuracy with a small subset of relevant genes.
5.
Beta regression models are commonly used by practitioners to model variables that assume values in the standard unit interval (0, 1). In this paper, we consider the issue of variable selection for beta regression models with varying dispersion (VBRM), in which both the mean and the dispersion depend upon predictor variables. Based on a penalized likelihood method, the consistency and the oracle property of the penalized estimators are established. Following the coordinate descent algorithm idea of generalized linear models, we develop a new variable selection procedure for the VBRM, which can efficiently estimate and select important variables simultaneously in both the mean model and the dispersion model. Simulation studies and a body fat data analysis are presented to illustrate the proposed methods.
6.
In this paper, we consider the problem of variable selection for the partially varying coefficient single-index model, and present a regularized variable selection procedure by combining basis function approximations with the smoothly clipped absolute deviation penalty. The proposed procedure simultaneously selects significant variables in the single-index parametric components and the nonparametric coefficient function components. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Finite sample performance of the proposed method is illustrated by a simulation study and a real data analysis.
7.
《Journal of Statistical Computation and Simulation》2012,82(12):1983-1992
Using Cox regression as the main platform, we study the ensemble approach for variable selection. We use a popular real-data example as well as simulated data with various censoring levels to illustrate the usefulness of the ensemble approach, and study the nature of these ensembles in terms of their strength and diversity. By relating these characteristics to the ensemble's selection accuracy, we provide useful insights for how to choose among different ensemble strategies, as well as guidelines for thinking about how to design more effective ensembles.
8.
9.
Chun-Xia Zhang Jiang-She Zhang Guan-Wei Wang Nan-Nan Ji 《Journal of applied statistics》2018,45(10):1734-1755
At present, ensemble learning has exhibited its great power in stabilizing and enhancing the performance of some traditional variable selection methods such as the lasso and genetic algorithms. In this paper, a novel bagging ensemble method called BSSW is developed to implement variable ranking and selection in linear regression models. Its main idea is to execute a stepwise search algorithm on multiple bootstrap samples. In each trial, a mixed importance measure is assigned to each variable according to the order in which it is selected into the final model as well as the improvement in model fit resulting from its inclusion. Based on the importance measure averaged across the bootstrap trials, all candidate variables are ranked and then judged to be important or not. To widen its scope of application, BSSW is extended to generalized linear models. Experiments carried out with some simulated and real data indicate that BSSW achieves better performance in most studied cases when compared with several other existing methods.
10.
《Journal of Statistical Computation and Simulation》2012,82(8):1654-1669
In this paper, we focus on variable selection for the semiparametric regression model with longitudinal data when some covariates are measured with errors. A new bias-corrected variable selection procedure is proposed based on the combination of quadratic inference functions and shrinkage estimation. With appropriate selection of the tuning parameters, we establish the consistency and asymptotic normality of the resulting estimators. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure. We further illustrate the proposed procedure with an application.
11.
Ping Zeng Yongyue Wei Yang Zhao Jin Liu Liya Liu Ruyang Zhang 《Journal of applied statistics》2014,41(4):879-894
This article proposes a variable selection approach for zero-inflated count data analysis based on the adaptive lasso technique. Two models including the zero-inflated Poisson and the zero-inflated negative binomial are investigated. An efficient algorithm is used to minimize the penalized log-likelihood function in an approximate manner. Both the generalized cross-validation and Bayesian information criterion procedures are employed to determine the optimal tuning parameter, and a consistent sandwich formula of standard errors for nonzero estimates is given based on local quadratic approximation. We evaluate the performance of the proposed adaptive lasso approach through extensive simulation studies, and apply it to analyze real-life data about doctor visits.
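The two ingredients named in this abstract, a zero-inflated Poisson likelihood and an adaptive lasso penalty, can be sketched as below. This is a generic textbook form, not the article's algorithm: the function names, the scalar parameters `lam` (Poisson mean) and `pi` (zero-inflation probability), and the weight exponent `gamma` are assumptions, and covariates are omitted for brevity.

```python
import math


def zip_log_pmf(y, lam, pi):
    """Log-probability of count y under a zero-inflated Poisson model.

    With probability `pi` the count is a structural zero; otherwise it is
    Poisson with mean `lam`. Assumes 0 < pi < 1 and lam > 0.
    """
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)


def adaptive_lasso_penalty(beta, beta_init, tuning, gamma=1.0):
    """Adaptive lasso penalty: each |beta_j| is weighted by 1 / |beta_init_j|**gamma.

    `beta_init` is an initial (for example unpenalised) estimate, assumed
    here to have no exact zeros.
    """
    return tuning * sum(abs(b) / abs(b0) ** gamma for b, b0 in zip(beta, beta_init))


# The ZIP probabilities should sum to (approximately) one.
total_prob = sum(math.exp(zip_log_pmf(y, lam=2.0, pi=0.3)) for y in range(100))
penalty = adaptive_lasso_penalty([1.0, -2.0], [2.0, 4.0], tuning=0.5)
```

A penalised fit would then minimise the negative sum of `zip_log_pmf` over the data plus `adaptive_lasso_penalty`; coefficients with small initial estimates receive large weights and are shrunk towards zero more aggressively.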
12.
In survival studies, current status data are frequently encountered when some individuals in a study are not successively observed. This paper considers the problem of simultaneous variable selection and parameter estimation in the high-dimensional continuous generalized linear model with current status data. We apply the penalized likelihood procedure with the smoothly clipped absolute deviation penalty to select significant variables and estimate the corresponding regression coefficients. With a proper choice of tuning parameters, the resulting estimator is shown to be a root n/pn-consistent estimator under some mild conditions. In addition, we show that the resulting estimator has the same asymptotic distribution as the estimator obtained when the true model is known. The finite sample behavior of the proposed estimator is evaluated through simulation studies and a real example.
13.
Variable selection for semiparametric proportional hazards model under progressive Type-II censoring
Variable selection is an effective methodology for dealing with models with numerous covariates. We consider methods of variable selection for the semiparametric Cox proportional hazards model under the progressive Type-II censoring scheme. The Cox proportional hazards model is used to model the influence coefficients of the environmental covariates. By applying Breslow’s “least information” idea, we obtain a profile likelihood function to estimate the coefficients. Lasso-type penalized profile likelihood estimation as well as a stepwise variable selection method are explored as means to find the important covariates. Numerical simulations are conducted and Veteran’s Administration Lung Cancer data are exploited to evaluate the performance of the proposed method.
14.
We propose a new algorithm for simultaneous variable selection and parameter estimation for the single-index quantile regression (SIQR) model. The proposed algorithm, which is non-iterative, consists of two steps. Step 1 performs an initial variable selection method. Step 2 uses the results of Step 1 to obtain better estimation of the conditional quantiles and, using them, to perform simultaneous variable selection and estimation of the parametric component of the SIQR model. It is shown that the initial variable selection method consistently estimates the relevant variables, and the estimated parametric component derived in Step 2 satisfies the oracle property.
15.
Xinyang Wang 《Australian & New Zealand Journal of Statistics》2020,62(2):278-295
In recent years, modelling count data has become one of the most important and popular topics in time‐series analysis. At the same time, variable selection methods have become widely used in many fields as an effective statistical modelling tool. In this paper, we consider using a variable selection method to solve a modelling problem regarding the first‐order Poisson integer‐valued autoregressive (PINAR(1)) model with covariables. The PINAR(1) model with covariables is widely used in many areas because of its practicality. When using this model to deal with practical problems, multiple covariables are added to the model because it is impossible to know in advance which covariables will affect the results. But the inclusion of some insignificant covariables is almost impossible to avoid. Unfortunately, the usual estimation method is not adequate for the task of deleting the insignificant covariables that cause statistical inferences to become biased. To overcome this defect, we propose a penalised conditional least squares (PCLS) method, which can consistently select the true model. The PCLS estimator is also provided and its asymptotic properties are established. Simulation studies demonstrate that the PCLS method is effective for estimation and variable selection. One practical example is also presented to illustrate the practicability of the PCLS method.
16.
In this article, we focus on variable selection for the semiparametric varying coefficient partially linear model with response missing at random. Variable selection is proposed based on modal regression, where the nonparametric functions are approximated by a B-spline basis. The proposed procedure uses the SCAD penalty to realize variable selection of parametric and nonparametric components simultaneously. Furthermore, we establish the consistency, the sparsity property and the asymptotic normality of the resulting estimators. The penalized parameter estimates of the proposed method are calculated by the EM algorithm. Simulation studies are carried out to assess the finite sample performance of the proposed variable selection procedure.
17.
The problem of variable selection is considered in high-dimensional partial linear regression under a model allowing for a possibly functional variable. The procedure studied is that of nonconcave-penalized least squares. The existence of a √n/sn-consistent estimator for the vector of pn linear parameters in the model is shown, even when pn tends to ∞ as the sample size n increases (sn denotes the number of influential variables). An oracle property is also obtained for the variable selection method, and the nonparametric rate of convergence is stated for the estimator of the nonlinear functional component of the model. Finally, a simulation study illustrates the finite sample size performance of our procedure.
18.
AIC and BIC based on either empirical likelihood (EAIC and EBIC) or Gaussian pseudo-likelihood (GAIC and GBIC) are proposed to select variables in longitudinal data analysis. Their performances are evaluated in the framework of the generalized estimating equations via intensive simulation studies. Our findings are: (i) GAIC and GBIC outperform other existing methods in selecting variables; (ii) EAIC and EBIC are effective in selecting covariates only when the working correlation structure is correctly specified; (iii) GAIC and GBIC perform well regardless of whether the working correlation structure is correctly specified. A real dataset is also provided to illustrate the findings.
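As a reminder of the criteria being adapted here, the classical AIC and BIC can be sketched as follows. The abstract's EAIC/EBIC and GAIC/GBIC replace the ordinary log-likelihood with an empirical likelihood or a Gaussian pseudo-likelihood; the exact forms are not given in the abstract, so `loglik` below is simply a placeholder for whichever (pseudo-)log-likelihood is used, and the candidate models and their values are made up for illustration.

```python
import math


def aic(loglik, k):
    """Classical AIC: -2 * log-likelihood + 2 * (number of parameters)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Classical BIC: -2 * log-likelihood + (number of parameters) * log(n)."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical candidate models: name -> (log-likelihood, number of parameters).
candidates = {
    "x1": (-120.0, 2),
    "x1+x2": (-110.0, 3),
    "x1+x2+x3": (-109.8, 4),
}
n = 100  # assumed sample size

# Pick the model with the smallest criterion value.
best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n))
```

In this toy comparison both criteria prefer the two-variable model, because the third variable improves the fit by less than its penalty.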
19.
We consider variable selection in linear regression of geostatistical data that arise often in environmental and ecological studies. A penalized least squares procedure is studied for simultaneous variable selection and parameter estimation. Various penalty functions are considered including smoothly clipped absolute deviation. Asymptotic properties of penalized least squares estimates, particularly the oracle properties, are established, under suitable regularity conditions imposed on a random field model for the error process. Moreover, computationally feasible algorithms are proposed for estimating regression coefficients and their standard errors. Finite‐sample properties of the proposed methods are investigated in a simulation study and comparison is made among different penalty functions. The methods are illustrated by an ecological dataset of landcover in Wisconsin. The Canadian Journal of Statistics 37: 607–624; 2009 © 2009 Statistical Society of Canada
20.
Variable selection is an important task in regression analysis. The performance of a statistical model depends heavily on the choice of the subset of predictors. There are several methods to select the most relevant variables to construct a good model. However, in practice, the dependent variable may take positive continuous values and may not be normally distributed. In such situations, the gamma distribution is more suitable than the normal for building a regression model. This paper introduces a heuristic approach to perform variable selection using artificial bee colony optimization for gamma regression models. We evaluated the proposed method against classical selection methods such as backward and stepwise selection. Both simulation studies and real data set examples demonstrated the accuracy of our selection procedure.