Similar Literature (20 records)
1.
Variable selection is an important issue in all regression analyses, and in this paper we discuss it in the context of regression analysis of recurrent event data. Recurrent event data often occur in long-term studies in which individuals may experience the events of interest more than once, and their analysis has recently attracted a great deal of attention (Andersen et al., Statistical models based on counting processes, 1993; Cook and Lawless, Biometrics 52:1311–1323, 1996, The analysis of recurrent event data, 2007; Cook et al., Biometrics 52:557–571, 1996; Lawless and Nadeau, Technometrics 37:158–168, 1995; Lin et al., J R Stat Soc B 69:711–730, 2000). However, there seem to be no established approaches to variable selection for recurrent event data. To address this, we adopt the idea behind the nonconcave penalized likelihood approach proposed in Fan and Li (J Am Stat Assoc 96:1348–1360, 2001) and develop a nonconcave penalized estimating function approach. The proposed approach selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for this process. We show that the proposed approach performs as well as the oracle procedure in that it yields the estimates as if the correct submodel were known. Simulation studies conducted to assess the performance of the proposed approach suggest that it works well in practical situations. The methodology is illustrated using data from a chronic granulomatous disease study.
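As a concrete reference point, here is a minimal sketch of the SCAD penalty of Fan and Li (2001) on which the penalized estimating function builds; the function names and the default a = 3.7 follow the usual convention, but the estimating-function machinery of the paper itself is not reproduced here.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|) of Fan & Li (2001); a = 3.7 is their suggested default."""
    t = np.abs(np.asarray(t, dtype=float))
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))   # quadratic middle piece
    flat = lam**2 * (a + 1) / 2                                 # constant tail: no bias for large |t|
    return np.where(t <= lam, lam * t, np.where(t <= a * lam, quad, flat))

def scad_derivative(t, lam, a=3.7):
    """p'_lambda(t) for t >= 0: equal to lambda near zero, decaying linearly, then exactly zero."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))
```

In local quadratic approximation schemes of the Fan–Li type, only scad_derivative evaluated at the current estimate enters each iteration, which is why the derivative is the practically important quantity.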

2.
To perform variable selection in expectile regression, we introduce the elastic-net penalty into expectile regression and propose an elastic-net penalized expectile regression (ER-EN) model. We then adopt the semismooth Newton coordinate descent (SNCD) algorithm to solve the proposed ER-EN model in high-dimensional settings. The advantages of the ER-EN model are illustrated via extensive Monte Carlo simulations. The numerical results show that the ER-EN model outperforms elastic-net penalized least squares regression (LSR-EN), elastic-net penalized Huber regression (HR-EN), elastic-net penalized quantile regression (QR-EN) and conventional expectile regression (ER) in terms of variable selection and predictive ability, especially for asymmetric distributions. We also apply the ER-EN model to two real-world applications: the relative location of CT slices on the axial axis and the metabolism of the drug tacrolimus (Tac). The empirical results likewise demonstrate the superiority of the ER-EN model.
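The paper solves ER-EN with the SNCD algorithm; as a simpler illustration of the same objective, the sketch below minimizes the asymmetric squared (expectile) loss plus an elastic-net penalty by proximal gradient descent. All names, the fixed step size, and the solver choice are my own, not the authors' implementation.

```python
import numpy as np

def expectile_elastic_net(X, y, tau=0.7, lam=0.1, alpha=0.5, n_iter=2000):
    """Proximal-gradient sketch for
        min_b (1/n) * sum_i w_i(b) * (y_i - x_i'b)^2
              + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2),
    where w_i = tau if the i-th residual is >= 0 and 1 - tau otherwise."""
    n, p = X.shape
    b = np.zeros(p)
    # Lipschitz bound for the smooth part (asymmetric squared loss + ridge term)
    step = 1.0 / (2 * max(tau, 1 - tau) * np.linalg.norm(X, 2) ** 2 / n
                  + lam * (1 - alpha))
    for _ in range(n_iter):
        r = y - X @ b
        w = np.where(r >= 0, tau, 1 - tau)
        grad = -2 * X.T @ (w * r) / n + lam * (1 - alpha) * b
        u = b - step * grad
        b = np.sign(u) * np.maximum(np.abs(u) - step * lam * alpha, 0.0)  # soft threshold
    return b
```

Setting tau = 0.5 recovers elastic-net least squares, while tau near 0 or 1 targets the tails, which is where the asymmetric-distribution advantage reported in the abstract comes from.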

3.
This paper is about variable selection with the random forests algorithm in the presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task that becomes even more challenging when predictors are highly correlated. First, we provide a theoretical study of the permutation importance measure for an additive regression model, which allows us to describe how correlation between predictors impacts the permutation importance. Our results motivate the use of the recursive feature elimination (RFE) algorithm for variable selection in this context. This algorithm recursively eliminates variables using the permutation importance measure as a ranking criterion. Next, various simulation experiments illustrate the efficiency of the RFE algorithm at selecting a small number of variables while retaining good prediction error. Finally, the selection algorithm is tested on the Landsat Satellite data from the UCI Machine Learning Repository.
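The RFE loop is straightforward to reproduce with scikit-learn's random forest and permutation importance; the sketch below is a bare-bones version on synthetic data (make_regression and all parameter values are placeholders, not the paper's experimental setup).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=30, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

active, history = list(range(X.shape[1])), []
while active:
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr[:, active], y_tr)
    history.append((list(active), rf.score(X_te[:, active], y_te)))
    imp = permutation_importance(rf, X_te[:, active], y_te, n_repeats=10, random_state=0)
    del active[int(np.argmin(imp.importances_mean))]  # drop the least important variable

best_subset, best_r2 = max(history, key=lambda t: t[1])  # subset with best held-out score
```

Recomputing the importances after every elimination is the point the theoretical study supports: the ranking can adapt as correlated competitors are removed, instead of being fixed by a single initial fit.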

4.
Recent studies have demonstrated the theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation (SCAD) and minimax concave (MCP) penalties. Computing the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to existing algorithms that use a local quadratic or local linear approximation to the penalty function, the MMCD majorizes the negative log-likelihood by a quadratic loss but applies no approximation to the penalty. This strategy avoids computing a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish the theoretical convergence of the MMCD. We implement the algorithm for penalized logistic regression with the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD is fast enough for penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size.
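A minimal rendering of the MMCD idea for MCP-penalized logistic regression is sketched below: the logistic curvature is majorized by the constant 1/4, so each coordinate keeps a fixed scaling v_j computed once, and the one-dimensional MCP update is solved exactly. Names and defaults are mine; in particular, gamma must exceed 1/min_j v_j (gamma > 4 for standardized predictors) for the middle branch to be well defined.

```python
import numpy as np

def mcp_threshold(z, v, lam, gamma):
    """Exact minimizer of v*b^2/2 - z*b + MCP(|b|; lam, gamma); requires v > 1/gamma."""
    if abs(z) <= lam:
        return 0.0
    if abs(z) <= v * gamma * lam:
        return np.sign(z) * (abs(z) - lam) / (v - 1.0 / gamma)
    return z / v

def mmcd_logistic(X, y, lam, gamma=6.0, n_outer=100):
    """MMCD sketch: quadratic majorization of the logistic loss (curvature bound 1/4),
    then one coordinate-descent sweep per outer iteration; y is coded in {0, 1}."""
    n, p = X.shape
    b = np.zeros(p)
    v = (X ** 2).sum(axis=0) / (4.0 * n)   # fixed scalings -- computed once, never updated
    for _ in range(n_outer):
        r = y - 1.0 / (1.0 + np.exp(-(X @ b)))   # working residual of the surrogate at b
        for j in range(p):
            z = v[j] * b[j] + X[:, j] @ r / n
            b_new = mcp_threshold(z, v[j], lam, gamma)
            r -= 0.25 * X[:, j] * (b_new - b[j])  # keep the surrogate residual in sync
            b[j] = b_new
    return b
```

The absence of any per-iteration rescaling of v is exactly the efficiency gain the abstract describes relative to local quadratic or local linear approximations of the penalty.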

5.
Challenging research in various fields has driven a wide range of methodological advances in variable selection for regression models with high-dimensional predictors. In comparison, selection of nonlinear functions in models with additive predictors has been considered only more recently. Several competing suggestions were developed at about the same time and often do not refer to each other. This article provides a state-of-the-art review of function selection, focusing on penalized likelihood and Bayesian concepts and relating the various approaches to each other in a unified framework. In an empirical comparison that also includes boosting, we evaluate several methods through applications to simulated and real data, thereby providing some guidance on their performance in practice.

6.
We consider a partially linear model with a diverging number of groups of parameters in the parametric component. Variable selection and estimation of the regression coefficients are achieved simultaneously by using a suitable penalty function for the covariates in the parametric component. An MM-type algorithm that estimates the parameters without inverting a high-dimensional matrix is proposed. The consistency and sparsity of the penalized least-squares estimators of the regression coefficients are discussed in the setting where some nonzero regression coefficients have very small values. It is found that the root-(pn/n) consistency and the sparsity of the penalized least-squares estimators cannot be achieved simultaneously when the number of nonzero regression coefficients with very small values is unknown, where pn and n denote the number of regression coefficients and the sample size, respectively. The finite-sample behavior of the penalized least-squares estimators and the performance of the proposed algorithm are studied through simulation studies and a real data example.

7.
Identifying homogeneous subsets of predictors in classification can be challenging in the presence of high-dimensional data with highly correlated variables. We propose a new method called cluster correlation-network support vector machine (CCNSVM) that simultaneously estimates clusters of predictors relevant for classification and the coefficients of a penalized SVM. The new CCN penalty is a function of the well-known Topological Overlap Matrix, whose entries measure the strength of connectivity between predictors. CCNSVM implements an efficient algorithm that alternates between searching for clusters of predictors and optimizing a penalized SVM loss function using majorization–minimization tricks and a coordinate descent algorithm. Combining clustering and sparsity in a single procedure provides additional insight into the power of exploiting dimension-reduction structure in high-dimensional binary classification. Simulation studies compare the performance of our procedure to its competitors, and a practical application of CCNSVM to DNA methylation data illustrates its good behaviour.
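The penalty's building block, the Topological Overlap Matrix, has a direct numpy computation; the sketch below uses the standard weighted-network form of Zhang and Horvath (2005), which I assume is the one meant.

```python
import numpy as np

def topological_overlap_matrix(A):
    """TOM of a weighted network: A is a symmetric adjacency matrix with zero
    diagonal and entries in [0, 1]. TOM_ij is large when i and j share most of
    their network neighbourhood."""
    k = A.sum(axis=1)                      # node connectivity
    L = A @ A                              # strength of shared neighbours
    denom = np.minimum.outer(k, k) + 1.0 - A
    tom = (L + A) / denom
    np.fill_diagonal(tom, 1.0)
    return tom
```

Entries near 1 flag pairs of predictors with almost identical neighbourhoods; the CCN penalty uses this to pull the coefficients of such connected predictors toward each other.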

8.
In panel data analysis, predictors may affect the response in substantially different ways: some have homogeneous effects across all individuals, while others act heterogeneously. Effectively differentiating these two kinds of predictors is crucial, particularly for high-dimensional panel data, since the homogeneity assumption greatly reduces the number of parameters and hence improves interpretability. In this article, based on a hierarchical Bayesian panel regression model, we propose a novel yet effective Markov chain Monte Carlo (MCMC) algorithm, together with a simple maximum ratio criterion, to detect the predictors with homogeneous effects in high-dimensional panel data. Extensive Monte Carlo simulations show that the MCMC algorithm performs well. The usefulness of the proposed method is further demonstrated with a real example from the Chinese financial market.

9.
Selection of appropriate predictors for right-censored time-to-event data is very often encountered by practitioners. We consider ℓ1-penalized regression, the “least absolute shrinkage and selection operator,” as a tool for predictor selection in association with the accelerated failure time model. The choice of the penalty parameter λ is crucial for identifying the correct set of covariates. In this paper, we propose an information theory-based method to choose λ under the log-normal distribution, and we discuss an efficient algorithm in the same context. The performance of the proposed λ and of the algorithm is illustrated through simulation studies and a real data analysis, and the convergence of the algorithm is also discussed.
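To make the idea of an information-criterion choice of λ concrete, here is a sketch that selects λ by BIC along a lasso path on log event times. This is a deliberate simplification: it ignores censoring entirely, whereas the paper derives its criterion from the log-normal AFT likelihood; function names and the use of BIC are my own stand-ins.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def bic_lambda(X, logT):
    """Pick the lasso penalty by BIC along the regularization path.
    Assumes X and logT are centered (lasso_path fits no intercept) and,
    unrealistically for survival data, that no observations are censored."""
    n = len(logT)
    alphas, coefs, _ = lasso_path(X, logT)          # coefs: (n_features, n_alphas)
    bics = []
    for k in range(len(alphas)):
        resid = logT - X @ coefs[:, k]
        df = np.count_nonzero(coefs[:, k])          # active-set size as degrees of freedom
        bics.append(n * np.log(resid @ resid / n) + df * np.log(n))
    best = int(np.argmin(bics))
    return alphas[best], coefs[:, best]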

10.
Variable selection is an important issue in all regression analyses, and in this paper we discuss it in the context of regression analysis of panel count data. Panel count data often occur in long-term studies that concern the occurrence rate of a recurrent event, and their analysis has recently attracted a great deal of attention. However, there does not seem to exist any established approach for variable selection with respect to panel count data. To address this, we adopt the idea behind the non-concave penalized likelihood approach and develop a non-concave penalized estimating function approach. The proposed methodology selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for this process. We show that the proposed procedure performs as well as the oracle procedure in that it yields the estimates as if the correct submodel were known. Simulation studies conducted to assess the performance of the proposed approach suggest that it works well in practical situations. An illustrative example from a cancer study is provided.

11.
We consider the problem of variable selection in high-dimensional partially linear models with longitudinal data. A variable selection procedure is proposed based on the smooth-threshold generalized estimating equation (SGEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero, and simultaneously estimates the nonzero regression coefficients by solving the SGEE. We establish the asymptotic properties in a high-dimensional framework where the number of covariates pn increases with the number of clusters n. Extensive Monte Carlo simulation studies examine the finite-sample performance of the proposed variable selection procedure.
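The thresholding device at the heart of such procedures can be shown in a few lines. The sketch below computes smooth-threshold weights of the usual Ueki-type form from an initial (for example, unpenalized GEE) estimate; coefficients with weight exactly 1 are eliminated before the estimating equations are re-solved. This is only the thresholding step with hypothetical names, not the full SGEE iteration.

```python
import numpy as np

def smooth_threshold_weights(beta_init, lam, tau=1.0):
    """delta_j = min(1, lam / |b_j|^(1 + tau)).
    delta_j == 1 flags a variable to drop; delta_j < 1 downweights the j-th
    estimating equation rather than hard-thresholding the coefficient itself."""
    b = np.abs(np.asarray(beta_init, dtype=float))
    with np.errstate(divide='ignore'):          # |b_j| == 0 cleanly yields delta_j = 1
        delta = np.minimum(1.0, lam / b ** (1.0 + tau))
    return delta, delta < 1.0                   # weights and the active-variable mask
```

Because delta_j depends on the initial estimate through a smooth function away from the threshold, the resulting selection inherits oracle-type behaviour without an explicitly non-smooth penalty.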

12.
A new variable selection approach utilizing penalized estimating equations is developed for high-dimensional longitudinal data with dropouts under a missing at random (MAR) mechanism. The proposed method is based on the best linear approximation of the efficient scores from the full dataset and does not need a separate model for the missingness or imputation process. The coordinate descent algorithm adopted to implement the method is computationally feasible and stable. The oracle property is established, and extensive simulation studies show that the proposed variable selection method performs much better than penalized estimating equations based on the complete cases, which do not account for the MAR mechanism. Finally, the proposed method is applied to a Lifestyle Education for Activity and Nutrition study, where it identifies an interaction effect between intervention and time that is consistent with previous findings.

13.
A challenging problem in the analysis of high-dimensional data is variable selection. In this study, we describe a bootstrap-based technique for selecting predictors in partial least-squares regression (PLSR) and principal component regression (PCR) for high-dimensional data. Using bootstrap-based significance tests of the regression coefficients, a subset of the original variables can be selected for inclusion in the regression, yielding a more parsimonious model with smaller prediction errors. We compare the bootstrap approach with several other variable selection approaches (jack-knife and sparse formulation-based methods) for PCR and PLSR in simulations and on real data.
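A bootstrap significance filter for PLSR coefficients can be sketched with scikit-learn. The percentile intervals, component count, and function name below are illustrative choices, not necessarily the exact test statistic used in the paper.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def pls_bootstrap_select(X, y, n_components=2, B=500, level=0.95, seed=0):
    """Keep predictors whose bootstrap percentile CI for the PLS regression
    coefficient excludes zero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot = np.empty((B, p))
    for b in range(B):
        idx = rng.integers(0, n, n)                       # resample rows with replacement
        pls = PLSRegression(n_components=n_components).fit(X[idx], y[idx])
        boot[b] = pls.coef_.ravel()
    lo, hi = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return (lo > 0) | (hi < 0)                            # True where 0 lies outside the CI
```

Refitting PLSR inside the bootstrap loop matters: the coefficients depend on the extracted latent components, so their sampling variability cannot be read off a single fit.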

14.
Principal fitted component (PFC) models are a class of likelihood-based inverse regression methods that yield a so-called sufficient reduction of the random p-vector of predictors X given the response Y. Assuming that a large number of the predictors carry no information about Y, we aim to obtain an estimate of the sufficient reduction that ‘purges’ these irrelevant predictors and thus selects the most useful ones. We devise a procedure that uses observed significance values from the univariate fits to yield a sparse PFC, a purged estimate of the sufficient reduction. The performance of the method is compared to that of penalized forward linear regression models for variable selection in high-dimensional settings.
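The screening step can be imitated with univariate inverse regressions: each predictor is regressed on the response, and its significance value decides whether it survives the purge. A toy version follows; the paper works with significance values from univariate PFC fits, so the plain linear fits here are a stand-in, and the names and cutoff are hypothetical.

```python
import numpy as np
from scipy import stats

def screen_predictors(X, y, alpha=0.05):
    """Inverse-regression screening: regress each X_j on y and keep predictors
    whose slope is significant at level alpha (two-sided t-test)."""
    pvals = np.array([stats.linregress(y, X[:, j]).pvalue for j in range(X.shape[1])])
    return pvals < alpha                        # mask of predictors kept in the reduction
```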

15.
Variable and model selection problems are fundamental to high-dimensional statistical modeling in diverse fields of science. In health studies especially, many potential factors are usually introduced to determine an outcome variable. This paper deals with high-dimensional statistical modeling through an analysis of the annual trauma data in Greece for 2005. The data set is divided into an experiment set and a control set and consists of 6334 observations on 112 factors, including demographic, transport and intrahospital data used to detect possible risk factors of death. In our study, different model selection techniques are applied to the experiment set, and the notion of deviance is used on the control set to assess the fit of the overall selected model. The statistical methods employed were the non-concave penalized likelihood methods (smoothly clipped absolute deviation, least absolute shrinkage and selection operator, and hard thresholding), generalized linear logistic regression, and best-subset variable selection. We discuss how to identify the significant variables in large medical data sets, along with the performance and the pros and cons of the various statistical techniques used. The analysis reveals the distinct advantages of the non-concave penalized likelihood methods over the traditional model selection techniques.

16.
Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to a few covariates, because the presence of many predictors yields unstable estimates. The presented approach to fitting generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models with reduced complexity. In contrast to common procedures, it can be used in high-dimensional settings where a large number of potentially influential explanatory variables is available. The method is investigated in simulation studies and illustrated on real data sets.

17.
High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by applications in high-throughput genomic data analysis. In this paper, we propose a class of regularization methods, integrating both the penalized empirical likelihood and pseudoscore approaches, for variable selection and estimation in sparse, high-dimensional additive hazards regression models. When the number of covariates grows with the sample size, we establish the asymptotic properties of the resulting estimator and the oracle property of the proposed method. The proposed estimator is shown to be more efficient than that obtained from the non-concave penalized likelihood approach in the literature. Based on a penalized empirical likelihood ratio statistic, we further develop a nonparametric likelihood approach for testing linear hypotheses about the regression coefficients and, consequently, for constructing confidence regions. Simulation studies evaluate the performance of the proposed methodology, and two real data sets are analyzed.

18.
Huang J, Ma S, Li H, Zhang CH (2011). Annals of Statistics 39(4):2021–2046.
We propose a new penalized method for variable selection and estimation that explicitly incorporates the correlation patterns among predictors. The method combines the minimax concave penalty with a Laplacian quadratic associated with a graph as the penalty function; we call it the sparse Laplacian shrinkage (SLS) method. The SLS uses the minimax concave penalty to encourage sparsity and the Laplacian quadratic penalty to promote smoothness among the coefficients of correlated predictors, and it has a generalized grouping property with respect to the graph represented by the Laplacian quadratic. We show that the SLS possesses an oracle property in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability. This result holds in sparse, high-dimensional settings with p ≫ n under reasonable conditions. We derive a coordinate descent algorithm for computing the SLS estimates. Simulation studies evaluate the performance of the SLS method, and a real data example illustrates its application.
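A coordinate-descent sketch of the SLS objective follows, assuming standardized predictors (so each quadratic coefficient v_j exceeds 1/gamma) and a precomputed graph Laplacian Lap = D − A; the scalar MCP solver repeats the one shown for the MMCD sketch above so the block stays self-contained, and all names are mine.

```python
import numpy as np

def mcp_threshold(z, v, lam, gamma):
    """Exact minimizer of v*b^2/2 - z*b + MCP(|b|; lam, gamma); requires v > 1/gamma."""
    if abs(z) <= lam:
        return 0.0
    if abs(z) <= v * gamma * lam:
        return np.sign(z) * (abs(z) - lam) / (v - 1.0 / gamma)
    return z / v

def sls_coordinate_descent(X, y, Lap, lam, eta, gamma=3.0, n_sweeps=100):
    """Sparse Laplacian shrinkage sketch:
        (1/2n)||y - Xb||^2 + sum_j MCP(|b_j|; lam, gamma) + (eta/2) * b' Lap b.
    The MCP term gives sparsity; the Laplacian term smooths coefficients of
    predictors that are connected in the graph."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()                      # full residual y - X b
    v = (X ** 2).sum(axis=0) / n + eta * np.diag(Lap)
    for _ in range(n_sweeps):
        for j in range(p):
            z = (X[:, j] @ r) / n + (X[:, j] @ X[:, j] / n) * b[j] \
                - eta * (Lap[j] @ b - Lap[j, j] * b[j])   # neighbours pull b_j toward them
            b_new = mcp_threshold(z, v[j], lam, gamma)
            r -= X[:, j] * (b_new - b[j])
            b[j] = b_new
    return b
```

Setting eta = 0 recovers plain MCP coordinate descent, which makes the grouping effect of the Laplacian term easy to see by comparison.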

19.
Sliced inverse regression (SIR; Li, 1991) is a dimension reduction method that reduces the dimension of the predictors without losing regression information. Implementing SIR requires inverting the covariance matrix of the predictors, which has hindered its use for high-dimensional data where the number of predictors exceeds the sample size. We propose random sliced inverse regression (rSIR), which applies SIR to many bootstrap samples, each using a subset of randomly selected candidate predictors, and aggregates the resulting estimates into a final rSIR estimate. A simple variable selection procedure based on these bootstrap estimates is also proposed. The performance of the proposed estimates is studied via extensive simulation, and an application to a dataset on myocardial perfusion diagnosis from cardiac Single Photon Emission Computed Tomography (SPECT) images is presented.
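For contrast with rSIR, classical SIR fits in a dozen lines; the generalized eigenproblem on the last line is exactly the covariance inversion that fails when p > n and that rSIR's random predictor subsets avoid. Function names and defaults are mine.

```python
import numpy as np
from scipy.linalg import eigh

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Classical SIR (Li, 1991): slice on y, average X within slices, and take
    the leading generalized eigenvectors of the between-slice covariance M
    with respect to cov(X)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    M = np.zeros((p, p))
    for s in np.array_split(np.argsort(y), n_slices):   # slices of (nearly) equal size
        m = Xc[s].mean(axis=0)
        M += (len(s) / n) * np.outer(m, m)
    # Solve M v = lambda * cov(X) v -- requires cov(X) invertible, i.e. p < n
    evals, evecs = eigh(M, np.cov(Xc, rowvar=False))
    return evecs[:, np.argsort(evals)[::-1][:n_dirs]]
```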

20.
In high-dimensional data analysis, penalized likelihood estimators have been shown to provide superior results in both variable selection and parameter estimation. A new algorithm, APPLE, is proposed for calculating the Approximate Path for Penalized Likelihood Estimators. Both convex penalties (such as the LASSO) and folded concave penalties (such as the MCP) are considered. APPLE efficiently computes the solution path of the penalized likelihood estimator using a hybrid of a modified predictor–corrector method and the coordinate descent algorithm. APPLE is compared with several well-known packages via simulation and analysis of two gene expression data sets.
