Similar literature
1.
High-dimensional data arise frequently in modern applications such as biology, chemometrics, economics, neuroscience and other scientific fields. Two common features of high-dimensional data are that many of the predictors may not be significant and that the predictors are highly correlated. Generalized linear models, as a generalization of linear models, also suffer from the collinearity problem. In this paper, combining a nonconvex penalty with ridge regression, we propose the weighted elastic-net for variable selection in high-dimensional generalized linear models and give the theoretical properties of the proposed method with a diverging number of parameters. The finite sample behavior of the proposed method is illustrated with simulation studies and a real data example.

2.
In this paper, we study the problem of estimation and variable selection for generalised partially linear single-index models based on quasi-likelihood, extending existing studies on variable selection for partially linear single-index models to binary and count responses. To take into account the unit norm constraint of the index parameter, we use the ‘delete-one-component’ approach. The asymptotic normality of the estimates is demonstrated. Furthermore, the smoothly clipped absolute deviation penalty is added for variable selection of parameters both in the nonparametric part and the parametric part, and the oracle property of the variable selection procedure is shown. Finally, some simulation studies are carried out to illustrate the finite sample performance.
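
For readers unfamiliar with the smoothly clipped absolute deviation (SCAD) penalty named in this abstract, the following is a minimal sketch of the penalty function in its standard form. The function name `scad_penalty` and the default `a = 3.7` are illustrative assumptions and are not taken from the paper itself.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at a single coefficient.

    Assumes lam > 0 and a > 2. The default a = 3.7 is a commonly used
    value, chosen here for illustration only.
    """
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region that gradually relaxes the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no additional shrinkage.
    return lam * lam * (a + 1) / 2
```

The penalty is continuous at both knots and flat beyond `a * lam`, which is why large coefficients are left nearly unbiased while small ones are shrunk to zero.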

3.
In this paper, we consider the problem of variable selection for the partially varying coefficient single-index model, and present a regularized variable selection procedure that combines basis function approximations with the smoothly clipped absolute deviation penalty. The proposed procedure simultaneously selects significant variables in the single-index parametric components and the nonparametric coefficient function components. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Finite sample performance of the proposed method is illustrated by a simulation study and real data analysis.

4.
In modern scientific research, multiblock missing data arise when information is synthesized across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or rely heavily on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that, compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.

5.
In this paper, we study the asymptotic properties of the adaptive Lasso estimators in high-dimensional generalized linear models. The consistency of the adaptive Lasso estimator is obtained. We show that, if a reasonable initial estimator is available, under appropriate conditions, the adaptive Lasso correctly selects covariates with nonzero coefficients with probability converging to one, and that the estimators of nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has an oracle property. The results are examined by some simulations and a real example.
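
The role of the initial estimator in the adaptive Lasso can be illustrated with a small sketch. It assumes a linear model with an orthonormal design (X'X = I), which is a simplification and not the generalized linear model setting of the paper. In that special case the adaptive Lasso has a closed form: each initial estimate is soft-thresholded at a level inversely proportional to its own magnitude. The function name and the parameters `lam` and `gamma` are hypothetical.

```python
def adaptive_lasso_orthonormal(z, lam, gamma=1.0):
    """Adaptive Lasso under an assumed orthonormal design.

    z     : initial (least squares) estimates, one per coefficient
    lam   : tuning parameter (illustrative)
    gamma : exponent of the adaptive weights w_j = 1 / |z_j|**gamma

    Minimises 0.5 * sum((z_j - b_j)**2) + lam * sum(w_j * |b_j|).
    """
    estimates = []
    for zj in z:
        if zj == 0.0:
            # An exactly zero initial estimate has infinite weight.
            estimates.append(0.0)
            continue
        threshold = lam / abs(zj) ** gamma
        magnitude = max(abs(zj) - threshold, 0.0)
        estimates.append(magnitude if zj > 0 else -magnitude)
    return estimates
```

Small initial estimates face a large threshold and are set to zero, while large ones are barely shrunk; this data-driven weighting is the intuition behind the oracle property described above.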

6.
A new variable selection approach utilizing penalized estimating equations is developed for high-dimensional longitudinal data with dropouts under a missing at random (MAR) mechanism. The proposed method is based on the best linear approximation of efficient scores from the full dataset and does not need to specify a separate model for the missing or imputation process. The coordinate descent algorithm is adopted to implement the proposed method and is computationally feasible and stable. The oracle property is established, and extensive simulation studies show that the proposed variable selection method performs much better than penalized estimating equations applied to complete data, which do not account for the MAR mechanism. Finally, the proposed method is applied to a Lifestyle Education for Activity and Nutrition study, and the interaction effect between intervention and time is identified, which is consistent with previous findings.

7.
Generalized linear models (GLMs) are widely studied to deal with complex response variables. For the analysis of categorical dependent variables with more than two response categories, multivariate GLMs are used to model the relationship between this polytomous response and a set of regressors. Traditional variable selection approaches have been proposed in the literature for the multivariate GLM with a canonical link function when the number of parameters is fixed. However, in many model selection problems, the number of parameters may be large and grow with the sample size. In this paper, we present a new selection criterion for models with a diverging number of parameters. Under suitable conditions, the criterion is shown to be model selection consistent. A simulation study and a real data analysis are conducted to support the theoretical findings.

8.
In this article, we focus on variable selection for the semiparametric varying coefficient partially linear model with responses missing at random. Variable selection is proposed based on modal regression, where the nonparametric functions are approximated by a B-spline basis. The proposed procedure uses the SCAD penalty to realize variable selection of parametric and nonparametric components simultaneously. Furthermore, we establish the consistency, the sparse property and asymptotic normality of the resulting estimators. The penalized parameter estimates of the proposed method are computed by the EM algorithm. Simulation studies are carried out to assess the finite sample performance of the proposed variable selection procedure.

9.
Kaifeng Zhao 《Statistics》2016,50(6):1276-1289
This paper considers variable selection in additive quantile regression based on the group smoothly clipped absolute deviation (gSCAD) penalty. Although shrinkage variable selection in additive models with least-squares loss has been well studied, quantile regression is sufficiently different from mean regression to deserve a separate treatment. It is shown that the gSCAD estimator can correctly identify the significant components and at the same time maintain the usual convergence rates in estimation. Simulation studies are used to illustrate our method.
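
To make concrete how quantile regression differs from mean regression, the sketch below implements the standard check (pinball) loss that quantile regression minimizes in place of squared error. It is a generic illustration under assumed names (`check_loss`, `sample_quantile_by_check_loss`) and does not reproduce the gSCAD estimator of the paper.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}) for a residual u."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(y, tau):
    """Return the data point minimising the total check loss.

    With no covariates, minimising the summed check loss over a constant
    yields a tau-th sample quantile; candidates are restricted to the
    observed values for simplicity.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(v - c, tau) for v in y))
```

Because the loss weights positive and negative residuals asymmetrically (by `tau` and `1 - tau`), its minimizer tracks a quantile rather than the mean.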

10.
G. Aneiros  F. Ferraty  P. Vieu 《Statistics》2015,49(6):1322-1347
The problem of variable selection is considered in high-dimensional partial linear regression under a model that allows for a possibly functional variable. The procedure studied is that of nonconcave-penalized least squares. The existence of a √n/sn-consistent estimator for the vector of pn linear parameters in the model is shown, even when pn tends to ∞ as the sample size n increases (sn denotes the number of influential variables). An oracle property is also obtained for the variable selection method, and the nonparametric rate of convergence is stated for the estimator of the nonlinear functional component of the model. Finally, a simulation study illustrates the finite sample performance of our procedure.

11.
In this paper, we introduce a partially linear single-index additive hazards model with current status data. Both the unknown link function of the single-index term and the cumulative baseline hazard function are approximated by B-splines under a monotonicity constraint on the latter. The sieve method is applied to estimate the nonparametric and parametric components simultaneously. We show that, when the nonparametric link function is an exact B-spline, the resultant estimator of the regression parameter vector is asymptotically normal and achieves the semiparametric information bound, and that the rate of convergence of the estimator for the cumulative baseline hazard function is optimal. Simulation studies are presented to examine the finite sample performance of the proposed estimation method. For illustration, we apply the method to a clinical dataset with a current status outcome.

12.
In this paper, we consider weighted composite quantile regression for the linear model with left-truncated data. An adaptive penalized procedure for variable selection is proposed. The asymptotic normality and oracle property of the resulting estimators are also established. Simulation studies are conducted to illustrate the finite sample performance of the proposed methods.

13.
Many methods have been developed in the literature for regression analysis of current status data with noninformative censoring, and some approaches have also been proposed for semiparametric regression analysis of current status data with informative censoring. However, the existing approaches for the latter situation mainly address specific models such as the proportional hazards model and the additive hazards model. To address this, we consider a general class of semiparametric linear transformation models and develop a sieve maximum likelihood estimation approach for inference. In the method, a copula model is employed to describe the informative censoring, that is, the relationship between the failure time of interest and the censoring time, and Bernstein polynomials are used to approximate the nonparametric functions involved. The asymptotic consistency and normality of the proposed estimators are established, and an extensive simulation study indicates that the proposed approach works well in practical situations. In addition, an illustrative example is provided.
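
As a rough illustration of the Bernstein polynomial approximation mentioned in this abstract, the sketch below evaluates the Bernstein basis on [0, 1] and a function expressed in it. The function names, the degree `m`, and the coefficient values are hypothetical; the paper's sieve likelihood itself is not reproduced.

```python
from math import comb


def bernstein_basis(t, m):
    """Values of the m + 1 Bernstein basis polynomials of degree m at t in [0, 1]."""
    return [comb(m, k) * t**k * (1 - t) ** (m - k) for k in range(m + 1)]


def bernstein_approx(coefs, t):
    """Evaluate sum_k coefs[k] * B_{k,m}(t), where m = len(coefs) - 1."""
    m = len(coefs) - 1
    return sum(c * b for c, b in zip(coefs, bernstein_basis(t, m)))
```

One reason this basis is convenient for monotone functions such as cumulative hazards is that nondecreasing coefficients yield a nondecreasing function, so the shape constraint reduces to a simple ordering constraint on the coefficients.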

14.
Two-phase stratified sampling has been extensively used in large epidemiologic studies as a way of reducing costs associated with assembling covariate histories and enlarging relative sample sizes of the most informative subgroups. In this article, we investigate case-cohort sampled current status data under the additive risk model assumption. We describe a class of estimating equations, each depending on a different prevalence ratio estimate. Asymptotic properties of the proposed estimators and inference based on the “m out of n” nonparametric bootstrap are investigated. A small simulation study is employed to evaluate the finite sample performance and relative efficiency of the proposed estimators.

15.
Current status data arise when the death of every subject in a study cannot be determined precisely, but is known only to have occurred before or after a random monitoring time. The authors discuss the analysis of such data under semiparametric linear transformation models for which they propose a general inference procedure based on estimating functions. They determine the properties of the estimates they propose for the regression parameters of the model and illustrate their technique using tumorigenicity data.

16.
In this paper, we focus on the variable selection for the semiparametric regression model with longitudinal data when some covariates are measured with errors. A new bias-corrected variable selection procedure is proposed based on the combination of the quadratic inference functions and shrinkage estimations. With appropriate selection of the tuning parameters, we establish the consistency and asymptotic normality of the resulting estimators. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure. We further illustrate the proposed procedure with an application.

17.
This paper is concerned with the selection of explanatory variables in generalized linear models (GLMs). The class of GLMs is quite large and contains, e.g., ordinary linear regression, binary logistic regression, the probit model and Poisson regression with linear or log-linear parameter structure. We show that, through an approximation of the log likelihood and a certain data transformation, the variable selection problem in a GLM can be converted into variable selection in an ordinary (unweighted) linear regression model. As a consequence, no specific computer software for variable selection in GLMs is needed. Instead, any suitable variable selection program for linear regression can be used. We also present a simulation study which shows that the log likelihood approximation is very good in many practical situations. Finally, we briefly mention possible extensions to regression models outside the class of GLMs.

18.
We propose a new algorithm for simultaneous variable selection and parameter estimation for the single-index quantile regression (SIQR) model. The proposed algorithm, which is non-iterative, consists of two steps. Step 1 performs an initial variable selection method. Step 2 uses the results of Step 1 to obtain better estimation of the conditional quantiles and, using them, to perform simultaneous variable selection and estimation of the parametric component of the SIQR model. It is shown that the initial variable selection method consistently estimates the relevant variables, and the estimated parametric component derived in Step 2 satisfies the oracle property.

19.
Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response. Thus, selecting a small number of relevant variables for classification has received a great deal of attention. Generally, when the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are called for on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method in the multinomial probit regression model for multi-class classification. The performance of our proposed method is demonstrated with one simulated data set and three well-known gene expression profiling data sets: breast cancer data, leukemia data, and small round blue-cell tumors. The results show that, compared with other methods, our method is able to select the relevant variables and can obtain competitive classification accuracy with a small subset of relevant genes.

20.
Spatial regression models are important tools for many scientific disciplines including economics, business, and social science. In this article, we investigate postmodel selection estimators that apply least squares estimation to the model selected by penalized estimation in high-dimensional regression models with spatial autoregressive errors. We show that by separating the model selection and estimation process, the postmodel selection estimator performs at least as well as the simultaneous variable selection and estimation method in terms of the rate of convergence. Moreover, under perfect model selection, the ℓ2 rate of convergence is the oracle rate of s/n, compared with the convergence rate of √(s log p/n) in the general case. Here, n is the sample size and p, s are the model dimension and number of significant covariates, respectively. We further provide the convergence rate of the estimation error in the form of sup norm, and ideally the rate can reach as fast as √(log s/n).
