Similar literature
Found 20 similar documents; search took 15 ms
1.
Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

2.
As a useful supplement to mean regression, quantile regression is a completely distribution-free approach and is more robust to heavy-tailed random errors. In this paper, a variable selection procedure for quantile varying coefficient models is proposed by combining local polynomial smoothing with adaptive group LASSO. With an appropriate selection of tuning parameters by the BIC criterion, the theoretical properties of the new procedure, including consistency in variable selection and the oracle property in estimation, are established. The finite sample performance of the newly proposed method is investigated through simulation studies and the analysis of Boston house price data. Numerical studies confirm that the newly proposed procedure (QKLASSO) has both robustness and efficiency for varying coefficient models irrespective of error distribution, which is a good alternative and necessary supplement to the KLASSO method.

3.
Graphs and networks are common ways of depicting information. In biology, many different biological processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This kind of a priori use of graphs is a useful supplement to the standard numerical data such as microarray gene expression data. In this paper, we consider the problem of regression analysis and variable selection when the covariates are linked on a graph. We study a graph-constrained regularization procedure and its theoretical properties for regression analysis to take into account the neighborhood information of the variables measured on a graph, where a smoothness penalty on the coefficients is defined as a quadratic form of the Laplacian matrix associated with the graph. We establish estimation and model selection consistency results and provide estimation bounds for both fixed and diverging numbers of parameters in regression models. We demonstrate by simulations and a real dataset that the proposed procedure can lead to better variable selection and prediction than existing methods that ignore the graph information associated with the covariates.
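
The abstract above describes a smoothness penalty defined as a quadratic form of the graph Laplacian. As a rough illustration only, the sketch below computes such a quadratic form for a small weighted graph. It assumes the unnormalized Laplacian L = D - A; the paper may use a different (for example, degree-normalized) version, and the function names `laplacian_penalty` and `edge_form` are hypothetical.

```python
def laplacian_penalty(beta, edges):
    """Return beta^T L beta, with L = D - A the unnormalized graph Laplacian.

    beta  : list of regression coefficients, one per graph node
    edges : list of (i, j, weight) tuples for an undirected graph
    Assumption: unnormalized Laplacian; this is an illustrative choice.
    """
    n = len(beta)
    # Build L = D - A explicitly as a dense matrix.
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    # Evaluate the quadratic form beta^T L beta.
    return sum(beta[i] * L[i][j] * beta[j] for i in range(n) for j in range(n))


def edge_form(beta, edges):
    """Equivalent edge-wise form: sum of w_ij * (beta_i - beta_j)^2."""
    return sum(w * (beta[i] - beta[j]) ** 2 for i, j, w in edges)
```

For the unnormalized Laplacian the two functions agree, which makes the smoothing effect visible: the penalty is small when coefficients of linked covariates are close to each other.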

4.
A regression model with skew-normal errors provides a useful extension for ordinary normal regression models when the data set under consideration involves asymmetric outcomes. Variable selection is an important issue in all regression analyses, and in this paper, we investigate simultaneous variable selection in joint location and scale models of the skew-normal distribution. We propose a unified penalized likelihood method which can simultaneously select significant variables in the location and scale models. Furthermore, the proposed variable selection method can simultaneously perform parameter estimation and variable selection in the location and scale models. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. Simulation studies and a real example are used to illustrate the proposed methodologies.

5.
Due to computational challenges and non-availability of conjugate prior distributions, Bayesian variable selection in quantile regression models is often a difficult task. In this paper, we address these two issues for quantile regression models. In particular, we develop an informative stochastic search variable selection (ISSVS) for quantile regression models that introduces an informative prior distribution. We adopt prior structures which incorporate historical data into the current data by quantifying them with a suitable prior distribution on the model parameters. This allows ISSVS to search more efficiently in the model space and choose the more likely models. In addition, a Gibbs sampler is derived to facilitate the computation of the posterior probabilities. A major advantage of ISSVS is that it avoids instability in the posterior estimates for the Gibbs sampler as well as convergence problems that may arise from choosing vague priors. Finally, the proposed methods are illustrated with both simulation and real data.

6.
In this article we present a robust and efficient variable selection procedure by using modal regression for varying-coefficient models with longitudinal data. The new method is proposed based on basis function approximations and a group version of the adaptive LASSO penalty, which can select significant variables and estimate the non-zero smooth coefficient functions simultaneously. Under suitable conditions, we establish the consistency in variable selection and the oracle property in estimation. A simulation study and two real data examples are undertaken to assess the finite sample performance of the proposed variable selection procedure.

7.
One of the standard variable selection procedures in multiple linear regression is to use a penalisation technique in least‐squares (LS) analysis. In this setting, many different types of penalties have been introduced to achieve variable selection. It is well known that LS analysis is sensitive to outliers, and consequently outliers can present serious problems for the classical variable selection procedures. Since rank‐based procedures have desirable robustness properties compared to LS procedures, we propose a rank‐based adaptive lasso‐type penalised regression estimator and a corresponding variable selection procedure for linear regression models. The proposed estimator and variable selection procedure are robust against outliers in both response and predictor space. Furthermore, since rank regression can yield unstable estimators in the presence of multicollinearity, in order to provide inference that is robust against multicollinearity, we adjust the penalty term in the adaptive lasso function by incorporating the standard errors of the rank estimator. The theoretical properties of the proposed procedures are established and their performances are investigated by means of simulations. Finally, the estimator and variable selection procedure are applied to the Plasma Beta‐Carotene Level data set.

8.
9.
In this paper, we study a novel robust procedure that simultaneously performs variable selection and parametric component identification in varying coefficient models. The proposed estimator is based on spline approximation and two smoothly clipped absolute deviation (SCAD) penalties through rank regression, which is robust with respect to heavy-tailed errors or outliers in the response. Furthermore, when the tuning parameter is chosen by a modified BIC criterion, we show that the proposed procedure is consistent both in variable selection and in the separation of varying and constant coefficients. In addition, the estimators of varying coefficients possess the optimal convergence rate under some assumptions, and the estimators of constant coefficients have the same asymptotic distribution as their counterparts obtained when the true model is known. Simulation studies and a real data example are undertaken to assess the finite sample performance of the proposed variable selection procedure.

10.
In real‐data analysis, deciding the best subset of variables in regression models is an important problem. Akaike's information criterion (AIC) is often used in order to select variables in many fields. When the sample size is not so large, the AIC has a non‐negligible bias that will detrimentally affect variable selection. The present paper considers a bias correction of AIC for selecting variables in the generalized linear model (GLM). The GLM can express a number of statistical models by changing the distribution and the link function, such as the normal linear regression model, the logistic regression model, and the probit model, which are currently commonly used in a number of applied fields. In the present study, we obtain a simple expression for a bias‐corrected AIC (corrected AIC, or CAIC) in GLMs. Furthermore, we provide an ‘R’ code based on our formula. A numerical study reveals that the CAIC has better performance than the AIC for variable selection.

11.
Based on B-spline basis functions and the smoothly clipped absolute deviation (SCAD) penalty, we present a new estimation and variable selection procedure based on modal regression for partially linear additive models. The outstanding merit of the new method is that it is robust against outliers or heavy-tailed error distributions and performs no worse than least-squares-based estimation in the normal error case. The main difference is that the standard quadratic loss is replaced by a kernel function depending on a bandwidth that can be automatically selected based on the observed data. With appropriate selection of the regularization parameters, the new method possesses consistency in variable selection and the oracle property in estimation. Finally, both a simulation study and a real data analysis are performed to examine the performance of our approach.
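
Several abstracts in this list rely on the SCAD penalty without stating it. As a minimal sketch, the function below evaluates the standard SCAD penalty in the form usually attributed to Fan and Li (2001). The default a = 3.7 is the conventional choice and is an assumption here, as is the function name `scad_penalty`; none of the listed abstracts specifies these details.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty value for a single coefficient.

    theta : coefficient value
    lam   : tuning parameter lambda, assumed > 0
    a     : shape parameter, assumed > 2 (3.7 is the conventional default)
    """
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the LASSO penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate range: quadratic blend that flattens the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2
```

The constant penalty for large coefficients is what distinguishes SCAD from the LASSO: large effects are not shrunk further, which is the usual motivation given for oracle-type results.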

12.
In this paper, we focus on the variable selection for the semiparametric regression model with longitudinal data when some covariates are measured with errors. A new bias-corrected variable selection procedure is proposed based on the combination of the quadratic inference functions and shrinkage estimations. With appropriate selection of the tuning parameters, we establish the consistency and asymptotic normality of the resulting estimators. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure. We further illustrate the proposed procedure with an application.

13.
In many areas of medical research, especially in studies that involve paired organs, a bivariate ordered categorical response should be analyzed. Using a bivariate continuous distribution as the latent variable is an interesting strategy for analyzing these data sets. In this context, the bivariate standard normal distribution, which leads to the bivariate cumulative probit regression model, is the most common choice. In this paper, we introduce another latent variable regression model for modeling bivariate ordered categorical responses. This model may be an appropriate alternative to the bivariate cumulative probit regression model when postulating a symmetric form for the marginal or joint distribution of response data does not appear to be a valid assumption. We also develop the necessary numerical procedure to obtain the maximum likelihood estimates of the model parameters. To illustrate the proposed model, we analyze data from an epidemiologic study to identify some of the most important risk indicators of periodontal disease among students aged 15-19 years in Tehran, Iran.

14.
In this article, we develop a robust variable selection procedure jointly for fixed and random effects in linear mixed models for longitudinal data. We propose a penalized robust estimator for both the regression coefficients and the variance of random effects based on a re-parametrization of the linear mixed models. Under some regularity conditions, we show the oracle properties of the proposed robust variable selection method. A simulation study shows the robustness of the proposed method against outliers. Finally, the proposed method is illustrated in the analysis of a real data set.

15.
We consider a linear regression model where there are group structures in covariates. The group LASSO has been proposed for group variable selection. Many nonconvex penalties, such as the smoothly clipped absolute deviation and the minimax concave penalty, have been extended to group variable selection problems. The group coordinate descent (GCD) algorithm is widely used for fitting these models. However, GCD algorithms are hard to apply to nonconvex group penalties due to computational complexity unless the design matrix is orthogonal. In this paper, we propose an efficient optimization algorithm for nonconvex group penalties by combining the concave convex procedure and the group LASSO algorithm. We also extend the proposed algorithm to generalized linear models. We evaluate the numerical efficiency of the proposed algorithm compared to existing GCD algorithms through simulated data and real data sets.

16.
Variable selection is an important task in regression analysis. Performance of the statistical model highly depends on the determination of the subset of predictors. There are several methods to select the most relevant variables to construct a good model. However, in practice, the dependent variable may take positive continuous values and not be normally distributed. In such situations, the gamma distribution is more suitable than the normal for building a regression model. This paper introduces a heuristic approach to perform variable selection using artificial bee colony optimization for gamma regression models. We evaluated the proposed method against classical selection methods such as backward and stepwise. Both simulation studies and real data set examples proved the accuracy of our selection procedure.

17.
We consider the problem of variable selection in high-dimensional partially linear models with longitudinal data. A variable selection procedure is proposed based on the smooth-threshold generalized estimating equation (SGEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to be zero, and simultaneously estimates the nonzero regression coefficients by solving the SGEE. We establish the asymptotic properties in a high-dimensional framework where the number of covariates p_n increases as the number of clusters n increases. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure.

18.
In this article, we study variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method for normal random errors and is empirically shown to be much more efficient and robust with respect to heavy-tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with the missing covariates, we consider the inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator using the estimated selection probability has a smaller asymptotic variance than that using the true selection probability, and thus is more efficient. Therefore, the important Horvitz-Thompson property is verified for the penalized rank estimator with missing covariates in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.

19.
Variable selection is an important issue in all regression analyses, and in this article, we investigate simultaneous variable selection in joint location and scale models of the skew-t-normal distribution when the dataset under consideration involves heavy-tailed and asymmetric outcomes. We propose a unified penalized likelihood method which can simultaneously select significant variables in the location and scale models. Furthermore, the proposed variable selection method can simultaneously perform parameter estimation and variable selection in the location and scale models. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. These estimators are compared by simulation studies.

20.
In this article, we propose a new penalized-likelihood method to conduct model selection for finite mixture of regression models. The penalties are imposed on mixing proportions and regression coefficients, and hence order selection of the mixture and the variable selection in each component can be simultaneously conducted. The consistency of order selection and the consistency of variable selection are investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations are conducted to demonstrate the finite sample performance of the estimation procedure. The proposed methodology is further illustrated via real data analysis.
