首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Selection of relevant predictor variables for building a model is an important problem in the multiple linear regression. Variable selection method based on ordinary least squares estimator fails to select the set of relevant variables for building a model in the presence of outliers and leverage points. In this article, we propose a new robust variable selection criterion for selection of relevant variables in the model and establish its consistency property. Performance of the proposed method is evaluated through simulation study and real data.  相似文献   

2.
Variable selection problem is one of the most important tasks in regression analysis, especially in a high-dimensional setting. In this paper, we study this problem in the context of scalar response functional regression model, which is a linear model with scalar response and functional regressors. The functional model can be represented by certain multiple linear regression model via basis expansions of functional variables. Based on this model and random subspace method of Mielniczuk and Teisseyre (Comput Stat Data Anal 71:725–742, 2014), two simple variable selection procedures for scalar response functional regression model are proposed. The final functional model is selected by using generalized information criteria. Monte Carlo simulation studies conducted and a real data example show very satisfactory performance of new variable selection methods under finite samples. Moreover, they suggest that considered procedures outperform solutions found in the literature in terms of correctly selected model, false discovery rate control and prediction error.  相似文献   

3.
The L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve optimization problems for L1-type regularization. Especially the coordinate descent algorithm has been shown to be effective in sparse regression modeling. Although the algorithm shows a remarkable performance to solve optimization problems for L1-type regularization, it suffers from outliers, since the procedure is based on the inner product of predictor variables and partial residuals obtained from a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, especially focusing on the high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm effectively performs for the high-dimensional regression modeling even in the presence of outliers.  相似文献   

4.
One of the standard variable selection procedures in multiple linear regression is to use a penalisation technique in least‐squares (LS) analysis. In this setting, many different types of penalties have been introduced to achieve variable selection. It is well known that LS analysis is sensitive to outliers, and consequently outliers can present serious problems for the classical variable selection procedures. Since rank‐based procedures have desirable robustness properties compared to LS procedures, we propose a rank‐based adaptive lasso‐type penalised regression estimator and a corresponding variable selection procedure for linear regression models. The proposed estimator and variable selection procedure are robust against outliers in both response and predictor space. Furthermore, since rank regression can yield unstable estimators in the presence of multicollinearity, in order to provide inference that is robust against multicollinearity, we adjust the penalty term in the adaptive lasso function by incorporating the standard errors of the rank estimator. The theoretical properties of the proposed procedures are established and their performances are investigated by means of simulations. Finally, the estimator and variable selection procedure are applied to the Plasma Beta‐Carotene Level data set.  相似文献   

5.
ABSTRACT

Functional linear model is of great practical importance, as exemplified by applications in high-throughput studies such as meteorological and biomedical research. In this paper, we propose a new functional variable selection procedure, called functional variable selection via Gram–Schmidt (FGS) orthogonalization, for a functional linear model with a scalar response and multiple functional predictors. Instead of the regularization methods, FGS takes into account the similarity between the functional predictors in a data-driven way and utilizes the technique of Gram–Schmidt orthogonalization to remove the irrelevant predictors. FGS can successfully discriminate between the relevant and the irrelevant functional predictors to achieve a high true positive ratio without including many irrelevant predictors, and yield explainable models, which offers a new perspective for the variable selection method in the functional linear model. Simulation studies are carried out to evaluate the finite sample performance of the proposed method, and also a weather data set is analysed.  相似文献   

6.
The authors consider the problem of simultaneous transformation and variable selection for linear regression. They propose a fully Bayesian solution to the problem, which allows averaging over all models considered including transformations of the response and predictors. The authors use the Box‐Cox family of transformations to transform the response and each predictor. To deal with the change of scale induced by the transformations, the authors propose to focus on new quantities rather than the estimated regression coefficients. These quantities, referred to as generalized regression coefficients, have a similar interpretation to the usual regression coefficients on the original scale of the data, but do not depend on the transformations. This allows probabilistic statements about the size of the effect associated with each variable, on the original scale of the data. In addition to variable and transformation selection, there is also uncertainty involved in the identification of outliers in regression. Thus, the authors also propose a more robust model to account for such outliers based on a t‐distribution with unknown degrees of freedom. Parameter estimation is carried out using an efficient Markov chain Monte Carlo algorithm, which permits moves around the space of all possible models. Using three real data sets and a simulated study, the authors show that there is considerable uncertainty about variable selection, choice of transformation, and outlier identification, and that there is advantage in dealing with all three simultaneously. The Canadian Journal of Statistics 37: 361–380; 2009 © 2009 Statistical Society of Canada  相似文献   

7.
ABSTRACT

M-estimation is a widely used technique for robust statistical inference. In this paper, we study robust partially functional linear regression model in which a scale response variable is explained by a function-valued variable and a finite number of real-valued variables. For the estimation of the regression parameters, which include the infinite dimensional function as well as the slope parameters for the real-valued variables, we use polynomial splines to approximate the slop parameter. The estimation procedure is easy to implement, and it is resistant to heavy-tailederrors or outliers in the response. The asymptotic properties of the proposed estimators are established. Finally, we assess the finite sample performance of the proposed method by Monte Carlo simulation studies.  相似文献   

8.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

9.
Motivated by an entropy inequality, we propose for the first time a penalized profile likelihood method for simultaneously selecting significant variables and estimating unknown coefficients in multiple linear regression models in this article. The new method is robust to outliers or errors with heavy tails and works well even for error with infinite variance. Our proposed approach outperforms the adaptive lasso in both theory and practice. It is observed from the simulation studies that (i) the new approach possesses higher probability of correctly selecting the exact model than the least absolute deviation lasso and the adaptively penalized composite quantile regression approach and (ii) exact model selection via our proposed approach is robust regardless of the error distribution. An application to a real dataset is also provided.  相似文献   

10.

This paper is motivated by our collaborative research and the aim is to model clinical assessments of upper limb function after stroke using 3D-position and 4D-orientation movement data. We present a new nonlinear mixed-effects scalar-on-function regression model with a Gaussian process prior focusing on the variable selection from a large number of candidates including both scalar and function variables. A novel variable selection algorithm has been developed, namely functional least angle regression. As it is essential for this algorithm, we studied the representation of functional variables with different methods and the correlation between a scalar and a group of mixed scalar and functional variables. We also propose a new stopping rule for practical use. This algorithm is efficient and accurate for both variable selection and parameter estimation even when the number of functional variables is very large and the variables are correlated. And thus the prediction provided by the algorithm is accurate. Our comprehensive simulation study showed that the method is superior to other existing variable selection methods. When the algorithm was applied to the analysis of the movement data, the use of the nonlinear random-effect model and the function variables significantly improved the prediction accuracy for the clinical assessment.

  相似文献   

11.
In the multiple linear regression analysis, the ridge regression estimator and the Liu estimator are often used to address multicollinearity. Besides multicollinearity, outliers are also a problem in the multiple linear regression analysis. We propose new biased estimators based on the least trimmed squares (LTS) ridge estimator and the LTS Liu estimator in the case of the presence of both outliers and multicollinearity. For this purpose, a simulation study is conducted in order to see the difference between the robust ridge estimator and the robust Liu estimator in terms of their effectiveness; the mean square error. In our simulations, the behavior of the new biased estimators is examined for types of outliers: X-space outlier, Y-space outlier, and X-and Y-space outlier. The results for a number of different illustrative cases are presented. This paper also provides the results for the robust ridge regression and robust Liu estimators based on a real-life data set combining the problem of multicollinearity and outliers.  相似文献   

12.
Abstract

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.  相似文献   

13.
In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not allow for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A or B. This paper develops mathematical representations of this and other relations between predictors, which may then be incorporated in a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information. Priors relevant to arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on the size of the models are developed. Since the relations developed are for priors, they may be incorporated in any Bayesian variable selection algorithm for any type of linear model. The application of the methods here is illustrated via the stochastic search variable selection algorithm of George and McCulloch (1993), which is modified to utilize the new priors. The performance of the approach is illustrated with two constructed examples and a computer performance dataset.  相似文献   

14.
It is frequently the case that a response will be related to both a vector of finite length and a function-valued random variable as predictor variables. In this paper, we propose new estimators for the parameters of a partial functional linear model which explores the relationship between a scalar response variable and mixed-type predictors. Asymptotic properties of the proposed estimators are established and finite sample behavior is studied through a small simulation experiment.  相似文献   

15.
In this article, a robust variable selection procedure based on the weighted composite quantile regression (WCQR) is proposed. Compared with the composite quantile regression (CQR), WCQR is robust to heavy-tailed errors and outliers in the explanatory variables. For the choice of the weights in the WCQR, we employ a weighting scheme based on the principal component method. To select variables with grouping effect, we consider WCQR with SCAD-L2 penalization. Furthermore, under some suitable assumptions, the theoretical properties, including the consistency and oracle property of the estimator, are established with a diverging number of parameters. In addition, we study the numerical performance of the proposed method in the case of ultrahigh-dimensional data. Simulation studies and real examples are provided to demonstrate the superiority of our method over the CQR method when there are outliers in the explanatory variables and/or the random error is from a heavy-tailed distribution.  相似文献   

16.
Research on the multiple comparison during the past 60 years or so has focused mainly on the comparison of several population means. Spurrier (J Am Stat Assoc 94:483–488, 1999) and Liu et al. (J Am Stat Assoc 99:395–403, 2004) considered the multiple comparison of several linear regression lines. They assumed that there was no functional relationship between the predictor variables. For the case of the polynomial regression model, the functional relationship between the predictor variables does exist. This lack of a full utilization of the functional relationship between the predictor variables may have some undesirable consequences. In this article we introduce an exact method for the multiple comparison of several polynomial regression models. This method sufficiently takes advantage of the feature of the polynomial regression model, and therefore, it can quickly and accurately compute the critical constant. This proposed method allows various types of comparisons, including pairwise, many-to-one and successive, and it also allows the predictor variable to be either unconstrained or constrained to a finite interval. The examples from the dose-response study are used to illustrate the method. MATLAB programs have been written for easy implementation of this method.  相似文献   

17.
The varying coefficient model (VCM) is an important generalization of the linear regression model and many existing estimation procedures for VCM were built on L 2 loss, which is popular for its mathematical beauty but is not robust to non-normal errors and outliers. In this paper, we address the problem of both robustness and efficiency of estimation and variable selection procedure based on the convex combined loss of L 1 and L 2 instead of only quadratic loss for VCM. By using local linear modeling method, the asymptotic normality of estimation is driven and a useful selection method is proposed for the weight of composite L 1 and L 2. Then the variable selection procedure is given by combining local kernel smoothing with adaptive group LASSO. With appropriate selection of tuning parameters by Bayesian information criterion (BIC) the theoretical properties of the new procedure, including consistency in variable selection and the oracle property in estimation, are established. The finite sample performance of the new method is investigated through simulation studies and the analysis of body fat data. Numerical studies show that the new method is better than or at least as well as the least square-based method in terms of both robustness and efficiency for variable selection.  相似文献   

18.
This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to simultaneously achieve outlier detection, and robust variable selection. An iterative algorithm is proposed to solve the proposed optimization problem. Monte Carlo studies are evaluated the finite-sample performance of the proposed methods. The results indicate that the finite sample performance of the proposed methods performs better than that of the existing methods when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.  相似文献   

19.
To assess the quality of the fit in a multiple linear regression, the coefficient of determination or R2 is a very simple tool, yet the most used by practitioners. Indeed, it is reported in most statistical analyzes, and although it is not recommended as a final model selection tool, it provides an indication of the suitability of the chosen explanatory variables in predicting the response. In the classical setting, it is well known that the least-squares fit and coefficient of determination can be arbitrary and/or misleading in the presence of a single outlier. In many applied settings, the assumption of normality of the errors and the absence of outliers are difficult to establish. In these cases, robust procedures for estimation and inference in linear regression are available and provide a suitable alternative.  相似文献   

20.
Ordered multiple categorical (MC) variable has been widely considered and studied as response variable, and few studies have carefully considered it as a predictor in linear regression. When doing this, the existence of some pseudo-categories may result in overfitting, and to detect those pseudo-categories by hypothesis test of all dummy variables might have low specificity. In this paper, we propose a transformation method of dummy variables for such ordered MC predictors, after which a model selection method combined with BIC will be elaborated. Theoretical consistency of our model selection method is established under some common assumptions. Both simulation studies and real data analysis of a medical survey indicate that our method provides good performance and is applicable to a wide range of biomedical research.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号