Similar literature
Found 20 similar documents (search time: 46 ms)
1.
Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem. A subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is mainly done by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their capability of recovering the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights in situations with collinearity is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recoveries of regression weights. The picture is inconclusive, however, because, especially in the study with more realistic simulated data, PLS and PCR gave the poorest recoveries of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases it is better to use ordinary regression. As far as prediction is concerned, it suffers far less from collinearity than recovery of the regression weights does.

2.
Logistic regression using conditional maximum likelihood estimation has recently gained widespread use. Many of the applications of logistic regression have been in situations in which the independent variables are collinear. It is shown that collinearity among the independent variables seriously affects the conditional maximum likelihood estimator, in that the variance of this estimator is inflated in much the same way that collinearity inflates the variance of the least squares estimator in multiple regression. Drawing on the similarities between multiple and logistic regression, several alternative estimators, which reduce the effect of the collinearity and are easy to obtain in practice, are suggested and compared in a simulation study.

3.
The usual approach for diagnosing collinearity proceeds by centering and standardizing the regressors. The sample correlation matrix of the predictors is then the basic tool for describing approximate linear combinations that may distort the conclusions of a standard least-squares analysis. However, as indicated by several authors, centering may fail to detect the sources of ill-conditioning. In spite of this earlier claim, the literature does not seem to contain a fully clear explanation of why the traditional strategy for analyzing collinearity can behave this badly. This note studies this issue in some detail. The results derived are motivated by the analysis of a well-known real dataset. Practical conclusions are illustrated with several examples.

4.
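What is usually called the Granger causality test is in fact a test of linear causality and cannot detect nonlinear causality. Peguin and Terasvirta (1999) developed a general extension based on the Taylor expansion, applied it to testing for nonlinear causality, and used principal component extraction to handle the multicollinearity that arises. However, extracting principal components is not very effective at resolving multicollinearity. Lasso regression is currently one of the main methods for handling multicollinearity; compared with other methods, it more readily produces sparse solutions and performs variable selection while estimating the parameters, so it can be used to resolve the multicollinearity problem in the test and thereby improve the test's efficiency. Simulation results for the test procedure show that the Lasso-based test performs well.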

5.
In at least one important application of stochastic linear programming (Lavaca-Tres Palacios Estuary: A Study of the Influence of Freshwater Inflows, 1980), constraint parameters are simultaneously estimated using multiple regression with historic data for the values of the decision variables and the right hand side of the constraint function. In this circumstance, the question immediately arises: "How stable is the linear programming (LP) solution with regard to regression issues such as sample size, magnitude of the error variance, centroids of the decision variables, and collinearity?" This paper reports a simulation designed to assess the stability of the LP solution and to compare the effectiveness of ridge regression as an alternative to ordinary least squares (OLS) regression. For the given scenario, the LP solution is consistently "biased." The amount of bias is exacerbated by small samples, large error variances, and collinearity among observations of the decision variables. The best regression criterion is a function not only of collinearity, but also of the magnitude of the error variance and the sum of the means of the decision variables relative to the right hand side of the stochastic constraint.

In the application that motivated this research, the LP solutions were recommended fresh water inflows from Lake Texana into the estuaries of the Gulf of Mexico. The stochastic constraint estimates commercial fish harvest as a function of seasonal fresh water inflow. The historic data set used to estimate parameters of the constraint comprised rainfall data and fish harvest data prior to the construction of the Lake Texana dam, of necessity a small sample with collinear seasonal rainfall. It is not the authors' intent to solve this application, but rather to investigate through a simpler simulated system whether or not regression estimates in similar circumstances might introduce a systematic and predictable bias. The answer to this latter question is a qualified Yes!

6.
The variance of the maximum likelihood estimator (MLE) of the slope parameter in a logistic regression model becomes large as the degree of collinearity among the explanatory variables increases. In a Monte Carlo study, we observed that a ridge-type estimator is at least as good as, and often much better than, the MLE in terms of total and prediction mean squared error criteria. Using a set of medical data, it is illustrated that the ridge trace of the estimator considered here is a useful diagnostic tool in logistic regression analysis.

7.
We propose a new collinearity diagnostic tool for generalized linear models. The new diagnostic tool, termed the weighted variance inflation factor (WVIF), behaves exactly like the traditional variance inflation factor in the context of regression diagnostics, provided the data matrix is normalized. Compared to the condition number (CN), WVIF provides more reliable information on how severe the situation is when data collinearity does exist. An alternative estimator, a by-product of the new diagnostic, outperforms the ridge estimator in the presence of data collinearity in terms of both WVIF and CN. Evidence is given by analyzing various real-world numerical examples.

8.
Ridge penalized least-squares estimators have been suggested as an alternative to the minimum penalized sum of squares estimates in the presence of collinearity among the explanatory variables in semiparametric regression models (SPRMs). This paper studies the local influence of minor perturbations on the ridge estimates in the SPRM. The diagnostics under the perturbation of the ridge penalized sum of squares, response variable, explanatory variables and ridge parameter are considered. Some local influence diagnostics are given. A Monte Carlo simulation study and a real example are used to illustrate the proposed perturbations.

9.
This paper presents the results of a Monte Carlo study of OLS- and GLS-based adaptive ridge estimators for regression problems in which the independent variables are collinear and the errors are autocorrelated. It studies the effects of degree of collinearity, magnitude of error variance, orientation of the parameter vector and serial correlation of the independent variables on the mean squared error performance of these estimators. Results suggest that such estimators produce greatly improved performance in favorable portions of the parameter space. The GLS-based methods are best when the independent variables are also serially correlated.

10.
The partial least squares (PLS) approach first constructs new explanatory variables, known as factors (or components), which are linear combinations of available predictor variables. A small subset of these factors is then chosen and retained for prediction. We study the performance of PLS in estimating single-index models, especially when the predictor variables exhibit high collinearity. We show that PLS estimates are consistent up to a constant of proportionality. We present three simulation studies that compare the performance of PLS in estimating single-index models with that of sliced inverse regression (SIR). In the first two studies, we find that PLS performs better than SIR when collinearity exists. In the third study, we learn that PLS performs well even when there are multiple dependent variables, the link function is non-linear and the shape of the functional form is not known.

11.
This paper considers the analysis of time-to-event data in the presence of collinearity between covariates. In linear and logistic regression models, the ridge regression estimator has been applied as an alternative to the maximum likelihood estimator in the presence of collinearity. The advantage of the ridge regression estimator over the usual maximum likelihood estimator is that the former often has a smaller total mean square error and is thus more precise. In this paper, we generalize this approach for addressing collinearity to the Cox proportional hazards model. Simulation studies were conducted to evaluate the performance of the ridge regression estimator. Our approach was motivated by an occupational radiation study conducted at Oak Ridge National Laboratory to evaluate health risks associated with occupational radiation exposure, in which the exposure tends to be correlated with possible confounders such as years of exposure and attained age. We applied the proposed methods to this study to evaluate the association of radiation exposure with all-cause mortality.

12.
A regression simulation study investigates the behaviour of ICOMP, AIC, and BIC under various levels of collinearity, sample size, and residual variance. When the variation in the design matrix is large, the agreement percentages for all of the information criteria decrease monotonically as the collinearity levels in the design matrix increase, and ICOMP agrees with the Kullback-Leibler model more often. As the residual variance increases, the agreement percentages of all of the information criteria decrease. However, as the sample size increases, the agreement percentages of all information criteria increase. When the variation in the design matrix is low and the collinearity is low, the agreement percentages for all of the information criteria decrease monotonically as the residual variance increases, with ICOMP agreeing with the Kullback-Leibler model more often than both AIC and BIC.

13.
The presence of collinearity among the explanatory variables results in larger standard errors of the estimated parameters. When multicollinearity is present among the explanatory variables, the ordinary least-squares (OLS) estimators tend to be unstable because the estimators of the regression coefficients have larger variance. As alternatives to OLS estimators, a few ridge estimators are available in the literature. This article presents some of the popular ridge estimators and attempts to provide (i) a generalized class of ridge estimators and (ii) a modified ridge estimator. The performance of the proposed estimators is investigated by Monte Carlo simulation. Simulation results indicate that the suggested estimators perform better than the OLS estimators and the other estimators considered in this article.
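As a rough illustration of the idea behind the abstract above, the following sketch compares OLS with the basic ridge estimator $(X'X + kI)^{-1}X'y$ for two nearly collinear predictors. It does not implement the generalized or modified ridge estimators the article proposes; the function names, the toy data, and the value k = 1 are all assumptions made for this example.

```python
# Illustrative sketch only: the textbook ridge estimator (X'X + kI)^{-1} X'y,
# NOT the generalized or modified estimators proposed in the article.

def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]

def ridge_2d(x1, x2, y, k):
    """Ridge coefficients for two predictors, no intercept; k = 0 gives OLS."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    xty = [sum(u * t for u, t in zip(x1, y)),
           sum(v * t for v, t in zip(x2, y))]
    # Adding k to the diagonal of X'X is what stabilizes the solution.
    xtx_k = [[s11 + k, s12], [s12, s22 + k]]
    return solve_2x2(xtx_k, xty)

def squared_length(beta):
    """Squared Euclidean length of a coefficient vector."""
    return sum(b * b for b in beta)

# Hypothetical toy data: x2 is almost a copy of x1 (strong collinearity).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_ols = ridge_2d(x1, x2, y, k=0.0)
beta_ridge = ridge_2d(x1, x2, y, k=1.0)

print("OLS  :", beta_ols, "squared length:", squared_length(beta_ols))
print("ridge:", beta_ridge, "squared length:", squared_length(beta_ridge))
```

For any k > 0 the ridge coefficient vector is shorter than the OLS one, which is the shrinkage that trades a little bias for lower variance. How k should be chosen is exactly what the competing ridge estimators in the literature disagree on.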

14.
The use of heteroscedasticity-consistent covariance matrix (HCCM) estimators is very common in practice to draw correct inference for the coefficients of a linear regression model with heteroscedastic errors. However, in addition to the problem of heteroscedasticity, linear regression models may also be plagued by a considerable degree of collinearity among the regressors when two or more regressors are considered. This situation has many adverse effects on the least squares measures; as an alternative, the ordinary ridge regression method is commonly used. In the available literature, however, the problems of multicollinearity and heteroscedasticity have not been discussed as a combined issue, especially for inference about the regression coefficients. The present article addresses inference about the regression coefficients taking both multicollinearity and heteroscedasticity into account and suggests the use of HCCM estimators for ridge regression. This article proposes t- and F-tests, based on these HCCM estimators, that perform adequately in the numerical evaluation of the Monte Carlo simulations.

15.
Ridge regression is an alternative to ordinary least squares that is mostly applied when a multiple linear regression model presents a worrying degree of collinearity. A relevant topic in ridge regression is the selection of the ridge parameter, and different proposals have been presented in the scientific literature. Since the ridge estimator is biased, the choice of the ridge parameter is normally based on the mean square error (MSE), without considering (to the best of our knowledge) whether the proposed value for the ridge parameter really mitigates the collinearity. With this goal, and using different simulations, this paper proposes to estimate the ridge parameter from the determinant of the correlation matrix of the data, which ensures that the variance inflation factor (VIF) is lower than the traditionally established threshold. The possible relation between the VIF and the determinant of the correlation matrix is also analysed. Finally, the contribution is illustrated with three real examples.

16.
Sliced Inverse Regression (SIR) is an effective method for dimension reduction in high-dimensional regression problems. The original method, however, requires the inversion of the predictors' covariance matrix. In the case of collinearity between these predictors, or of small sample sizes compared to the dimension, the inversion is not possible and a regularization technique has to be used. Our approach is based on a Fisher Lecture given by R.D. Cook, where it is shown that SIR axes can be interpreted as solutions of an inverse regression problem. We propose to introduce a Gaussian prior distribution on the unknown parameters of the inverse regression problem in order to regularize their estimation. We show that some existing SIR regularizations fit within our framework, which permits a global understanding of these methods. Three new priors are proposed, leading to new regularizations of the SIR method. A comparison on simulated data as well as an application to the estimation of Mars surface physical properties from hyperspectral images are provided.

17.
Systematic and appropriate statistical analysis is needed to examine the relative performance of anthropometric indices, viz. body mass index (BMI), waist circumference (WC), waist-hip ratio (WHR) and waist-stature ratio (WSR), for predicting type 2 diabetes. Using information on socio-demographic, anthropometric and biochemical variables from 2148 males, we examined collinearity and non-linearity among the predictors before studying the association between anthropometric indices and type 2 diabetes. The variable involved in collinearity was removed from further analysis, and the relative importance of BMI, WC and WHR was examined by logistic regression analysis. To avoid non-interpretable odds ratios (ORs), cut point theory is used. Optimal cut points are derived and tested for significance. The multivariable fractional polynomial (MFP) algorithm is applied to reconcile non-linearity. As expected, WSR and WC were collinear with WHR and BMI. Since WSR was jointly as well as independently collinear, it was dropped from further analysis. The OR for WHR could not be interpreted meaningfully. Cut point theory was adopted. Deciles emerged as the optimal cut point. MFP recognized non-linearity effects on the outcome. Multicollinearity among the anthropometric indices was examined. Optimal cut points were identified and used to study the relative ORs. On the basis of the results of the analysis, MFP is recommended to accommodate non-linearity among the predictors. WHR is relatively more important and significant than WC and BMI.

18.
In this article, we study the stepwise AIC method for variable selection and compare it with other stepwise methods for variable selection in linear regression modeling, such as partial F, partial correlation, and semi-partial correlation. We then show mathematically that the stepwise AIC method and the other stepwise methods lead to the same method as partial F. Hence, there are more reasons to use the stepwise AIC method than the other stepwise methods for variable selection, since the stepwise AIC method is a model selection method that can be easily managed, can be widely extended to more generalized models, and can be applied to non-normally distributed data. We also treat problems that always appear in applications, namely the validation of selected variables and the problem of collinearity.

19.
In this paper we evaluate the stability of the inverse of a correlation matrix by studying the derivatives of each of its entries with respect to each entry of the correlation matrix. From them we deduce the derivatives of the squared length of the inverse matrix, the variance inflation factors (VIF), and the regression coefficients. To illustrate the procedure, we use a correlation matrix that has already been analyzed by Hoerl and Kennard (1970, Technometrics 12, 69–82), and, by looking at the derivatives of the squared length of the regression vector, we show that the addition of a constant to some of the diagonal entries of the matrix is sufficient for obtaining satisfactory estimates of the regression coefficients. This 'partial ridge regression' is carried out on the previous matrix and modifies only the coefficients which are perturbed by the collinearity.
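The link between the inverse correlation matrix and the VIFs mentioned in the abstract above can be sketched numerically: the VIFs are the diagonal entries of the inverse of the predictor correlation matrix, and adding a constant to some diagonal entries before inverting shrinks those entries. The correlation matrix, the constant 0.1, and the helper names below are assumptions for illustration; this is not the Hoerl and Kennard matrix, and it does not reproduce the paper's derivative-based analysis.

```python
# Illustrative sketch only: VIFs as the diagonal of the inverse correlation
# matrix, and the effect of adding a constant to selected diagonal entries.

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]

def inverse_diagonal(corr, added=None):
    """Diagonal of (corr + diag(added))^{-1}; with added=None these are the VIFs."""
    n = len(corr)
    added = added or [0.0] * n
    shifted = [[corr[i][j] + (added[i] if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    inv = invert(shifted)
    return [inv[i][i] for i in range(n)]

# Hypothetical correlation matrix: predictors 0 and 1 are nearly collinear,
# predictor 2 is uncorrelated with both.
corr = [[1.0, 0.95, 0.0],
        [0.95, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

vif = inverse_diagonal(corr)
# Shift only the two diagonal entries involved in the collinearity.
partial = inverse_diagonal(corr, added=[0.1, 0.1, 0.0])

print("VIFs:", vif)
print("diagonal after partial shift:", partial)
```

In this toy case the shift reduces the inflated entries for predictors 0 and 1 while leaving the entry for predictor 2 at 1, which mirrors the abstract's point that only the coefficients affected by the collinearity need to be modified.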

20.
In this study, we investigate linear regression with both heteroskedasticity and collinearity problems. We discuss the properties related to the perturbation method. Important observations are summarized as theorems. We then prove the main result, which states that the heteroskedasticity-robust variances can be improved and that the resulting bias is minimized by using the matrix perturbation method. We analyze a practical example to validate the method.
