Similar Articles
A total of 20 similar articles were found.
1.
A challenging problem in the analysis of high-dimensional data is variable selection. In this study, we describe a bootstrap-based technique for selecting predictors in partial least-squares regression (PLSR) and principal component regression (PCR) for high-dimensional data. Using a bootstrap-based significance test for the regression coefficients, a subset of the original variables can be selected for inclusion in the regression, yielding a more parsimonious model with smaller prediction errors. We compare the bootstrap approach with several other variable selection approaches (jack-knife and sparse formulation-based methods) for PCR and PLSR on simulated and real data.
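As a rough illustration of this kind of procedure (not the authors' exact algorithm), the sketch below resamples the data, refits a PLS model on each bootstrap sample with scikit-learn, and keeps only the predictors whose bootstrap percentile interval for the coefficient excludes zero. The helper name, the toy data and the chosen settings are assumptions made for the example.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def bootstrap_select(X, y, n_components=2, n_boot=500, alpha=0.05, seed=0):
    """Keep predictors whose bootstrap interval for the PLS coefficient excludes 0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    coefs = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                       # resample rows with replacement
        pls = PLSRegression(n_components=n_components).fit(X[idx], y[idx])
        coefs[b] = pls.coef_.ravel()
    lo, hi = np.percentile(coefs, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return np.where((lo > 0) | (hi < 0))[0]               # indices judged "significant"

# toy example: only the first 3 of 50 predictors carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 50))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(scale=0.5, size=80)
print(bootstrap_select(X, y))
```

The selected subset would then be refitted with PLSR or PCR, which is the source of the parsimony and prediction gains described above.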

2.
3.
The authors consider dimensionality reduction methods used for prediction, such as reduced rank regression, principal component regression and partial least squares. They show how it is possible to obtain intermediate solutions by estimating simultaneously the latent variables for the predictors and for the responses. They obtain a continuum of solutions that goes from reduced rank regression to principal component regression via maximum likelihood and least squares estimation. Different solutions are compared using simulated and real data.

4.
Several biased estimators have been proposed as alternatives to the least squares estimator when multicollinearity is present in the multiple linear regression model. The ridge estimator and the principal components estimator are two techniques that have been proposed for such problems. In this paper, the class of fractional principal component estimators is developed for the multiple linear regression model. This class contains many of the biased estimators commonly used to combat multicollinearity. Within the fractional principal components framework, two new estimation techniques are introduced. The theoretical performance of the new estimators is evaluated, and their small-sample properties are compared via simulation with the ridge, generalized ridge and principal components estimators.
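In the usual eigenvector parameterization, the estimators mentioned here differ only in how much each principal direction of X'X is shrunk: a fraction of 1 for every direction gives OLS, 0/1 fractions give principal components regression, and fractions λ_j/(λ_j + k) give ridge. The numpy sketch below illustrates that view; it is my own toy construction, not the authors' exact fractional estimator.

```python
import numpy as np

def component_estimator(X, y, fractions):
    """beta = V diag(f_j / lambda_j) V' X'y: each eigen-direction of X'X shrunk by f_j."""
    lam, V = np.linalg.eigh(X.T @ X)
    order = np.argsort(lam)[::-1]                         # eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    return V @ (np.asarray(fractions) / lam * (V.T @ X.T @ y))

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=60)            # near-collinear column
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 1.0]) + rng.normal(size=60)

lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
k = 1.0
print("OLS  :", component_estimator(X, y, np.ones(5)))
print("PCR  :", component_estimator(X, y, np.array([1, 1, 1, 1, 0.0])))   # drop last PC
print("ridge:", component_estimator(X, y, lam / (lam + k)))
```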

5.
Partial least squares regression has been widely adopted in some fields as a useful alternative to ordinary least squares regression, in the manner of other shrinkage methods such as principal components regression and ridge regression. In this paper we examine the nature of this shrinkage and demonstrate that partial least squares regression exhibits some undesirable properties.

6.
To compare their performance on high-dimensional data, several regression methods are applied to data sets in which the number of explanatory variables greatly exceeds the sample size. The methods are stepwise regression, principal components regression, two forms of latent root regression, partial least squares, and a new method developed here. The data are four sample sets for which near-infrared reflectance spectra have been determined, and the regression methods use the spectra to estimate the concentration of various chemical constituents, the latter having been determined by standard chemical analysis. Thirty-two regression equations are estimated using each method and their performances are evaluated using validation data sets. Although it is the most widely used, stepwise regression was decidedly poorer than the other methods considered. Differences between the latter were small, with partial least squares performing slightly better than the other methods under all criteria examined, albeit not by a statistically significant amount.

7.
Least squares methods are popular for fitting valid variogram models to spatial data. The paper proposes a new least squares method based on spatial subsampling for variogram model fitting. We show that the proposed method is statistically efficient among a class of least squares methods, including the generalized least squares method. Further, it is computationally much simpler than the generalized least squares method. The method produces valid variogram estimators under very mild regularity conditions on the underlying random field and may be applied with different choices of the generic variogram estimator without analytical calculation. An extension of the proposed method to a class of spatial regression models is illustrated with a real data example. Results from a simulation study on finite sample properties of the method are also reported.
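The paper's contribution is the subsampling-based weighting; the generic step it refines, fitting a parametric variogram model to an empirical variogram by least squares, can be sketched as follows. The exponential model, the binned Matheron-type estimator and the toy field are assumptions of the example, not the authors' setup.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.spatial.distance import pdist

def exponential_variogram(h, nugget, sill, rng_par):
    return nugget + sill * (1.0 - np.exp(-h / rng_par))

def empirical_variogram(coords, z, n_bins=15):
    """Binned Matheron-type estimator: average half squared increments per distance bin."""
    d = pdist(coords)
    g = 0.5 * pdist(z[:, None], metric="sqeuclidean")     # 0.5 * (z_i - z_j)^2
    bins = np.linspace(0, d.max() / 2, n_bins + 1)
    which = np.digitize(d, bins)
    h, gamma = [], []
    for b in range(1, n_bins + 1):
        mask = which == b
        if mask.any():
            h.append(d[mask].mean())
            gamma.append(g[mask].mean())
    return np.array(h), np.array(gamma)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
z = np.sin(coords[:, 0]) + np.cos(coords[:, 1]) + 0.3 * rng.normal(size=200)   # toy field
h, gamma = empirical_variogram(coords, z)
params, _ = curve_fit(exponential_variogram, h, gamma, p0=[0.1, 1.0, 2.0],
                      bounds=(0, np.inf))
print(dict(zip(["nugget", "sill", "range"], params)))
```

The paper's estimator stays within this least squares family but uses spatial subsampling to supply the weighting that generalized least squares would otherwise require analytically.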

8.
Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem, a subset of which do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is mainly done by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their ability to recover the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights in situations with collinearity is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recovery of regression weights. The picture is inconclusive, however, because in the study with more realistic simulated data in particular, PLS and PCR gave the poorest recovery of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases it is better to use ordinary regression. As far as prediction is concerned, prediction suffers far less from collinearity than recovery of the regression weights does.

9.
In this article, we introduce the restricted principal components regression (RPCR) estimator by combining the approaches used to obtain the restricted least squares estimator and the principal components regression estimator. The performance of the RPCR estimator with respect to the matrix mean square error and the generalized mean square error is examined. We also suggest a testing procedure for linear restrictions in principal components regression using the singly and doubly non-central F distributions.

10.
We develop a new robust stopping criterion for partial least squares regression (PLSR) component construction, characterized by a high level of stability. This new criterion is universal, since it is suitable both for PLSR and for extensions to generalized linear regression (PLSGLR). The criterion is based on a non-parametric bootstrap technique and must be computed algorithmically. It allows the testing of each successive component at a preset significance level \(\alpha\). In order to assess its performance and robustness with respect to various noise levels, we perform dataset simulations in which there is a preset and known number of components. These simulations are carried out for datasets characterized both by \(n > p\), with n the number of subjects and p the number of covariates, and by \(n < p\). We then use t-tests to compare the predictive performance of our approach with other common criteria. The stability property is tested in particular through re-sampling on a real allelotyping dataset. An important additional conclusion is that this new criterion gives globally better predictive performance than existing ones in both the PLSR and PLSGLR (logistic and Poisson) frameworks.
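Since the abstract leaves the construction of the criterion to the algorithm, the sketch below uses a simpler bootstrap heuristic as a stand-in (it is not the authors' criterion): components are added one at a time, and the k-th component is kept only if it improves out-of-bag prediction error in at least a 1 - α fraction of bootstrap resamples. Function names, data and settings are invented for the illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def select_components(X, y, max_comp=10, n_boot=200, alpha=0.05, seed=0):
    """Stop adding PLS components when the bootstrap distribution of the
    out-of-bag MSE improvement (k-1 -> k components) is no longer clearly positive."""
    rng = np.random.default_rng(seed)
    n = len(y)
    for k in range(1, max_comp + 1):
        gains = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)                    # bootstrap sample
            oob = np.setdiff1d(np.arange(n), idx)          # out-of-bag rows
            if oob.size == 0:
                continue
            fit = lambda m: PLSRegression(n_components=m).fit(X[idx], y[idx])
            mse = lambda m: np.mean((y[oob] - fit(m).predict(X[oob]).ravel()) ** 2)
            base = np.mean((y[oob] - y[idx].mean()) ** 2) if k == 1 else mse(k - 1)
            gains.append(base - mse(k))
        if np.mean(np.array(gains) <= 0) > alpha:          # improvement not convincing
            return k - 1
    return max_comp

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = X[:, :2] @ np.array([1.5, -1.0]) + 0.3 * rng.normal(size=100)
print(select_components(X, y))    # two informative directions were built into y
```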

11.
Errors-in-variables (EIV) regression is often used to gauge the linear relationship between two variables that both suffer from measurement and other errors, as in the comparison of two measurement platforms (e.g., RNA sequencing vs. microarray). Scientists are often at a loss as to which EIV regression model to use, since there are infinitely many choices. We provide sound guidelines toward viable solutions to this dilemma by introducing two general nonparametric EIV regression frameworks: compound regression and constrained regression. It is shown that these approaches are equivalent to each other and to the general parametric structural modeling approach. The advantages of these methods lie in their intuitive geometric representations, their distribution-free nature, and their ability to offer candidate solutions with various optimal properties when the ratio of the error variances is unknown. Each includes the classic nonparametric regression methods of ordinary least squares, geometric mean regression (GMR), and orthogonal regression as special cases. Under these general frameworks, one can readily uncover some surprising optimal properties of GMR and truly comprehend the benefit of data normalization. Supplementary materials for this article are available online.
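For a single predictor, the three classical special cases named here have closed-form slopes, which the numpy sketch below computes: the OLS slope r·s_y/s_x, the geometric mean regression slope sign(r)·s_y/s_x, and the orthogonal regression slope given by the major principal axis of the 2x2 covariance matrix. The simulated data are an assumption of the example.

```python
import numpy as np

def slopes(x, y):
    """Slopes of three classical EIV special cases for one predictor."""
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    r = np.corrcoef(x, y)[0, 1]
    ols = r * sy / sx                                   # treats x as error-free
    gmr = np.sign(r) * sy / sx                          # geometric mean regression
    cov = np.cov(x, y)
    eigval, eigvec = np.linalg.eigh(cov)
    major = eigvec[:, np.argmax(eigval)]                # direction of largest variance
    tls = major[1] / major[0]                           # orthogonal (total least squares)
    return ols, gmr, tls

rng = np.random.default_rng(0)
true_x = rng.normal(size=300)
x = true_x + 0.5 * rng.normal(size=300)                 # both variables carry error
y = 2.0 * true_x + 0.5 * rng.normal(size=300)
print(slopes(x, y))   # OLS is attenuated; GMR and orthogonal regression sit nearer 2
```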

12.
We generalize the relationship between continuum regression (Stone & Brooks, 1990) and ridge regression by showing that any optimization principle will yield a regressor proportional to a ridge regressor, provided only that the principle implies maximizing a function of the regressor's sample correlation coefficient and its sample variance. This relationship shows that continuum regression as defined via ridge regression ("least squares ridge regression") is a more generally valid methodology than previously realized, and also opens up alternative choices of its second and subsequent factors.

13.
The paper describes two regression models, principal components and maximum-likelihood factor analysis, which may be used when the stochastic predictor variables are highly intercorrelated and/or contain measurement error. The two problems can occur jointly, for example in social-survey data where the true (but unobserved) covariance matrix can be singular; departure from singularity of the sample dispersion matrix is then due to measurement error. We first consider the more elementary principal components regression model and show that it can be derived as a special case of (i) canonical correlation and (ii) restricted least squares. The second part treats the more general maximum-likelihood factor-analysis regression model, which is derived from the generalized inverse of the product of two singular matrices. It is also proved that factor-analysis regression can be considered an instrumental variables estimator and therefore does not depend on whether factors have been “properly” identified in terms of substantive behaviour. Consequently, the additional task of rotating factors to “simple structure” does not arise.

14.
In regression analysis, the r-k class estimator has been proposed as an alternative to the ordinary least squares estimator to overcome the problem of multicollinearity; it is a general estimator that includes the ordinary ridge regression estimator, the principal components regression estimator and the ordinary least squares estimator as special cases. In this article, we derive the necessary and sufficient conditions for the superiority of the r-k class estimator over each of these estimators under the Mahalanobis loss function, using the average loss criterion. We then compare these estimators with each other using the same criterion. We also suggest a test to verify whether these conditions are indeed satisfied. Finally, a numerical example and a Monte Carlo simulation are given to illustrate the theoretical results.
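As commonly defined, the r-k class estimator is β(r, k) = T_r(Λ_r + kI_r)^{-1}T_r'X'y, where T_r holds the first r eigenvectors of X'X and Λ_r the corresponding eigenvalues. The numpy sketch below uses that textbook form to show how OLS, ridge and principal components regression appear as special cases; the data and tuning values are invented for the illustration.

```python
import numpy as np

def r_k_estimator(X, y, r, k):
    """r-k class estimator: beta(r, k) = T_r (Lambda_r + k I)^{-1} T_r' X'y.
    k = 0 -> PCR with r components; r = p -> ridge; r = p and k = 0 -> OLS."""
    lam, T = np.linalg.eigh(X.T @ X)
    order = np.argsort(lam)[::-1]                  # eigenvalues in decreasing order
    lam, T = lam[order][:r], T[:, order][:, :r]
    return T @ ((T.T @ X.T @ y) / (lam + k))

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=n)      # induce multicollinearity
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(size=n)

print("OLS  :", r_k_estimator(X, y, r=p, k=0.0))
print("ridge:", r_k_estimator(X, y, r=p, k=5.0))
print("PCR  :", r_k_estimator(X, y, r=3, k=0.0))
print("r-k  :", r_k_estimator(X, y, r=3, k=5.0))
```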

15.
We propose a robust regression method called regression with outlier shrinkage (ROS) for the traditional n > p case. It improves over other robust regression methods such as least trimmed squares (LTS) in the sense that it can achieve the maximum breakdown value and full asymptotic efficiency simultaneously. Moreover, its computational complexity is no more than that of LTS. We also propose a sparse estimator, called sparse regression with outlier shrinkage (SROS), for robust variable selection and estimation. It is proven that SROS not only gives consistent selection but also estimates the nonzero coefficients with full asymptotic efficiency under the normal model. In addition, we introduce the concept of a nearly regression equivariant estimator for understanding the breakdown properties of sparse estimators, and prove that SROS achieves the maximum breakdown value of nearly regression equivariant estimators. Numerical examples are presented to illustrate our methods.

16.
Statistical methods are used effectively in the evaluation of pharmaceutical formulations in place of laborious liquid chromatography. However, signal overlap, nonlinearity, multicollinearity and the presence of outliers degrade the performance of statistical methods. Partial least squares regression (PLSR) is a very popular method for the quantification of high-dimensional, spectrally overlapped drug formulations. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality for a PLSR model. We study the performance of the proposed diagnostics on two everyday, highly overlapping drug systems: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

17.
We compare partial least squares (PLS) and principal component analysis (PCA) in a general setting in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator.

18.
Several approaches have been suggested for fitting linear regression models to censored data. These include Cox's proportional hazards models based on quasi-likelihoods. Methods of fitting based on least squares and maximum likelihood have also been proposed. The methods proposed so far all require special-purpose optimization routines. We describe here an approach that requires only a modified standard least squares routine.

We present methods for fitting a linear regression model to censored data by least squares and by maximum likelihood. In the least squares method, the censored values are replaced by their expectations, and the residual sum of squares is minimized. Several variants are suggested, differing in the way the expectation is calculated: a parametric approach (assuming a normal error model) and two non-parametric approaches are described. We also present a method for solving the maximum likelihood equations for the regression parameters in the censored regression setting. It is shown that the solutions can be obtained by a recursive algorithm which needs only a least squares routine for optimization. The suggested procedures gain considerably in computational efficiency. The Stanford Heart Transplant data are used to illustrate the various methods.
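A minimal sketch of the expectation-replacement idea under a normal error model: right-censored responses are replaced by E[Y | Y > c] = μ + σφ(a)/(1 - Φ(a)) with a = (c - μ)/σ computed from the current fit, and ordinary least squares is re-run until the coefficients stabilize. The crude scale estimate from uncensored residuals, the helper names and the simulated data are assumptions of the example, not the authors' exact algorithm.

```python
import numpy as np
from scipy.stats import norm

def censored_ls(X, y, censored, n_iter=50, tol=1e-8):
    """Least squares for right-censored data: impute E[Y | Y > c] under a normal
    error model, refit OLS, and iterate until the coefficients stop changing."""
    X1 = np.column_stack([np.ones(len(y)), X])
    y_work = y.astype(float).copy()
    beta = np.linalg.lstsq(X1, y_work, rcond=None)[0]
    for _ in range(n_iter):
        mu = X1 @ beta
        resid = y[~censored] - mu[~censored]
        sigma = resid.std(ddof=X1.shape[1])              # rough scale from uncensored cases
        a = (y[censored] - mu[censored]) / sigma         # standardized censoring points
        y_work[censored] = mu[censored] + sigma * norm.pdf(a) / norm.sf(a)
        beta_new = np.linalg.lstsq(X1, y_work, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_true = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=200)
c = 3.0                                                  # fixed right-censoring point
censored = y_true > c
y_obs = np.minimum(y_true, c)
X1 = np.column_stack([np.ones(200), X])
print("censored LS:", censored_ls(X, y_obs, censored))   # typically nearer (1, 2, -1)
print("naive OLS  :", np.linalg.lstsq(X1, y_obs, rcond=None)[0])
```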

19.
A class of trimmed linear conditional estimators based on regression quantiles for the linear regression model is introduced. This class serves as a robust analogue of non-robust linear unbiased estimators. Asymptotic analysis then shows that the trimmed least squares estimator based on regression quantiles (Koenker and Bassett, 1978) is the best in this estimator class in terms of asymptotic covariance matrices. The class of trimmed linear conditional estimators contains the Mallows-type bounded influence trimmed means (see De Jongh et al., 1988) and trimmed instrumental variables estimators. A large-sample methodology based on the trimmed instrumental variables estimator for confidence ellipsoids and hypothesis testing is also provided.
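A hedged sketch of the Koenker and Bassett (1978) trimmed least squares estimator referenced above (not the paper's broader class of trimmed linear conditional estimators): fit regression quantiles at levels α and 1 - α, drop observations lying outside the band between the two fitted hyperplanes, and apply ordinary least squares to the remainder. The statsmodels calls are standard; the heavy-tailed data are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm

def trimmed_ls(X, y, alpha=0.1):
    """Trimmed least squares based on regression quantiles (Koenker-Bassett style)."""
    X1 = sm.add_constant(X)
    lower = sm.QuantReg(y, X1).fit(q=alpha).predict(X1)        # alpha-th quantile fit
    upper = sm.QuantReg(y, X1).fit(q=1 - alpha).predict(X1)    # (1-alpha)-th quantile fit
    keep = (y > lower) & (y < upper)                           # trim outside the band
    return sm.OLS(y[keep], X1[keep]).fit().params

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.standard_t(df=2, size=300)  # heavy-tailed errors
print(trimmed_ls(X, y))   # close to the generating coefficients (1, 2, -1)
```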

20.
Principal components regression is widely used, but there is no clear theoretical answer to whether it reduces the error of the parameter estimates. Taking three hypothetical models as examples, we study principal components regression by simulation and find that the estimation error of principal components regression can be either smaller or larger than that of ordinary least squares, depending on the actual model.
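A small Monte Carlo sketch in the spirit of this comparison; the two toy designs below are my own constructions, not the three models used in the article. When the true coefficient vector lies in the span of the leading principal components, PCR usually beats OLS in estimation error; when it points along the discarded trailing component, PCR usually loses.

```python
import numpy as np

def pcr_beta(X, y, r):
    """PCR estimate in the original coordinates, keeping the first r components."""
    lam, V = np.linalg.eigh(X.T @ X)
    order = np.argsort(lam)[::-1]
    lam, V = lam[order][:r], V[:, order][:, :r]
    return V @ ((V.T @ X.T @ y) / lam)

def mc_mse(beta_true, X, r, n_rep=500, seed=0):
    """Monte Carlo mean squared estimation error of OLS and PCR, fixed design."""
    rng = np.random.default_rng(seed)
    err_ols = err_pcr = 0.0
    for _ in range(n_rep):
        y = X @ beta_true + rng.normal(size=X.shape[0])
        err_ols += np.sum((np.linalg.lstsq(X, y, rcond=None)[0] - beta_true) ** 2)
        err_pcr += np.sum((pcr_beta(X, y, r) - beta_true) ** 2)
    return err_ols / n_rep, err_pcr / n_rep

rng = np.random.default_rng(1)
A = rng.normal(size=(80, 3))
X = np.column_stack([A, A[:, 0] + 0.05 * rng.normal(size=80)])   # one near-collinear column
lam, V = np.linalg.eigh(X.T @ X)
V = V[:, np.argsort(lam)[::-1]]

print("beta along leading PCs (OLS, PCR):", mc_mse(4 * V[:, 0], X, r=3))   # PCR usually wins
print("beta along trailing PC (OLS, PCR):", mc_mse(4 * V[:, 3], X, r=3))   # PCR usually loses
```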
