首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 468 毫秒
1.
In this paper we discuss the partial least squares (PLS) prediction method. The method is compared to the predictor based on principal component regression (PCR). Both theoretical considerations and computations on artificial and real data are presented.  相似文献   

2.
The pros and cons of applying regression shrinkage prediction arguments and methods to autoregressive time series forecasting are discussed. Simulation evidence of the performance of a Stein regression prediction formula suggests that the overall dominance of the shrunken predictor over least squares in regression no longer holds in time series samples of a reasonable length. Rather, shrinkage appears the better of the two, with respect to prediction mean squared error, only for weaker relationships and seems to be inferior to the least squares predictor when the autoregressive relationship is strong.  相似文献   

3.
This article considers both Partial Least Squares (PLS) and Ridge Regression (RR) methods to combat multicollinearity problem. A simulation study has been conducted to compare their performances with respect to Ordinary Least Squares (OLS). With varying degrees of multicollinearity, it is found that both, PLS and RR, estimators produce significant reductions in the Mean Square Error (MSE) and Prediction Mean Square Error (PMSE) over OLS. However, from the simulation study it is evident that the RR performs better when the error variance is large and the PLS estimator achieves its best results when the model includes more variables. However, the advantage of the ridge regression method over PLS is that it can provide the 95% confidence interval for the regression coefficients while PLS cannot.  相似文献   

4.
5.
This paper considers estimation and prediction in the Aalen additive hazards model in the case where the covariate vector is high-dimensional such as gene expression measurements. Some form of dimension reduction of the covariate space is needed to obtain useful statistical analyses. We study the partial least squares regression method. It turns out that it is naturally adapted to this setting via the so-called Krylov sequence. The resulting PLS estimator is shown to be consistent provided that the number of terms included is taken to be equal to the number of relevant components in the regression model. A standard PLS algorithm can also be constructed, but it turns out that the resulting predictor can only be related to the original covariates via time-dependent coefficients. The methods are applied to a breast cancer data set with gene expression recordings and to the well known primary biliary cirrhosis clinical data.  相似文献   

6.
Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) uses a reformulation of the Cox likelihood to a Poisson type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of park et al. (2002) and other existing Cox PLS approaches, as it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables like disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, obtaining a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised version outperform other Cox PLS methods.  相似文献   

7.
Partial least squares regression has been widely adopted within some areas as a useful alternative to ordinary least squares regression in the manner of other shrinkage methods such as principal components regression and ridge regression. In this paper we examine the nature of this shrinkage and demonstrate that partial least squares regression exhibits some undesirable properties.  相似文献   

8.
We compare the partial least squares (PLS) and the principal component analysis (PCA), in a general case in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent, to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator  相似文献   

9.
We propose a robust regression method called regression with outlier shrinkage (ROS) for the traditional n>pn>p cases. It improves over the other robust regression methods such as least trimmed squares (LTS) in the sense that it can achieve maximum breakdown value and full asymptotic efficiency simultaneously. Moreover, its computational complexity is no more than that of LTS. We also propose a sparse estimator, called sparse regression with outlier shrinkage (SROS), for robust variable selection and estimation. It is proven that SROS can not only give consistent selection but also estimate the nonzero coefficients with full asymptotic efficiency under the normal model. In addition, we introduce a concept of nearly regression equivariant estimator for understanding the breakdown properties of sparse estimators, and prove that SROS achieves the maximum breakdown value of nearly regression equivariant estimators. Numerical examples are presented to illustrate our methods.  相似文献   

10.
Abstract.  We use Krylov sequences to analyse a class of regression methods based on successive identification of latent factors. Some results already proved for partial least squares regression (PLSR) are shown to hold for other methods also. We prove that the well-known peculiar pattern of alternating shrinkage and inflation of the principal components is not unique for PLSR. We also show that for any method in the class under study, the coefficient of determination is always at least as high as for principal components regression with the same number of factors.  相似文献   

11.
Molecular markers combined with powerful statistical tools have made it possible to detect and analyze multiple loci on the genome that are responsible for the phenotypic variation in quantitative traits. The objectives of the study presented in this paper are to identify a subset of single nucleotide polymorphism (SNP) markers that are associated with a particular trait and to construct a model that can best predict the value of the trait given the genotypic information of the SNPs using a three-step strategy. In the first step, a genome-wide association test is performed to screen SNPs that are associated with the quantitative trait of interest. SNPs with p-values of less than 5% are then analyzed in the second step. In the second step, a large number of randomly selected models, each consisting of a fixed number of randomly selected SNPs, are analyzed using the least angle regression method. This step will further remove redundant SNPs due to the complicated association among SNPs. A subset of SNPs that are shown to have a significant effect on the response trait more often than by chance are considered for the third step. In the third step, two alternative methods are considered: the least angle shrinkage and selection operation and sparse partial least squares regression. For both methods, the predictive ability of the fitted model is evaluated by an independent test set. The performance of the proposed method is illustrated by the analysis of a real data set on Canadian Holstein cattle.  相似文献   

12.
This paper reviews various treatments of non-metric variables in partial least squares (PLS) and principal component analysis (PCA) algorithms. The performance of different treatments is compared in an extensive simulation study under several typical data generating processes and associated recommendations are made. Moreover, we find that PLS-based methods are to prefer in practice, since, independent of the data generating process, PLS performs either as good as PCA or significantly outperforms it. As an application of PLS and PCA algorithms with non-metric variables we consider construction of a wealth index to predict household expenditures. Consistent with our simulation study, we find that a PLS-based wealth index with dummy coding outperforms PCA-based ones.  相似文献   

13.
In a multi-sample simple regression model, generally, homogeneity of the regression slopes leads to improved estimation of the intercepts. Analogous to the preliminary test estimators, (smooth) shrinkage least squares estimators of Intercepts based on the James-Stein rule on regression slopes are considered. Relative pictures on the (asymptotic) risk of the classical, preliminary test and the shrinkage least squares estimators are also presented. None of the preliminary test and shrinkage least squares estimators may dominate over the other, though each of them fares well relative to the other estimators.  相似文献   

14.
Many different biased regression techniques have been proposed for estimating parameters of a multiple linear regression model when the predictor variables are collinear. One particular alternative, latent root regression analysis, is a technique based on analyzing the latent roots and latent vectors of the correlation matrix of both the response and the predictor variables. It is the purpose of this paper to review the latent root regression estimator and to re-examine some of its properties and applications. It is shown that the latent root estimator is a member of a wider class of estimators for linear models  相似文献   

15.
The partial least squares (PLS) approach first constructs new explanatory variables, known as factors (or components), which are linear combinations of available predictor variables. A small subset of these factors is then chosen and retained for prediction. We study the performance of PLS in estimating single-index models, especially when the predictor variables exhibit high collinearity. We show that PLS estimates are consistent up to a constant of proportionality. We present three simulation studies that compare the performance of PLS in estimating single-index models with that of sliced inverse regression (SIR). In the first two studies, we find that PLS performs better than SIR when collinearity exists. In the third study, we learn that PLS performs well even when there are multiple dependent variables, the link function is non-linear and the shape of the functional form is not known.  相似文献   

16.
This paper introduces a novel hybrid regression method (MixReg) combining two linear regression methods, ordinary least square (OLS) and least squares ratio (LSR) regression. LSR regression is a method to find the regression coefficients minimizing the sum of squared error rate while OLS minimizes the sum of squared error itself. The goal of this study is to combine two methods in a way that the proposed method superior both OLS and LSR regression methods in terms of R2 statistics and relative error rate. Applications of MixReg, on both simulated and real data, show that MixReg method outperforms both OLS and LSR regression.  相似文献   

17.
Summary.  Because highly correlated data arise from many scientific fields, we investigate parameter estimation in a semiparametric regression model with diverging number of predictors that are highly correlated. For this, we first develop a distribution-weighted least squares estimator that can recover directions in the central subspace, then use the distribution-weighted least squares estimator as a seed vector and project it onto a Krylov space by partial least squares to avoid computing the inverse of the covariance of predictors. Thus, distrbution-weighted partial least squares can handle the cases with high dimensional and highly correlated predictors. Furthermore, we also suggest an iterative algorithm for obtaining a better initial value before implementing partial least squares. For theoretical investigation, we obtain strong consistency and asymptotic normality when the dimension p of predictors is of convergence rate O { n 1/2/ log ( n )} and o ( n 1/3) respectively where n is the sample size. When there are no other constraints on the covariance of predictors, the rates n 1/2 and n 1/3 are optimal. We also propose a Bayesian information criterion type of criterion to estimate the dimension of the Krylov space in the partial least squares procedure. Illustrative examples with a real data set and comprehensive simulations demonstrate that the method is robust to non-ellipticity and works well even in 'small n –large p ' problems.  相似文献   

18.
Ordered multiple categorical (MC) variable has been widely considered and studied as response variable, and few studies have carefully considered it as a predictor in linear regression. When doing this, the existence of some pseudo-categories may result in overfitting, and to detect those pseudo-categories by hypothesis test of all dummy variables might have low specificity. In this paper, we propose a transformation method of dummy variables for such ordered MC predictors, after which a model selection method combined with BIC will be elaborated. Theoretical consistency of our model selection method is established under some common assumptions. Both simulation studies and real data analysis of a medical survey indicate that our method provides good performance and is applicable to a wide range of biomedical research.  相似文献   

19.
This study compares the SPSS ordinary least squares (OLS) regression and ridge regression procedures in dealing with multicollinearity data. The LS regression method is one of the most frequently applied statistical procedures in application. It is well documented that the LS method is extremely unreliable in parameter estimation while the independent variables are dependent (multicollinearity problem). The Ridge Regression procedure deals with the multicollinearity problem by introducing a small bias in the parameter estimation. The application of Ridge Regression involves the selection of a bias parameter and it is not clear if it works better in applications. This study uses a Monte Carlo method to compare the results of OLS procedure with the Ridge Regression procedure in SPSS.  相似文献   

20.
Biplots are useful tools to explore the relationship among variables. In this paper, the specific regression relationship between a set of predictors X and set of response variables Y by means of partial least-squares (PLS) regression is represented. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号