
1.

Comparison of prediction methods for multicollinear data

Tormod Naes Harald Martens 《Communications in Statistics - Simulation and Computation》2013,42(3):545-576

In this paper we discuss the partial least squares (PLS) prediction method. The method is compared to the predictor based on principal component regression (PCR). Both theoretical considerations and computations on artificial and real data are presented.

2.

Regression Shrinkage Methods and Autoregressive Time Series Prediction

J. B. Copas M. C. Jones 《Australian & New Zealand Journal of Statistics》1987,29(3):264-277

The pros and cons of applying regression shrinkage prediction arguments and methods to autoregressive time series forecasting are discussed. Simulation evidence of the performance of a Stein regression prediction formula suggests that the overall dominance of the shrunken predictor over least squares in regression no longer holds in time series samples of a reasonable length. Rather, shrinkage appears the better of the two, with respect to prediction mean squared error, only for weaker relationships and seems to be inferior to the least squares predictor when the autoregressive relationship is strong.

3.

Study of partial least squares and ridge regression methods

Luis Firinguetti Golam Kibria Rodrigo Araya 《Communications in Statistics - Simulation and Computation》2017,46(8):6631-6644

This article considers both Partial Least Squares (PLS) and Ridge Regression (RR) methods to combat the multicollinearity problem. A simulation study has been conducted to compare their performances with respect to Ordinary Least Squares (OLS). With varying degrees of multicollinearity, it is found that both the PLS and RR estimators produce significant reductions in the Mean Square Error (MSE) and Prediction Mean Square Error (PMSE) over OLS. However, from the simulation study it is evident that RR performs better when the error variance is large and the PLS estimator achieves its best results when the model includes more variables. A further advantage of the ridge regression method over PLS is that it can provide the 95% confidence interval for the regression coefficients while PLS cannot.

4.

The geometry of 2-block partial least squares regression

A. Phatak P.M. Reilly A. Penlidis 《Communications in Statistics - Theory and Methods》2013,42(6):1517-1553


5.

The additive hazards model with high-dimensional regressors

Torben Martinussen Thomas H. Scheike 《Lifetime data analysis》2009,15(3):330-342

This paper considers estimation and prediction in the Aalen additive hazards model in the case where the covariate vector is high-dimensional such as gene expression measurements. Some form of dimension reduction of the covariate space is needed to obtain useful statistical analyses. We study the partial least squares regression method. It turns out that it is naturally adapted to this setting via the so-called Krylov sequence. The resulting PLS estimator is shown to be consistent provided that the number of terms included is taken to be equal to the number of relevant components in the regression model. A standard PLS algorithm can also be constructed, but it turns out that the resulting predictor can only be related to the original covariates via time-dependent coefficients. The methods are applied to a breast cancer data set with gene expression recordings and to the well known primary biliary cirrhosis clinical data.

6.

Partial least squares Cox regression for genome-wide data

Nygård S Borgan O Lingjaerde OC Størvold HL 《Lifetime data analysis》2008,14(2):179-195

Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) uses a reformulation of the Cox likelihood to a Poisson type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches, as it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables like disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, obtaining a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised version outperform other Cox PLS methods.

7.

The peculiar shrinkage properties of partial least squares regression

Neil A. Butler & Michael C. Denham 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2000,62(3):585-593

Partial least squares regression has been widely adopted within some areas as a useful alternative to ordinary least squares regression in the manner of other shrinkage methods such as principal components regression and ridge regression. In this paper we examine the nature of this shrinkage and demonstrate that partial least squares regression exhibits some undesirable properties.

8.

Partial Least Squares: A First-order Analysis

Petre Stoica & Torsten Söderström 《Scandinavian Journal of Statistics》1998,25(1):17-24

We compare partial least squares (PLS) and principal component analysis (PCA) in a general case in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent, to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator.

9.

Regression with outlier shrinkage

Shifeng Xiong V. Roshan Joseph 《Journal of statistical planning and inference》2013

We propose a robust regression method called regression with outlier shrinkage (ROS) for the traditional n > p cases. It improves over the other robust regression methods such as least trimmed squares (LTS) in the sense that it can achieve maximum breakdown value and full asymptotic efficiency simultaneously. Moreover, its computational complexity is no more than that of LTS. We also propose a sparse estimator, called sparse regression with outlier shrinkage (SROS), for robust variable selection and estimation. It is proven that SROS can not only give consistent selection but also estimate the nonzero coefficients with full asymptotic efficiency under the normal model. In addition, we introduce a concept of nearly regression equivariant estimator for understanding the breakdown properties of sparse estimators, and prove that SROS achieves the maximum breakdown value of nearly regression equivariant estimators. Numerical examples are presented to illustrate our methods.

10.

Krylov Sequences as a Tool for Analysing Iterated Regression Algorithms

ANDERS BJÖRKSTRÖM 《Scandinavian Journal of Statistics》2010,37(1):166-175

Abstract. We use Krylov sequences to analyse a class of regression methods based on successive identification of latent factors. Some results already proved for partial least squares regression (PLSR) are shown to hold for other methods also. We prove that the well-known peculiar pattern of alternating shrinkage and inflation of the principal components is not unique for PLSR. We also show that for any method in the class under study, the coefficient of determination is always at least as high as for principal components regression with the same number of factors.
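The link between PLSR and Krylov sequences mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard characterisation of single-response PLSR with m factors as least squares restricted to the Krylov space spanned by s, Ss, ..., S^(m-1)s, where S = X'X and s = X'y; this is not code from the paper. The function names `pls_krylov`, `pcr` and `r_squared` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pls_krylov(X, y, m):
    """PLS coefficients with m factors via the Krylov space of (X'X, X'y).

    Assumes X and y are centred. The Krylov vectors are normalised and
    orthogonalised (QR) purely for numerical stability; the span is unchanged.
    """
    S = X.T @ X
    s = X.T @ y
    basis = [s / np.linalg.norm(s)]
    for _ in range(m - 1):
        v = S @ basis[-1]
        basis.append(v / np.linalg.norm(v))
    K, _ = np.linalg.qr(np.column_stack(basis))
    # Least squares restricted to the column space of K.
    return K @ np.linalg.solve(K.T @ S @ K, K.T @ s)

def pcr(X, y, m):
    """Principal components regression on the first m components."""
    _, d, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:m].T
    scores = X @ V  # the columns are orthogonal with squared norms d**2
    return V @ ((scores.T @ y) / d[:m] ** 2)

def r_squared(X, y, b):
    """Coefficient of determination for centred y."""
    resid = y - X @ b
    return 1.0 - (resid @ resid) / (y @ y)

# Hypothetical simulated data with two nearly collinear predictors.
rng = np.random.default_rng(1)
n, p = 80, 5
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
X -= X.mean(axis=0)
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) + rng.normal(size=n)
y -= y.mean()

for m in range(1, p + 1):
    print(m, r_squared(X, y, pls_krylov(X, y, m)), r_squared(X, y, pcr(X, y, m)))
```

With m equal to the number of predictors the Krylov space is the whole coefficient space (barring degenerate cases), so the sketch reproduces ordinary least squares; for smaller m the printed values allow a comparison with principal components regression using the same number of factors.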

11.

SNP selection for predicting a quantitative trait

S. Subedi R. Deardon F. S. Schenkel 《Journal of applied statistics》2013,40(3):600-613

Molecular markers combined with powerful statistical tools have made it possible to detect and analyze multiple loci on the genome that are responsible for the phenotypic variation in quantitative traits. The objectives of the study presented in this paper are to identify a subset of single nucleotide polymorphism (SNP) markers that are associated with a particular trait and to construct a model that can best predict the value of the trait given the genotypic information of the SNPs using a three-step strategy. In the first step, a genome-wide association test is performed to screen SNPs that are associated with the quantitative trait of interest. SNPs with p-values of less than 5% are then analyzed in the second step. In the second step, a large number of randomly selected models, each consisting of a fixed number of randomly selected SNPs, are analyzed using the least angle regression method. This step will further remove redundant SNPs due to the complicated association among SNPs. A subset of SNPs that are shown to have a significant effect on the response trait more often than by chance are considered for the third step. In the third step, two alternative methods are considered: the least absolute shrinkage and selection operator and sparse partial least squares regression. For both methods, the predictive ability of the fitted model is evaluated by an independent test set. The performance of the proposed method is illustrated by the analysis of a real data set on Canadian Holstein cattle.

12.

Treatments of non-metric variables in partial least squares and principal component analysis

Jisu Yoon Tatyana Krivobokova 《Journal of applied statistics》2018,45(6):971-987

This paper reviews various treatments of non-metric variables in partial least squares (PLS) and principal component analysis (PCA) algorithms. The performance of different treatments is compared in an extensive simulation study under several typical data generating processes and associated recommendations are made. Moreover, we find that PLS-based methods are to be preferred in practice, since, independent of the data generating process, PLS either performs as well as PCA or significantly outperforms it. As an application of PLS and PCA algorithms with non-metric variables we consider construction of a wealth index to predict household expenditures. Consistent with our simulation study, we find that a PLS-based wealth index with dummy coding outperforms PCA-based ones.

13.

On shrinkage least squares estimation in a parallelism problem

A. K. Md. Ehsanes Saleh Pranab Kumar Sen 《Communications in Statistics - Theory and Methods》2013,42(5):1451-1466

In a multi-sample simple regression model, generally, homogeneity of the regression slopes leads to improved estimation of the intercepts. Analogous to the preliminary test estimators, (smooth) shrinkage least squares estimators of intercepts based on the James-Stein rule on regression slopes are considered. Relative pictures on the (asymptotic) risk of the classical, preliminary test and the shrinkage least squares estimators are also presented. Neither the preliminary test nor the shrinkage least squares estimator dominates the other, though each of them fares well relative to the other estimators.

14.

Latent root regression: a biased regression methodology for use with collinear predictor variables

Robert L. Mason 《Communications in Statistics - Theory and Methods》2013,42(9):2651-2678

Many different biased regression techniques have been proposed for estimating parameters of a multiple linear regression model when the predictor variables are collinear. One particular alternative, latent root regression analysis, is a technique based on analyzing the latent roots and latent vectors of the correlation matrix of both the response and the predictor variables. It is the purpose of this paper to review the latent root regression estimator and to re-examine some of its properties and applications. It is shown that the latent root estimator is a member of a wider class of estimators for linear models.

15.

Partial least squares estimator for single-index models

Prasad Naik & Chih-Ling Tsai 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2000,62(4):763-771

The partial least squares (PLS) approach first constructs new explanatory variables, known as factors (or components), which are linear combinations of available predictor variables. A small subset of these factors is then chosen and retained for prediction. We study the performance of PLS in estimating single-index models, especially when the predictor variables exhibit high collinearity. We show that PLS estimates are consistent up to a constant of proportionality. We present three simulation studies that compare the performance of PLS in estimating single-index models with that of sliced inverse regression (SIR). In the first two studies, we find that PLS performs better than SIR when collinearity exists. In the third study, we learn that PLS performs well even when there are multiple dependent variables, the link function is non-linear and the shape of the functional form is not known.

16.

Classification trees aided mixed regression model

Oguz Akbilgic 《Journal of applied statistics》2015,42(8):1773-1781

This paper introduces a novel hybrid regression method (MixReg) combining two linear regression methods, ordinary least squares (OLS) and least squares ratio (LSR) regression. LSR regression finds the regression coefficients that minimize the sum of squared error rates, while OLS minimizes the sum of squared errors itself. The goal of this study is to combine the two methods in such a way that the proposed method is superior to both OLS and LSR regression in terms of the R² statistic and relative error rate. Applications of MixReg, on both simulated and real data, show that the MixReg method outperforms both OLS and LSR regression.
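The difference between the two criteria described in this abstract can be made concrete with a small sketch. It assumes that the "error rate" of observation i is (y_i - x_i'b) / y_i, so that the LSR criterion equals the sum of (1 - (x_i / y_i)'b)^2 and can be minimised by regressing a vector of ones on the rows x_i / y_i. The function names and the simulated data are hypothetical, and the sketch does not implement MixReg itself.

```python
import numpy as np

def ols_fit(X, y):
    """Coefficients minimising sum((y - X b)**2)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def lsr_fit(X, y):
    """Coefficients minimising sum(((y - X b) / y)**2); needs every y != 0.

    Since (y - X b) / y = 1 - (X / y) b, this is a least squares fit of a
    vector of ones on the row-scaled design X / y.
    """
    return np.linalg.lstsq(X / y[:, None], np.ones_like(y), rcond=None)[0]

def sse(X, y, b):
    """Sum of squared errors."""
    return float(np.sum((y - X @ b) ** 2))

def relative_sse(X, y, b):
    """Sum of squared relative errors (error rates)."""
    return float(np.sum(((y - X @ b) / y) ** 2))

# Hypothetical data: intercept plus one predictor, strictly positive response.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=100)
X = np.column_stack([np.ones_like(x), x])
y = 5.0 + 2.0 * x + rng.normal(size=100)

b_ols = ols_fit(X, y)
b_lsr = lsr_fit(X, y)
print("OLS:", b_ols, sse(X, y, b_ols), relative_sse(X, y, b_ols))
print("LSR:", b_lsr, sse(X, y, b_lsr), relative_sse(X, y, b_lsr))
```

Each fit is optimal under its own criterion: OLS cannot be beaten on the sum of squared errors and LSR cannot be beaten on the sum of squared error rates, which is the trade-off a hybrid of the two has to balance.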

17.

On distribution-weighted partial least squares with diverging number of highly correlated predictors

Li-Ping Zhu Li-Xing Zhu 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(2):525-548

Summary. Because highly correlated data arise from many scientific fields, we investigate parameter estimation in a semiparametric regression model with diverging number of predictors that are highly correlated. For this, we first develop a distribution-weighted least squares estimator that can recover directions in the central subspace, then use the distribution-weighted least squares estimator as a seed vector and project it onto a Krylov space by partial least squares to avoid computing the inverse of the covariance of predictors. Thus, distribution-weighted partial least squares can handle the cases with high dimensional and highly correlated predictors. Furthermore, we also suggest an iterative algorithm for obtaining a better initial value before implementing partial least squares. For theoretical investigation, we obtain strong consistency and asymptotic normality when the dimension p of predictors is of convergence rate O{n^(1/2)/log(n)} and o(n^(1/3)) respectively, where n is the sample size. When there are no other constraints on the covariance of predictors, the rates n^(1/2) and n^(1/3) are optimal. We also propose a Bayesian information criterion type of criterion to estimate the dimension of the Krylov space in the partial least squares procedure. Illustrative examples with a real data set and comprehensive simulations demonstrate that the method is robust to non-ellipticity and works well even in 'small n–large p' problems.

18.

Regression models with ordered multiple categorical predictors

Haitao Tian Ching-Yu Cheng Liang Zhang 《Journal of Statistical Computation and Simulation》2018,88(16):3164-3178

The ordered multiple categorical (MC) variable has been widely studied as a response variable, but few studies have carefully considered it as a predictor in linear regression. When it is used as a predictor, the existence of some pseudo-categories may result in overfitting, and detecting those pseudo-categories by hypothesis tests of all dummy variables might have low specificity. In this paper, we propose a transformation method of dummy variables for such ordered MC predictors, after which a model selection method combined with BIC will be elaborated. Theoretical consistency of our model selection method is established under some common assumptions. Both simulation studies and real data analysis of a medical survey indicate that our method provides good performance and is applicable to a wide range of biomedical research.

19.

A simulation study on SPSS ridge regression and ordinary least squares regression procedures for multicollinearity data

John Zhang Mahmud Ibrahim 《Journal of applied statistics》2005,32(6):571-588

This study compares the SPSS ordinary least squares (OLS) regression and ridge regression procedures in dealing with multicollinear data. The LS regression method is one of the most frequently applied statistical procedures in application. It is well documented that the LS method is extremely unreliable in parameter estimation when the independent variables are dependent (the multicollinearity problem). The Ridge Regression procedure deals with the multicollinearity problem by introducing a small bias in the parameter estimation. The application of Ridge Regression involves the selection of a bias parameter and it is not clear if it works better in applications. This study uses a Monte Carlo method to compare the results of the OLS procedure with the Ridge Regression procedure in SPSS.
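The bias parameter mentioned in this abstract can be illustrated with a minimal sketch of the textbook ridge estimator (X'X + kI)^(-1) X'y. This is an assumption about the general form of ridge regression, not a reproduction of the SPSS procedure or of the study's Monte Carlo design; the function name `ridge_coefficients`, the value k = 1 and the simulated data are hypothetical.

```python
import numpy as np

def ridge_coefficients(X, y, k):
    """Ridge estimate (X'X + k I)^(-1) X'y; k = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical toy design: two predictors that are nearly collinear.
rng = np.random.default_rng(0)
n = 50
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n),
                     z + 0.1 * rng.normal(size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

b_ols = ridge_coefficients(X, y, 0.0)    # no bias parameter
b_ridge = ridge_coefficients(X, y, 1.0)  # small bias parameter k = 1

print("OLS coefficients:  ", b_ols)
print("Ridge coefficients:", b_ridge)
```

For any k > 0 the ridge coefficient vector is shorter than the OLS vector, which is the shrinkage that trades a small bias for lower variance; how k should be chosen is exactly the open question the abstract raises.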

20.

The construction of a partial least-squares biplot

Opeoluwa F. Oyedele Sugnet Lubbe 《Journal of applied statistics》2015,42(11):2449-2460

Biplots are useful tools to explore the relationship among variables. In this paper, the specific regression relationship between a set of predictors X and set of response variables Y by means of partial least-squares (PLS) regression is represented. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.