In this paper we discuss the partial least squares (PLS) prediction method. The method is compared to the predictor based on principal component regression (PCR). Both theoretical considerations and computations on artificial and real data are presented.
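
To make the two predictors concrete, here is a minimal NumPy sketch, assuming centered data and a single response. It computes PLS1 coefficients with the NIPALS deflation scheme and PCR coefficients from the first k principal components, then compares in-sample R^2. The helper names (`center`, `pls1_coef`, `pcr_coef`, `r_squared`) are illustrative and are not taken from the paper.

```python
import numpy as np


def center(X, y):
    """Column-center X and center y (both predictors below assume centered data)."""
    return X - X.mean(axis=0), y - y.mean()


def pls1_coef(X, y, k):
    """PLS1 coefficients with k latent factors via NIPALS deflation of X."""
    Xk = X.copy()
    W, P, q = [], [], []
    for _ in range(k):
        w = Xk.T @ y
        w = w / np.linalg.norm(w)        # weight vector
        t = Xk @ w                       # score vector
        tt = t @ t
        p = Xk.T @ t / tt                # X-loading vector
        W.append(w)
        P.append(p)
        q.append((y @ t) / tt)           # y-loading (regression of y on t)
        Xk = Xk - np.outer(t, p)         # deflate X; deflating y is unnecessary for PLS1
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def pcr_coef(X, y, k):
    """PCR coefficients: regress y on the first k principal-component scores."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])


def r_squared(X, y, beta):
    """In-sample coefficient of determination for centered y."""
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / (y @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 60, 8
    X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated predictors
    y = X @ rng.normal(size=p) + rng.normal(size=n)
    Xc, yc = center(X, y)
    for k in range(1, p + 1):
        print(k, round(r_squared(Xc, yc, pls1_coef(Xc, yc, k)), 4),
              round(r_squared(Xc, yc, pcr_coef(Xc, yc, k)), 4))
```

With all p factors both predictors reduce to ordinary least squares. For smaller k, PLS chooses directions using y while PCR uses only the variance of X.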

Abstract.  We use Krylov sequences to analyse a class of regression methods based on successive identification of latent factors. Some results already proved for partial least squares regression (PLSR) are shown to hold for other methods as well. We prove that the well-known peculiar pattern of alternating shrinkage and inflation of the principal components is not unique to PLSR. We also show that, for any method in the class under study, the coefficient of determination is always at least as high as for principal components regression with the same number of factors.
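
The Krylov view can be sketched in a few lines. A standard characterisation of PLS1 is that its coefficient vector is the least squares solution restricted to the Krylov space span{s, Ss, ..., S^(k-1)s}, with S = X'X and s = X'y. The sketch below builds an orthonormal basis of that space using Arnoldi-style Gram-Schmidt and solves the restricted problem. It assumes centered data. The function names are hypothetical, and the general class of methods studied in the paper is not implemented here.

```python
import numpy as np


def krylov_basis(S, s, k):
    """Orthonormal basis for span{s, S s, ..., S^(k-1) s} (Arnoldi-style Gram-Schmidt)."""
    Q = np.zeros((s.size, k))
    Q[:, 0] = s / np.linalg.norm(s)
    for j in range(1, k):
        v = S @ Q[:, j - 1]
        v -= Q[:, :j] @ (Q[:, :j].T @ v)   # orthogonalize against previous basis vectors
        v -= Q[:, :j] @ (Q[:, :j].T @ v)   # second pass for numerical orthogonality
        Q[:, j] = v / np.linalg.norm(v)
    return Q


def krylov_ls_coef(X, y, k):
    """Least squares for centered (X, y) with beta restricted to the Krylov space of (X'X, X'y)."""
    S, s = X.T @ X, X.T @ y
    Q = krylov_basis(S, s, k)
    return Q @ np.linalg.solve(Q.T @ S @ Q, Q.T @ s)


def r_squared(X, y, beta):
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / (y @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 6)) @ rng.normal(size=(6, 6))
    y = X @ rng.normal(size=6) + rng.normal(size=50)
    X, y = X - X.mean(axis=0), y - y.mean()
    for k in range(1, 7):
        print(k, round(r_squared(X, y, krylov_ls_coef(X, y, k)), 4))
```

The Krylov spaces are nested, so R^2 cannot decrease as k grows. At k = p the space is all of R^p, and the fit coincides with ordinary least squares.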

The logratio methodology is not applicable when rounded zeros occur in compositional data. Many methods exist for dealing with rounded zeros, but some are unsuitable for high-dimensional data sets, and recently developed related methods do not balance calculation time against accuracy. To improve on this, we propose a method based on regression imputation with Q-mode clustering. The method forms groups of parts and fits partial least squares regressions within these groups using centered logratio coordinates. We also prove that using centered logratio coordinates or isometric logratio coordinates as the response of the partial least squares regression gives equivalent results for the replacement of rounded zeros. A simulation study and a real example are used to analyze the performance of the proposed method. The results show that the proposed method reduces calculation time in higher dimensions and improves the quality of the results.
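
A small sketch shows why the clr/ilr choice can be immaterial. Assuming a Helmert-type orthonormal basis V (one of many valid ilr bases), ilr coordinates are clr coordinates times V, and V V' is the projection that leaves clr vectors unchanged. As a result, the Gram matrix YY' of the response is identical under both codings. The first PLS2 weight vector depends on Y only through X'YY'X, so it is the same for either coding. This illustrates one ingredient of the equivalence; it is not the paper's proof. All function names are illustrative.

```python
import numpy as np


def clr(x):
    """Centered logratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)


def helmert_basis(D):
    """D x (D-1) orthonormal basis of the clr hyperplane (Helmert-type contrasts, assumed)."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] /= np.linalg.norm(V[:, j - 1])
    return V


def ilr(x, V=None):
    """Isometric logratio coordinates: clr coordinates expressed in the basis V."""
    z = clr(x)
    if V is None:
        V = helmert_basis(z.shape[-1])
    return z @ V


def ilr_to_clr(u, V=None):
    """Map ilr coordinates back to clr; V V' projects onto the clr hyperplane."""
    if V is None:
        V = helmert_basis(u.shape[-1] + 1)
    return u @ V.T


def clr_inverse(z):
    """Back-transform clr coordinates to a composition that sums to 1."""
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    comp = rng.dirichlet(np.ones(4), size=3)
    print(np.allclose(clr(comp) @ clr(comp).T, ilr(comp) @ ilr(comp).T))
```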

ABSTRACT. We generalize the relationship between continuum regression (Stone & Brooks, 1990) and ridge regression by showing that any optimization principle yields a regressor proportional to a ridge regressor, provided only that the principle amounts to maximizing a function of the regressor's sample correlation coefficient and its sample variance. This relationship shows that continuum regression defined via ridge regression (least squares ridge regression) is a more generally valid methodology than previously realized, and it also opens up alternative choices of its second and subsequent factors.
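
A minimal sketch of the ridge family referred to above: the direction of (X'X + δI)^(-1) X'y for centered data. The sketch checks numerically that δ = 0 gives the OLS direction, δ → +∞ gives the direction of X'y (the first PLS weight), and δ approaching -λ_max gives the first principal axis. These are the classical endpoints of continuum regression. The exact mapping between δ and the continuum parameter is not computed here, and the helper names are illustrative.

```python
import numpy as np


def ridge_direction(X, y, delta):
    """Unit vector along (X'X + delta I)^{-1} X'y for centered X, y."""
    S = X.T @ X
    b = np.linalg.solve(S + delta * np.eye(S.shape[0]), X.T @ y)
    return b / np.linalg.norm(b)


def abs_cos(a, b):
    """Absolute cosine between two vectors (directions are defined up to sign)."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def endpoint_cosines(X, y):
    """Cosines between ridge directions and the OLS, X'y and first-PC directions."""
    X, y = X - X.mean(axis=0), y - y.mean()
    lam, V = np.linalg.eigh(X.T @ X)                 # ascending eigenvalues
    ols = np.linalg.lstsq(X, y, rcond=None)[0]
    return (
        abs_cos(ridge_direction(X, y, 0.0), ols),                        # delta = 0
        abs_cos(ridge_direction(X, y, 1e12), X.T @ y),                   # delta -> +inf
        abs_cos(ridge_direction(X, y, -lam[-1] * (1 + 1e-6)), V[:, -1]), # delta -> -lambda_max
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 5)) @ rng.normal(size=(5, 5))
    y = X @ rng.normal(size=5) + rng.normal(size=80)
    print(endpoint_cosines(X, y))
```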


This paper deals with the problem of estimating the regression of a surrogated scalar response variable given a functional random variable. We construct an estimator of the regression operator that uses surrogate data in addition to the available (true) response data. We then establish asymptotic properties of the estimator in terms of almost-complete convergence and convergence in quadratic mean. These results generalize part of the results obtained in the finite-dimensional framework. Finally, the applicability of our results is illustrated on both simulated data and a real dataset, showing the superiority of our estimator over classical estimators when complete data are lacking.

We look at prediction in regression models under squared loss for the random x case with many explanatory variables. Model reduction is done by conditioning on only a small number of linear combinations of the original variables. The corresponding reduced model is then essentially the population model for the chemometricians' partial least squares algorithm. Estimation of the selection matrix under this model is briefly discussed, and analogous results for the case with multivariate response are formulated. Finally, it is shown that the assumption of multinormality may be weakened to an elliptically symmetric distribution, and that some of the results hold without any distributional assumption at all.

Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) reformulates the Cox likelihood as a Poisson-type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) in which estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches: it allows estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows incorporation of lower-dimensional non-genomic variables such as disease grade and tumor thickness. We also propose combining our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, giving a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised versions outperform other Cox PLS methods.
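
The gene-screening step can be sketched as follows. The sketch assumes the "Cox score" is the standardized univariate score statistic U/sqrt(I) evaluated at β = 0, and that there are no tied event times. It ranks genes by the absolute value of the score and keeps the top fraction. Both the absolute-value ranking and the helper names are assumptions for illustration, not the authors' code.

```python
import numpy as np


def cox_score(time, event, x):
    """Univariate Cox score statistic U / sqrt(I) at beta = 0 (assumes no tied times)."""
    order = np.argsort(time)
    event, x = event[order], x[order]
    U = I = 0.0
    for i in range(len(x)):
        if event[i]:
            risk = x[i:]                      # sorted by time, so the risk set is rows i onward
            m = risk.mean()
            U += x[i] - m                     # observed minus expected covariate value
            I += ((risk - m) ** 2).mean()     # variance of the covariate in the risk set
    return U / np.sqrt(I)


def select_top_genes(time, event, X, frac):
    """Indices of the top `frac` share of genes by |Cox score|, plus all scores."""
    scores = np.array([cox_score(time, event, X[:, j]) for j in range(X.shape[1])])
    k = max(1, int(round(frac * X.shape[1])))
    return np.argsort(-np.abs(scores))[:k], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    time = rng.exponential(1 / np.exp(2 * X[:, 0]))   # gene 0 raises the hazard
    event = np.ones(200, dtype=bool)
    print(select_top_genes(time, event, X, 0.10)[0])
```

In the supervised variant described above, PLS would then be run only on the retained columns of X.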


Statistical methods are effectively used in the evaluation of pharmaceutical formulations in place of laborious liquid chromatography. However, signal overlap, nonlinearity, multicollinearity and the presence of outliers degrade the performance of statistical methods. Partial Least Squares Regression (PLSR) is a very popular method for quantifying high-dimensional, spectrally overlapped drug formulations. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality in a PLSR model. We study the performance of the proposed diagnostics on two highly overlapping drug systems in everyday use: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

The aim of this article is to improve the quality of cookie production by classifying cookies as good or bad from the curves of dough resistance observed during the kneading process. As the predictor variable is functional, functional classification methodologies such as functional logit regression and functional discriminant analysis are considered. A P-spline approximation of the sample curves is proposed to improve the classification ability of these models and to suitably estimate the relationship between cookie quality and dough resistance. Inference results on the functional parameters and related odds ratios are obtained using the asymptotic normality of the maximum likelihood estimators under the classical regularity conditions. Finally, the classification results are compared with alternative functional data analysis approaches such as componentwise classification on the logit regression model.

Multicollinearity, or near-exact linear dependence among the regressor variables in a multiple linear regression analysis, can have important effects on the quality of least squares parameter estimates. One frequently suggested approach to these problems is principal components regression. This paper investigates alternative variable selection procedures and their implications for such an analysis.
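
One selection question in principal components regression is which components to keep. Because component scores are orthogonal, each component contributes (u_j'y)^2 / y'y to R^2 independently. Keeping the k components most correlated with y therefore never fits worse than keeping the k largest-eigenvalue components. The sketch below contrasts these two rules on centered data. It is an illustration only; the procedures studied in the paper may differ, and the names are hypothetical.

```python
import numpy as np


def pc_summary(X, y):
    """Eigenvalues of X'X and each component's share of R^2 (centered X, y)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Component scores are orthogonal, so R^2 is additive over components.
    return s ** 2, (U.T @ y) ** 2 / (y @ y)


def pcr_r2(X, y, k, rule):
    """R^2 of PCR with k components chosen by `rule`, and the chosen indices."""
    eig, share = pc_summary(X, y)
    if rule == "eigenvalue":
        chosen = np.argsort(-eig)[:k]      # classical: largest-variance components
    elif rule == "correlation":
        chosen = np.argsort(-share)[:k]    # components most correlated with y
    else:
        raise ValueError(f"unknown rule: {rule}")
    return share[chosen].sum(), np.sort(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5)) * np.array([10.0, 5.0, 2.0, 1.0, 0.2])
    y = 3.0 * X[:, -1] + 0.1 * rng.normal(size=100)   # y driven by a low-variance direction
    X, y = X - X.mean(axis=0), y - y.mean()
    for k in range(1, 6):
        print(k, pcr_r2(X, y, k, "eigenvalue")[0], pcr_r2(X, y, k, "correlation")[0])
```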

In the framework of redundancy analysis and reduced rank regression, the extended redundancy analysis model accounts for more than two blocks of manifest variables in its specification. A further extension, generalized redundancy analysis (GRA), has recently been proposed in the literature with the aim of incorporating external covariates into the model, thanks to a new estimation algorithm that separates the contributions of the exogenous and external covariates to the formation of the latent composites. At present, no software is available to estimate GRA models. In this paper, we provide a SAS macro, %GRA, to specify and fit structural relationships, with an application illustrating the use of the macro.

Many different biased regression techniques have been proposed for estimating the parameters of a multiple linear regression model when the predictor variables are collinear. One particular alternative, latent root regression analysis, is based on analyzing the latent roots and latent vectors of the correlation matrix of both the response and the predictor variables. This paper reviews the latent root regression estimator and re-examines some of its properties and applications. It is shown that the latent root estimator is a member of a wider class of estimators for linear models.
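
The latent-root analysis step can be sketched as an eigen-decomposition of the correlation matrix of [y, X]. A near-singularity (small latent root λ_j) whose latent vector has a small first element γ_0j involves the predictors but not the response, so it is "nonpredictive". The thresholds λ ≤ 0.05 and |γ_0| < 0.10 used below are a commonly quoted rule of thumb and are treated here as adjustable assumptions. The sketch does not compute the latent root estimator itself, and the function names are illustrative.

```python
import numpy as np


def latent_roots(X, y):
    """Latent roots (ascending) and latent vectors of the correlation matrix of [y, X]."""
    R = np.corrcoef(np.column_stack([y, X]), rowvar=False)
    return np.linalg.eigh(R)


def nonpredictive(lam, G, lam_tol=0.05, g0_tol=0.10):
    """Indices of near-singularities that barely involve the response (assumed thresholds)."""
    return [j for j in range(len(lam)) if lam[j] <= lam_tol and abs(G[0, j]) < g0_tol]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=200), rng.normal(size=200)
    x3 = x1 + x2 + 0.01 * rng.normal(size=200)          # near-exact collinearity
    y = x1 + rng.normal(size=200)
    lam, G = latent_roots(np.column_stack([x1, x2, x3]), y)
    print(lam, nonpredictive(lam, G))
```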

ABSTRACT. This paper considers a general class of random coefficient regression (RCR) models to represent pooled cross-sectional and time series data. A new method is given to estimate the covariance matrix of the error component in these RCR models. Also, the asymptotic and small sample properties of the estimated generalized least squares estimator of the regression coefficient vector are established. Procedures for testing a linear restriction on the mean vector of the random coefficients are derived. Finally, a test for non-randomness in the RCR model is devised, and the asymptotic distribution of the test statistic is obtained.

Several approaches have been suggested for fitting linear regression models to censored data. These include Cox's proportional hazards models based on quasi-likelihoods. Methods of fitting based on least squares and maximum likelihood have also been proposed. The methods proposed so far all require special-purpose optimization routines. We describe here an approach that requires only a modified standard least squares routine.

We present methods for fitting a linear regression model to censored data by least squares and by maximum likelihood. In the least squares method, the censored values are replaced by their expectations, and the residual sum of squares is minimized. Several variants are suggested for how the expectation is calculated: a parametric approach (assuming a normal error model) and two nonparametric approaches are described. We also present a method for solving the maximum likelihood equations for the regression parameters in the censored regression setting. It is shown that the solutions can be obtained by a recursive algorithm that needs only a least squares routine for optimization. The suggested procedures gain considerably in computational efficiency. The Stanford Heart Transplant data are used to illustrate the various methods.
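
The parametric least squares variant can be sketched as an iteration that uses nothing but an ordinary least squares solve. The sketch assumes right censoring and normal errors. Each right-censored value c is replaced by E[Y | Y > c] = μ + σ φ(z) / (1 - Φ(z)) with z = (c - μ)/σ, and the model is then refitted. The σ update from the imputed residuals is a simplifying assumption, and the helper names are hypothetical, not the authors' algorithm.

```python
import math

import numpy as np


def normal_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)


def normal_sf(z):
    """Upper-tail probability 1 - Phi(z) via the complementary error function."""
    return 0.5 * np.vectorize(math.erfc, otypes=[float])(z / math.sqrt(2.0))


def censored_ls(X, y, censored, max_iter=200, tol=1e-8):
    """Right-censored linear regression by iterative expectation imputation.

    y holds observed values, and the censoring point wherever `censored` is True.
    Returns (coefficients with intercept first, number of iterations).
    """
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]            # naive starting fit
    y_work = y.astype(float).copy()
    for it in range(max_iter):
        mu = Xd @ beta
        sigma = np.std(y_work - mu, ddof=Xd.shape[1])
        z = (y[censored] - mu[censored]) / sigma
        # Conditional mean of a normal variable beyond its censoring point.
        y_work[censored] = mu[censored] + sigma * normal_pdf(z) / normal_sf(z)
        new_beta = np.linalg.lstsq(Xd, y_work, rcond=None)[0]
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta, it + 1
        beta = new_beta
    return beta, max_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y_true = 1.0 + 2.0 * x + rng.normal(size=500)
    censored = y_true > 1.5
    y_obs = np.where(censored, 1.5, y_true)
    print(censored_ls(x[:, None], y_obs, censored))
```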

Varying-coefficient models are very useful for longitudinal data analysis. In this paper, we focus on varying-coefficient models for longitudinal data and develop a new estimation procedure using Cholesky decomposition and profile least squares techniques. Asymptotic normality of the proposed estimators of the varying-coefficient functions is established. Monte Carlo simulation studies show excellent finite-sample performance, and we illustrate our methods with a real data example.
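
The abstract does not say which Cholesky-based decomposition is used. A common choice for within-subject covariance in longitudinal data is the modified Cholesky decomposition T Σ T' = D, where T is unit lower triangular and D is diagonal. The sketch below assumes that choice. Under that reading, the negated below-diagonal entries of T act as autoregressive coefficients and D holds innovation variances. The function name is illustrative.

```python
import numpy as np


def modified_cholesky(Sigma):
    """Return (T, D) with T unit lower triangular, D diagonal, and T Sigma T' = D."""
    L = np.linalg.cholesky(Sigma)   # Sigma = L L', L lower triangular
    d = np.diag(L)
    C = L / d                       # divide column j by d_j: Sigma = C diag(d^2) C'
    T = np.linalg.inv(C)            # inverse of a unit lower triangular matrix
    return T, np.diag(d ** 2)


if __name__ == "__main__":
    rho, m = 0.6, 5
    idx = np.arange(m)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) correlation matrix
    T, D = modified_cholesky(Sigma)
    print(np.round(T, 3))
    print(np.round(np.diag(D), 3))
```

For the AR(1) example, T has -ρ on its first subdiagonal and D = diag(1, 1 - ρ^2, ..., 1 - ρ^2). This matches the regression of each observation on its predecessor.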

This paper examines the use of bootstrapping for bias correction and calculation of confidence intervals (CIs) for a weighted nonlinear quantile regression estimator adjusted to the case of longitudinal data. Different weights and types of CIs are used and compared by computer simulation using a logistic growth function and error terms following an AR(1) model. The results indicate that bias correction reduces the bias of a point estimator but fails for CI calculations. A bootstrap percentile method and a normal approximation method perform well for two weights when used without bias correction. Taking both coverage and lengths of CIs into consideration, a non-bias-corrected percentile method with an unweighted estimator performs best.
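
The two bootstrap ingredients compared above can be sketched generically: the bias-corrected point estimate 2θ̂ - mean(θ*) and the percentile interval. The sketch resamples whole subjects, which is an assumption about how the longitudinal dependence is respected. It uses a plain sample quantile as a stand-in estimator; the paper's weighted nonlinear quantile regression is not implemented. The errors follow an AR(1) model as in the simulation design, and all names are illustrative.

```python
import numpy as np


def simulate_panel(rng, n_subjects, n_times, rho):
    """Stationary AR(1) series (unit innovation variance), one row per subject."""
    e = np.empty((n_subjects, n_times))
    e[:, 0] = rng.normal(size=n_subjects) / np.sqrt(1 - rho ** 2)
    for t in range(1, n_times):
        e[:, t] = rho * e[:, t - 1] + rng.normal(size=n_subjects)
    return e


def cluster_bootstrap(data, estimator, B=2000, alpha=0.05, seed=0):
    """Resample subjects (rows); return estimate, bias-corrected estimate, percentile CI, replicates."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    theta_hat = estimator(data)
    boot = np.array([estimator(data[rng.integers(0, n, size=n)]) for _ in range(B)])
    theta_bc = 2.0 * theta_hat - boot.mean()          # theta_hat minus estimated bias
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return theta_hat, theta_bc, (lo, hi), boot


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = simulate_panel(rng, n_subjects=40, n_times=6, rho=0.5)
    est, est_bc, ci, _ = cluster_bootstrap(data, lambda d: np.quantile(d, 0.75), B=1000)
    print(est, est_bc, ci)
```

Consistent with the findings quoted above, one would typically report the percentile interval around the uncorrected estimate rather than shifting it by the bias estimate.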

This article concerns the analysis of multivariate response data with multi-dimensional covariates. Based on local linear smoothing techniques, we propose an iteratively adaptive estimation method to reduce the dimensions of response variables and covariates. Two weighted estimation strategies are incorporated in our approach to provide initial estimates. Our proposal is also extended to curve response data for a data-adaptive basis function searching. Instead of focusing on goodness of fit, we shift the problem to reveal the data structure and basis patterns. Simulation studies with multivariate response and curve data are conducted for our pairwise directions estimation (PDE) approach in comparison with sliced inverse regression of Li et al. [Dimension reduction for multivariate response data. J Amer Statist Assoc. 2003;98:99–109]. The results demonstrate that the proposed PDE method is useful for data with responses approximating linear or bending structures. Illustrative applications to two real datasets are also presented.

