In this paper, we investigate the objective function and deflation process for sparse Partial Least Squares (PLS) regression with multiple components. While many have considered variations on the objective for sparse PLS, the deflation process for sparse PLS has not received as much attention. Our work highlights a flaw in the Statistically Inspired Modification of Partial Least Squares (SIMPLS) deflation method when applied in sparse PLS regression. We also consider the Nonlinear Iterative Partial Least Squares (NIPALS) deflation in sparse PLS regression. To remedy the flaw in the SIMPLS method, we propose a new sparse PLS method wherein the direction vectors are constrained to be sparse and lie in a chosen subspace. We give insight into this new PLS procedure and show through examples and simulation studies that the proposed technique can outperform alternative sparse PLS techniques in coefficient estimation. Moreover, our analysis reveals a simple renormalization step that can be used to improve the estimation of sparse PLS direction vectors generated using any convex relaxation method.

The NIPALS and SIMPLS algorithms are the most commonly used algorithms for partial least squares analysis. When the number of objects, N, is much larger than the number of explanatory variables, K, and/or response variables, M, the NIPALS algorithm can be time consuming. Although SIMPLS is less time consuming than NIPALS and can be preferred over it, kernel algorithms have been developed especially for cases where N is much larger than the number of variables. In this study, the NIPALS, SIMPLS and some kernel algorithms have been used to build partial least squares regression models. Their performances have been compared in terms of the total CPU time spent on the calculation of latent variables, leave-one-out cross-validation and bootstrap methods. According to the numerical results, one of the kernel algorithms suggested by Dayal and MacGregor (J Chemom 11:73–85, 1997) is the fastest algorithm.
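
As a rough illustration of the NIPALS deflation that the abstract above benchmarks, the following sketch fits a single-response PLS regression (PLS1) in plain Python. The function name `pls1_nipals_fit`, the toy data, and the assumption that X and y are already column-centred are illustrative choices, not details from the cited study, which compares optimized implementations.

```python
import math

def pls1_nipals_fit(X, y, n_components):
    """Return PLS1 fitted values computed with NIPALS-style deflation.

    Assumes (for this sketch) that the columns of X and the vector y
    are already centred. X is a list of rows; y is a list of floats.
    """
    n, k = len(X), len(X[0])
    X = [row[:] for row in X]   # work on copies so the inputs are untouched
    y = y[:]
    fitted = [0.0] * n
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance between X and y.
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:        # nothing left to explain
            break
        w = [v / norm for v in w]
        # Scores, X-loadings and y-loading for this component.
        t = [sum(X[i][j] * w[j] for j in range(k)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        q = sum(y[i] * t[i] for i in range(n)) / tt
        # Deflation: remove the part of X and y explained by the scores t.
        for i in range(n):
            for j in range(k):
                X[i][j] -= t[i] * p[j]
            y[i] -= q * t[i]
            fitted[i] += q * t[i]
    return fitted

# Toy data (hypothetical): y = x1 + 2*x2, both predictors centred.
X_demo = [[-2.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [2.0, 1.0]]
y_demo = [-4.0, 1.0, -1.0, 4.0]
fit_one = pls1_nipals_fit(X_demo, y_demo, 1)
fit_two = pls1_nipals_fit(X_demo, y_demo, 2)
```

With as many components as the rank of X, the PLS1 fit coincides with ordinary least squares, so `fit_two` reproduces `y_demo` up to rounding while `fit_one` does not. The nested loops make the per-component cost grow with N times K, which is the kind of cost that kernel algorithms are designed to avoid when N is large.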

Many different biased regression techniques have been proposed for estimating parameters of a multiple linear regression model when the predictor variables are collinear. One particular alternative, latent root regression analysis, is a technique based on analyzing the latent roots and latent vectors of the correlation matrix of both the response and the predictor variables. It is the purpose of this paper to review the latent root regression estimator and to re-examine some of its properties and applications. It is shown that the latent root estimator is a member of a wider class of estimators for linear models.

Latent variable structural models and the partial least-squares (PLS) estimation procedure have found increased interest since being used in the context of customer satisfaction measurement. The well-known property that the estimates of the inner structure model are inconsistent implies biased estimates for finite sample sizes. A simplified version of the structural model that is used for the Swedish Customer Satisfaction Index (SCSI) system has been used to generate simulated data and to study the PLS algorithm in the presence of three inadequacies: (i) skew instead of symmetric distributions for manifest variables; (ii) multi-collinearity within blocks of manifest and between latent variables; and (iii) misspecification of the structural model (omission of regressors). The simulation results show that the PLS method is quite robust against these inadequacies. The bias that is caused by the inconsistency of PLS estimates is substantially increased only for extremely skewed distributions and for the erroneous omission of a highly relevant latent regressor variable. The estimated scores of the latent variables are always in very good agreement with the true values and seem to be unaffected by the inadequacies under investigation.

This article considers both Partial Least Squares (PLS) and Ridge Regression (RR) methods to combat the multicollinearity problem. A simulation study has been conducted to compare their performances with respect to Ordinary Least Squares (OLS). With varying degrees of multicollinearity, it is found that both the PLS and RR estimators produce significant reductions in the Mean Square Error (MSE) and Prediction Mean Square Error (PMSE) over OLS. The simulation study shows that RR performs better when the error variance is large, while the PLS estimator achieves its best results when the model includes more variables. An advantage of the ridge regression method over PLS is that it can provide the 95% confidence interval for the regression coefficients while PLS cannot.

The partial least squares (PLS) approach first constructs new explanatory variables, known as factors (or components), which are linear combinations of available predictor variables. A small subset of these factors is then chosen and retained for prediction. We study the performance of PLS in estimating single-index models, especially when the predictor variables exhibit high collinearity. We show that PLS estimates are consistent up to a constant of proportionality. We present three simulation studies that compare the performance of PLS in estimating single-index models with that of sliced inverse regression (SIR). In the first two studies, we find that PLS performs better than SIR when collinearity exists. In the third study, we learn that PLS performs well even when there are multiple dependent variables, the link function is non-linear and the shape of the functional form is not known.

For manifest variables with additive noise and for a given number of latent variables with an assumed distribution, we propose to nonparametrically estimate the association between latent and manifest variables. Our estimation is a two-step procedure: first, it employs standard factor analysis to estimate the latent variables as theoretical quantiles of the assumed distribution; second, it employs the additive models’ backfitting procedure to estimate the monotone nonlinear associations between latent and manifest variables. The estimated fit may suggest a different latent distribution or point to nonlinear associations. We show on simulated data how, based on mean squared errors, the nonparametric estimation improves on factor analysis. We then employ the new estimator on real data to illustrate its use for exploratory data analysis.

A polynomial functional relationship with errors in both variables can be consistently estimated by constructing an ordinary least squares estimator for the regression coefficients, assuming hypothetically the latent true regressor variable to be known, and then adjusting for the errors. If normality of the error variables can be assumed, the estimator can be simplified considerably. Only the variance of the errors in the regressor variable and its covariance with the errors of the response variable need to be known. If the variance of the errors in the dependent variable is also known, another estimator can be constructed.

The marginal likelihood can be notoriously difficult to compute, particularly in high-dimensional problems. Chib and Jeliazkov employed the local reversibility of the Metropolis–Hastings algorithm to construct an estimator in models where full conditional densities are not available analytically. The estimator is free of distributional assumptions and is directly linked to the simulation algorithm. However, it generally requires a sequence of reduced Markov chain Monte Carlo runs, which makes the method computationally demanding, especially when the parameter space is large. In this article, we study the implementation of this estimator on latent variable models which embed independence of the responses to the observables given the latent variables (conditional or local independence). This property is employed in the construction of a multi-block Metropolis-within-Gibbs algorithm that allows the estimator to be computed in a single run, regardless of the dimensionality of the parameter space. The counterpart one-block algorithm is also considered, and the difference between the two approaches is pointed out. The paper closes with an illustration of the estimator on simulated and real-life data sets.

Summary. Generalized linear latent variable models (GLLVMs), as defined by Bartholomew and Knott, enable modelling of relationships between manifest and latent variables. They extend structural equation modelling techniques, which are powerful tools in the social sciences. However, because of the complexity of the log-likelihood function of a GLLVM, an approximation such as numerical integration must be used for inference. This can drastically limit the number of variables in the model and can lead to biased estimators. We propose a new estimator for the parameters of a GLLVM, which is based on a Laplace approximation to the likelihood function and can be computed even for models with a large number of variables. The new estimator can be viewed as an M-estimator, leading to readily available asymptotic properties and correct inference. A simulation study shows its excellent finite sample properties, in particular when compared with a well-established approach such as LISREL. A real data example on the measurement of wealth for the computation of multidimensional inequality is analysed to highlight the importance of the methodology.

Positron emission tomography (PET) imaging can be used to study the effects of pharmacologic intervention on brain function. Partial least squares (PLS) regression is a standard tool that can be applied to characterize such effects throughout the brain volume and across time. We have extended the PLS regression methodology to adjust for covariate effects that may influence spatial and temporal aspects of the functional image data over the brain volume. The extension involves multi-dimensional latent variables, experimental design variables based upon sequential PET scanning, and covariates. An illustration is provided using a sequential PET data set acquired to study the effect of d-amphetamine on cerebral blood flow in baboons. An iterative algorithm is developed and implemented and validation results are provided through computer simulation studies.

Partial least squares regression (PLS) is one method to estimate parameters in a linear model when predictor variables are nearly collinear. One way to characterize PLS is in terms of the scaling (shrinkage or expansion) along each eigenvector of the predictor correlation matrix. This characterization is useful in providing a link between PLS and other shrinkage estimators, such as principal components regression (PCR) and ridge regression (RR), thus facilitating a direct comparison of PLS with these methods. This paper gives a detailed analysis of the shrinkage structure of PLS, and several new results are presented regarding the nature and extent of shrinkage.
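
The scaling along eigenvectors mentioned in the abstract above can be written out as follows. This is a sketch in assumed notation (standardized predictors, eigenpairs \lambda_i, v_i of X^T X, ridge penalty k, m retained components), not the paper's own derivation.

```latex
% Assumed notation: X^\top X = \sum_{i=1}^{p} \lambda_i v_i v_i^\top with \lambda_i > 0.
% Ordinary least squares expanded along the eigenvectors v_i:
\hat{\beta}_{\mathrm{OLS}} = \sum_{i=1}^{p} \frac{v_i^{\top} X^{\top} y}{\lambda_i}\, v_i ,
\qquad
% A shrinkage estimator rescales each term by a factor f(\lambda_i):
\hat{\beta}_{f} = \sum_{i=1}^{p} f(\lambda_i)\, \frac{v_i^{\top} X^{\top} y}{\lambda_i}\, v_i .
% Ridge regression with penalty k > 0:
f_{\mathrm{RR}}(\lambda_i) = \frac{\lambda_i}{\lambda_i + k} \in (0, 1) ,
\qquad
% Principal components regression keeping the m largest eigenvalues:
f_{\mathrm{PCR}}(\lambda_i) =
\begin{cases}
1, & i \le m, \\
0, & i > m .
\end{cases}
```

In this representation the ridge and PCR factors depend only on the eigenvalues and never exceed 1. The PLS factors also depend on y and can exceed 1 along some eigenvectors, which is why the abstract speaks of "shrinkage or expansion".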


In this paper, assuming that there exist omitted variables in the specified model, we analytically derive the exact formula for the mean squared error (MSE) of a heterogeneous pre-test (HPT) estimator whose components are the ordinary least squares (OLS) and feasible ridge regression (FRR) estimators. Since we cannot examine the MSE performance analytically, we execute numerical evaluations to investigate small sample properties of the HPT estimator, and compare the MSE performance of the HPT estimator with those of the FRR estimator and the usual OLS estimator. Our numerical results show that (1) the HPT estimator is more efficient when the model misspecification is severe; (2) the HPT estimator with the optimal critical value obtained under the correctly specified model can be safely used even when there exist omitted variables in the specified model.

Overdispersion has been a common phenomenon in count data and usually treated with the negative binomial model. This paper shows that measurement errors in covariates in general also lead to overdispersion on the observed data if the true data generating process is indeed the Poisson regression. This kind of overdispersion cannot be treated using the negative binomial model, as otherwise, biases will occur. To provide consistent estimates, we propose a new type of corrected score estimator assuming that the distribution of the latent variables is known. The consistency and asymptotic normality of the proposed estimator are established. Simulation results show that this estimator has good finite sample performance. We also illustrate that the Akaike information criterion and Bayesian information criterion work well for selecting the correct model if the true model is the errors-in-variables Poisson regression.

We compare partial least squares (PLS) and principal component analysis (PCA) in a general case in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent, to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator.

This paper considers estimation and prediction in the Aalen additive hazards model in the case where the covariate vector is high-dimensional such as gene expression measurements. Some form of dimension reduction of the covariate space is needed to obtain useful statistical analyses. We study the partial least squares regression method. It turns out that it is naturally adapted to this setting via the so-called Krylov sequence. The resulting PLS estimator is shown to be consistent provided that the number of terms included is taken to be equal to the number of relevant components in the regression model. A standard PLS algorithm can also be constructed, but it turns out that the resulting predictor can only be related to the original covariates via time-dependent coefficients. The methods are applied to a breast cancer data set with gene expression recordings and to the well known primary biliary cirrhosis clinical data.

Latent variable models have been widely used for modelling the dependence structure of multiple outcomes data. However, the formulation of a latent variable model is often unknown a priori, and misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes with varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes. We propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores people's values and beliefs and the social and personal characteristics that might influence them.

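胡亚南, 田茂再. 《统计研究》 (Statistical Research), 2019, 36(1): 104-114
Zero-inflated count data violate the variance-mean relationship of the Poisson distribution and can be explained by a mixture distribution in which data following a Poisson distribution and data equal to zero (a degenerate distribution) each account for a certain proportion. Based on the adaptive elastic net, this paper studies joint modelling and variable selection for zero-inflated count data. For the zero-inflated Poisson distribution, latent variables are introduced to construct the complete likelihood of the zero-inflated Poisson model, which consists of two terms: a zero-inflation part and a Poisson part. Because the covariates may be collinear and sparse, the objective function is obtained by adding an adaptive elastic net penalty to the likelihood function. Sparse estimates of the regression coefficients are then obtained with the EM algorithm, and the Bayesian information criterion (BIC) is used to determine the optimal tuning parameters. The paper also gives theoretical proofs of the large-sample properties of the estimator together with a simulation study, and finally applies the proposed method to a real problem.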
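
To make the latent-variable construction in the abstract above concrete, here is a minimal EM sketch for a zero-inflated Poisson model. It is deliberately simplified: it has no covariates and no adaptive elastic net penalty, so it is not the paper's estimator. The function name `zip_em`, the starting values, and the toy counts are assumptions made for illustration.

```python
import math

def zip_em(counts, n_iter=500):
    """EM for an intercept-only zero-inflated Poisson model (a sketch).

    Returns (pi, lam): pi is the probability of a structural zero and
    lam is the Poisson mean. Assumes at least one positive count.
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for c in counts if c == 0)
    pi, lam = 0.5, total / n    # crude starting values (assumed)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        # Positive counts can only come from the Poisson part.
        z0 = pi / (pi + (1.0 - pi) * math.exp(-lam))
        z_sum = z0 * n_zero
        # M-step: closed-form updates of the complete-data likelihood.
        pi = z_sum / n
        lam = total / (n - z_sum)
    return pi, lam

# Toy counts (hypothetical): half of the observations are zero.
demo_counts = [0] * 6 + [1, 2, 2, 3, 3, 4]
demo_pi, demo_lam = zip_em(demo_counts)
```

The complete likelihood splits into a zero-inflation term, which updates `pi`, and a Poisson term, which updates `lam`. This separation is what would allow a penalty to be attached to each part once regression coefficients are introduced.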

Approximate Bayesian computation (ABC) methods permit approximate inference for intractable likelihoods when it is possible to simulate from the model. However, they perform poorly for high-dimensional data and in practice must usually be used in conjunction with dimension reduction methods, resulting in a loss of accuracy which is hard to quantify or control. We propose a new ABC method for high-dimensional data based on rare event methods which we refer to as RE-ABC. This uses a latent variable representation of the model. For a given parameter value, we estimate the probability of the rare event that the latent variables correspond to data roughly consistent with the observations. This is performed using sequential Monte Carlo and slice sampling to systematically search the space of latent variables. In contrast, standard ABC can be viewed as using a more naive Monte Carlo estimate. We use our rare event probability estimator as a likelihood estimate within the pseudo-marginal Metropolis–Hastings algorithm for parameter inference. We provide asymptotics showing that RE-ABC has a lower computational cost for high-dimensional data than standard ABC methods. We also illustrate our approach empirically, on a Gaussian distribution and an application in infectious disease modelling.
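
For orientation, the "standard ABC" baseline that the abstract above contrasts with RE-ABC can be sketched as a rejection sampler. This is not the RE-ABC method itself. The Gaussian model with unit variance, the uniform prior on [-5, 5], the sample-mean summary statistic, and the name `abc_rejection` are all assumptions chosen for illustration.

```python
import random

def abc_rejection(observed_mean, n_obs, n_sims, tol, seed=0):
    """Plain rejection ABC for the mean of a unit-variance Gaussian (a sketch).

    Keeps a prior draw theta whenever the mean of data simulated under
    theta falls within tol of the observed mean.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)                 # assumed uniform prior
        sim = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        sim_mean = sum(sim) / n_obs                    # summary statistic
        if abs(sim_mean - observed_mean) < tol:
            accepted.append(theta)
    return accepted

# Hypothetical setting: 20 observations with sample mean 2.0.
abc_draws = abc_rejection(observed_mean=2.0, n_obs=20, n_sims=20000, tol=0.1)
abc_posterior_mean = sum(abc_draws) / len(abc_draws)
```

Only a small fraction of the simulations is accepted even in this one-dimensional example. Matching a whole high-dimensional data set within a tolerance is a much rarer event, which is the difficulty that the abstract's rare event estimator targets.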

We propose an easy-to-derive and simple-to-compute approximate least squares or maximum likelihood estimator for nonlinear errors-in-variables models that does not require knowledge of the conditional density of the latent variables given the observables. Specific examples and Monte Carlo studies demonstrate that the bias of this approximate estimator is small even when the ratio of the variance of measurement errors to the variance of measured covariates is large. Cheng Hsiao and Qing Wang's work was supported in part by National Science Foundation grant SeS91-22481 and SBR94-09540. Liqun Wang gratefully acknowledges the financial support from Swiss National Science Foundation. We wish to thank Professor H. Schneeweiss and a referee for helpful comments and suggestions.
