Similar Literature (20 records found)
1.
This paper proposes a new factor rotation in the context of functional principal components analysis. The rotation seeks to re-express a functional subspace in terms of directions of decreasing smoothness, as represented by a generalized smoothing metric. The rotation can be implemented simply, and we show on two examples that it can improve the interpretability of the leading components.
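As a rough illustration of the idea (a simplified sketch rather than the paper's exact rotation), the snippet below rotates the leading principal components of a set of discretized curves so that they are ordered from smoothest to roughest; the squared-second-difference penalty and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def smoothness_rotation(curves, n_components=4):
    """Rotate the leading PCs of discretized curves so the rotated
    components are ordered by increasing roughness (decreasing smoothness),
    using a squared-second-difference penalty as the smoothing metric."""
    pca = PCA(n_components=n_components).fit(curves)
    Phi = pca.components_                         # k x m discretized components
    m = Phi.shape[1]
    D2 = np.diff(np.eye(m), n=2, axis=0)          # second-difference operator
    P = D2.T @ D2                                 # roughness penalty matrix
    Q = Phi @ P @ Phi.T                           # roughness metric on the subspace
    roughness, R = np.linalg.eigh(Q)              # eigenvalues in ascending roughness
    rotated = R.T @ Phi                           # smoothest rotated component first
    return rotated, roughness
```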

2.
In this paper, we investigate the objective function and deflation process for sparse Partial Least Squares (PLS) regression with multiple components. While many have considered variations on the objective for sparse PLS, the deflation process for sparse PLS has not received as much attention. Our work highlights a flaw in the Statistically Inspired Modification of Partial Least Squares (SIMPLS) deflation method when applied in sparse PLS regression. We also consider the Nonlinear Iterative Partial Least Squares (NIPALS) deflation in sparse PLS regression. To remedy the flaw in the SIMPLS method, we propose a new sparse PLS method wherein the direction vectors are constrained to be sparse and lie in a chosen subspace. We give insight into this new PLS procedure and show through examples and simulation studies that the proposed technique can outperform alternative sparse PLS techniques in coefficient estimation. Moreover, our analysis reveals a simple renormalization step that can be used to improve the estimation of sparse PLS direction vectors generated using any convex relaxation method.
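A minimal sketch of one way to combine a sparse direction vector, a renormalization step, and NIPALS-style deflation in PLS regression; the soft-thresholding rule, its scaling, and the function names are illustrative assumptions, not the exact procedure proposed in the paper.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls(X, y, n_components=2, lam=0.3):
    """Sparse PLS regression sketch with NIPALS-style deflation.
    Each direction vector is a soft-thresholded, renormalized version of
    X'y computed on the deflated predictors."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    Xk = X.copy()
    W, T = [], []
    for _ in range(n_components):
        w = Xk.T @ y                              # covariance direction
        w = soft_threshold(w, lam * np.abs(w).max())
        w /= np.linalg.norm(w)                    # renormalization step
        t = Xk @ w                                # score
        p = Xk.T @ t / (t @ t)                    # loading
        Xk = Xk - np.outer(t, p)                  # NIPALS deflation
        W.append(w)
        T.append(t)
    T = np.column_stack(T)
    q = np.linalg.lstsq(T, y, rcond=None)[0]      # regress y on the scores
    return np.column_stack(W), T, q
```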

3.
We propose an algorithmic framework for computing sparse components from rotated principal components. This methodology, called SIMPCA, is useful for replacing the unreliable practice of ignoring small coefficients of rotated components when interpreting them. The algorithm computes genuinely sparse components by projecting rotated principal components onto subsets of variables. The components simplified in this way are highly correlated with the corresponding components. By choosing different simplification strategies, different sparse solutions can be obtained, which can be used to compare alternative interpretations of the principal components. We give examples of how effective simplified solutions can be achieved with SIMPCA on some publicly available data sets.
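The following sketch conveys the general recipe (a varimax rotation followed by a least-squares projection of each rotated component onto the variables with the largest rotated loadings); the selection threshold and function names are illustrative assumptions rather than the SIMPCA algorithm itself.

```python
import numpy as np
from sklearn.decomposition import PCA

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Standard varimax rotation of a p x k loading matrix."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        u, s, vh = np.linalg.svd(
            Phi.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        R = u @ vh
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return Phi @ R

def sparse_from_rotated(X, n_components=3, keep=0.3):
    """Project each rotated component onto the subset of variables whose
    rotated loadings exceed a fraction `keep` of the largest loading."""
    Xc = X - X.mean(axis=0)
    A = varimax(PCA(n_components=n_components).fit(Xc).components_.T)
    sparse_loadings = np.zeros_like(A)
    for j in range(n_components):
        subset = np.abs(A[:, j]) >= keep * np.abs(A[:, j]).max()
        t = Xc @ A[:, j]                            # rotated component scores
        b = np.linalg.lstsq(Xc[:, subset], t, rcond=None)[0]
        sparse_loadings[subset, j] = b              # genuinely sparse component
    return sparse_loadings
```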

4.
Sparse principal components analysis (SPCA) is a technique for finding principal components with a small number of non-zero loadings. Our contribution to this methodology is twofold. First we derive the sparse solutions that minimise the least squares criterion subject to sparsity requirements. Second, recognising that sparsity is not the only requirement for achieving simplicity, we suggest a backward elimination algorithm that computes sparse solutions with large loadings. This algorithm can be run without specifying the number of non-zero loadings in advance. It is also possible to impose the requirement that a minimum amount of variance be explained by the components. We give thorough comparisons with existing SPCA methods and present several examples using real datasets.
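A hedged sketch of a backward-elimination scheme for a single sparse component, in the spirit described above: drop the variable with the smallest absolute loading, recompute the leading eigenvector on the remaining variables, and stop once the explained variance falls below a chosen fraction of the full first-PC variance. The stopping rule and names are assumptions, not the paper's algorithm verbatim.

```python
import numpy as np

def backward_sparse_pc(X, min_var_ratio=0.9):
    """Backward elimination for one sparse principal component."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    full_var = np.linalg.eigvalsh(S)[-1]            # variance of the full first PC
    active = list(range(S.shape[0]))
    best = None
    while len(active) > 1:
        sub = S[np.ix_(active, active)]
        vals, vecs = np.linalg.eigh(sub)
        lam, v = vals[-1], vecs[:, -1]
        if lam < min_var_ratio * full_var:          # minimum-variance requirement
            break
        loadings = np.zeros(S.shape[0])
        loadings[active] = v
        best = loadings
        active.pop(int(np.argmin(np.abs(v))))       # drop the smallest loading
    return best
```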

5.
We propose a flexible functional approach for modelling generalized longitudinal data and survival time using principal components. In the proposed model the longitudinal observations can be continuous or categorical, such as Gaussian, binomial or Poisson outcomes. We generalize traditional joint models, which handle outcomes such as CD4 counts by transforming them and treating them as continuous. The proposed model is data-adaptive: it does not require pre-specified functional forms for the longitudinal trajectories and automatically detects characteristic patterns. The longitudinal trajectories, observed with measurement error or random error, are represented by flexible basis functions through a possibly nonlinear link function, combined with the dimension reduction provided by functional principal component (FPC) analysis. The relationship between the longitudinal process and the event history is assessed using a Cox regression model. Although the proposed model inherits the flexibility of non-parametric methods, the estimation procedure based on the EM algorithm is still parametric in computation, and thus simple and easy to implement. The computation is simplified by dimension reduction for the random coefficients or FPC scores. An iterative selection procedure based on the Akaike information criterion (AIC) is proposed to choose the tuning parameters, such as the knots of the spline basis and the number of FPCs, so that an appropriate degree of smoothness and fluctuation can be captured. The effectiveness of the proposed approach is illustrated through a simulation study, followed by an application to longitudinal CD4 counts and survival data collected in a recent clinical trial comparing the efficacy and safety of two antiretroviral drugs.

6.
Probabilistic Principal Component Analysis
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
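The maximum likelihood estimates for this latent variable model have a well-known closed form: the noise variance is the average of the discarded sample-covariance eigenvalues, and the weight matrix spans the leading principal axes scaled accordingly. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum likelihood estimates for probabilistic PCA."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]            # sort eigenvalues descending
    sigma2 = vals[q:].mean()                          # average discarded eigenvalue
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return W, sigma2
```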

7.
Canonical variate analysis can be viewed as a two-stage principal component analysis. Explicit consideration of the principal components from the first stage, formalized in the context of shrunken estimators, leads to a number of practical advantages. In morphometric studies, the first eigenvector is often a size vector, with the remaining vectors being contrast or shape-type vectors, so that a decomposition of the canonical variates into size and shape components can be achieved. In applied studies, often a small number of the principal components account for most of the separation between groups; plots of group means and associated concentration ellipses (ideally these should be circular) for the important principal components facilitate graphical inspection. Of considerable practical importance is the potential for improved stability of the estimated canonical vectors. When the between-groups sum of squares for a particular principal component is small, and the corresponding eigenvalue of the within-groups correlation matrix is also small, marked instability of the canonical vectors can be expected. The introduction of shrunken estimators, obtained by adding shrinkage constants to the eigenvalues, leads to more stable coefficients.
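A compact sketch of the two-stage view with a shrinkage constant added to the within-group eigenvalues; it uses the pooled within-group covariance (the article discusses the within-group correlation matrix) and the names are ours, so treat it as an illustration rather than the exact estimator.

```python
import numpy as np

def shrunken_cva(X, groups, shrink=0.0):
    """Canonical variate analysis as a two-stage PCA with shrinkage.
    X is n x p, groups is a length-n array of group labels."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # pooled within-group covariance
    W = sum(np.cov(X[groups == g], rowvar=False) * (np.sum(groups == g) - 1)
            for g in labels) / (len(X) - len(labels))
    # stage 1: eigendecompose W and whiten, shrinking the eigenvalues
    e, U = np.linalg.eigh(W)
    whiten = U / np.sqrt(e + shrink)
    # stage 2: PCA of the whitened group means
    grand = X.mean(axis=0)
    means = np.array([X[groups == g].mean(axis=0) - grand for g in labels])
    _, _, Vt = np.linalg.svd(means @ whiten, full_matrices=False)
    return whiten @ Vt.T                       # columns are canonical vectors
```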

8.
We consider the joint analysis of two matched matrices which have common rows and columns, for example, multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal components analysis for interval-scaled data, correspondence analysis for frequency data, log-ratio analysis of compositional data and linear biplots in general, all of which depend on the singular value decomposition. A simple result in matrix algebra shows that by setting up the two matched matrices in a particular block format, matrix sum and difference components can be analysed using a single application of the singular value decomposition algorithm. The methodology is applied to data from the International Social Survey Program comparing male and female attitudes on working wives across eight countries. The resulting biplots optimally display the overall cross-cultural differences as well as the male-female differences. The case of more than two matched matrices is also discussed.
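The matrix-algebra result can be checked numerically: with matched matrices A and B stacked in the block format [[A, B], [B, A]], a single SVD of the block matrix yields exactly the singular values of the sum A + B together with those of the difference A - B (the scaling conventions in the article may differ; the variable names below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))                    # e.g. male responses
B = rng.normal(size=(6, 4))                    # matched female responses

C = np.block([[A, B], [B, A]])                 # block format for a single SVD

sv_block = np.sort(np.linalg.svd(C, compute_uv=False))
sv_sum_diff = np.sort(np.concatenate([
    np.linalg.svd(A + B, compute_uv=False),    # "sum" analysis
    np.linalg.svd(A - B, compute_uv=False)]))  # "difference" analysis

print(np.allclose(sv_block, sv_sum_diff))      # True
```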

9.
Interpretation of principal components is difficult because their weights (loadings, coefficients) come in various sizes. Whereas very small weights or very large weights give a clear indication of the importance of particular variables, weights that are neither large nor small ('grey area' weights) are problematical. This is a particular problem in the fast-moving goods industries, where a lot of multivariate panel data are collected on products. These panel data are subjected to univariate and multivariate analyses in which principal components (PCs) are key to the interpretation of the data. Several authors have suggested alternatives to PCs, seeking simplified components such as sparse PCs. Here components, termed simple components (SCs), are sought in conjunction with the Thurstonian criteria that a component should have only a few variables highly weighted on it and that each variable should be weighted heavily on just a few components. An algorithm is presented that finds SCs efficiently. Simple components are found for panel data consisting of the responses to a questionnaire on efficacy and other features of deodorants. It is shown that five SCs can explain an amount of variation within the data comparable to that explained by the PCs, but with easier interpretation.

10.
11.
The local influence method is adapted to testing hypotheses about principal components for investigating the influence of observations on the test statistic. Simultaneous perturbations on all observations are considered. The main diagnostic is the direction vector of the maximum slope of the surface formed by the perturbed test statistic. A perturbation is constructed whose result is the same as that of the influence function method. An example is given for illustration.

12.
Cross-validation has been widely used in the context of statistical linear models and multivariate data analysis. Recent technological advances have made it possible to collect new types of data in the form of curves. Statistical procedures for analysing these data, which are of infinite dimension, are provided by functional data analysis. In functional linear regression, estimation of the slope and intercept parameters using statistical smoothing is generally based on functional principal components analysis (FPCA), which allows a finite-dimensional analysis of the problem. The estimators of the slope and intercept parameters in this context, proposed by Hall and Hosseini-Nasab [On properties of functional principal components analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol. 68 (2006), pp. 109–126], are based on FPCA and depend on a smoothing parameter that can be chosen by cross-validation. The cross-validation criterion given there is time-consuming and hard to compute. In this work, we approximate this cross-validation criterion by another criterion that, in a sense, reduces the problem to a multivariate data analysis tool, and we evaluate its performance numerically. We also analyse a real dataset consisting of two variables, temperature and the amount of precipitation, and estimate the regression coefficients of the former variable in a model predicting the latter.
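For orientation, here is the brute-force version of such a criterion on a dense grid: leave-one-out cross-validation over the number of FPC scores used in a scalar-on-function regression. This is the kind of expensive computation that an approximate criterion is meant to avoid; the use of plain PCA on discretized curves and the function name are simplifying assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

def choose_n_fpcs_by_cv(curves, y, max_components=10):
    """Leave-one-out CV over the number of FPC scores in a
    scalar-on-function regression (curves: n x m grid values)."""
    errors = []
    for k in range(1, max_components + 1):
        sq = 0.0
        for train, test in LeaveOneOut().split(curves):
            pca = PCA(n_components=k).fit(curves[train])
            reg = LinearRegression().fit(pca.transform(curves[train]), y[train])
            pred = reg.predict(pca.transform(curves[test]))[0]
            sq += (pred - y[test][0]) ** 2
        errors.append(sq / len(y))
    return int(np.argmin(errors)) + 1, errors
```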

13.
Projection techniques for nonlinear principal component analysis
Principal Components Analysis (PCA) is traditionally a linear technique for projecting multidimensional data onto lower dimensional subspaces with minimal loss of variance. However, there are several applications where the data lie in a lower dimensional subspace that is not linear; in these cases linear PCA is not the optimal method to recover this subspace and thus account for the largest proportion of variance in the data. Nonlinear PCA addresses the nonlinearity problem by relaxing the linear restrictions on standard PCA. We investigate linear and nonlinear approaches to PCA, both separately and in combination. In particular, we introduce a combination of projection pursuit and nonlinear regression for nonlinear PCA. We compare the success of PCA techniques in variance recovery by applying linear, nonlinear and hybrid methods to simulated and real data sets. We show that the best linear projection that captures the structure in the data (in the sense that the original data can be reconstructed from the projection) is not necessarily a (linear) principal component. We also show that the ability of certain nonlinear projections to capture data structure is affected by the choice of constraint in the eigendecomposition of a nonlinear transform of the data. Similar success in recovering data structure was observed for both linear and nonlinear projections.
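A toy sketch of the general idea of reconstructing data nonlinearly along a one-dimensional projection; it substitutes the first principal component for a projection-pursuit direction and a polynomial for the paper's nonlinear regression, so it illustrates the flavour of the hybrid approach rather than the actual method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def nonlinear_reconstruction_score(X, degree=3):
    """Project onto one linear direction, then reconstruct every variable
    by a polynomial regression on that projection; return the proportion
    of total variance recovered by the nonlinear reconstruction."""
    Xc = X - X.mean(axis=0)
    t = PCA(n_components=1).fit_transform(Xc)       # 1-d linear projection
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    X_hat = model.fit(t, Xc).predict(t)             # nonlinear reconstruction
    return 1.0 - np.sum((Xc - X_hat) ** 2) / np.sum(Xc ** 2)
```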

14.
Univariate time series often take the form of a collection of curves observed sequentially over time; hourly ground-level ozone concentration curves are one example. These curves can be viewed as a time series of functions observed at equally spaced intervals over a dense grid. Since functional time series may contain various types of outliers, we introduce a robust functional time series forecasting method to down-weight the influence of outliers in forecasting. Through a robust principal component analysis based on projection pursuit, a time series of functions can be decomposed into a set of robust dynamic functional principal components and their associated scores. Conditioning on the estimated functional principal components, the crux of the curve-forecasting problem lies in modelling and forecasting the principal component scores, which is done with a robust vector autoregressive forecasting method. Via a simulation study and an empirical study on forecasting ground-level ozone concentration, the robust method demonstrates the superior forecast accuracy of dynamic functional principal component regression. The robust method also shows superior estimation accuracy for the parameters of the vector autoregressive models used to model and forecast the principal component scores, and thus improves curve forecast accuracy.
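A simplified sketch of the forecasting pipeline (ordinary PCA stands in for the robust, projection-pursuit decomposition, and a standard VAR stands in for its robust counterpart); the function name and lag order are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from statsmodels.tsa.api import VAR

def forecast_curves(curve_matrix, n_components=3, steps=1):
    """Decompose a time series of curves (rows = time, columns = grid points)
    into principal component scores, forecast the scores with a VAR model,
    and rebuild the forecast curves."""
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(curve_matrix)
    var_fit = VAR(scores).fit(2)                          # VAR(2) on the score series
    score_fc = var_fit.forecast(scores[-var_fit.k_ar:], steps=steps)
    return pca.inverse_transform(score_fc)                # forecast curves
```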

15.
In this article we investigate the relationship between the EM algorithm and the Gibbs sampler. We show that the approximate rate of convergence of the Gibbs sampler, obtained by a Gaussian approximation, is equal to that of the corresponding EM-type algorithm. This helps in implementing either algorithm, as improvement strategies for one can be transported directly to the other. In particular, by running the EM algorithm we know approximately how many iterations are needed for convergence of the Gibbs sampler. We also show that, under certain conditions, the EM algorithm used for finding the maximum likelihood estimates can be slower to converge than the corresponding Gibbs sampler for Bayesian inference. We illustrate our results in a number of realistic examples, all based on generalized linear mixed models.

16.
The effect of nonstationarity in time series columns of the input data in principal components analysis is examined. Nonstationarity is very common among economic indicators collected over time; they are subsequently summarized into fewer indices for purposes of monitoring. Due to the simultaneous drifting of the nonstationary time series, usually caused by the trend, the first component averages all the variables without necessarily reducing dimensionality. Sparse principal components analysis can be used, but the attainment of sparsity among the loadings (and hence dimension reduction) is influenced by the choice of the penalty parameter(s) λ1,j. Simulated data with more variables than observations and with different patterns of cross-correlations and autocorrelations are used to illustrate the advantages of sparse principal components analysis over ordinary principal components analysis. Sparse component loadings for nonstationary time series data can be achieved provided that appropriate values of λ1,j are used. We provide the range of values of λ1,j that ensures convergence of the sparse principal components algorithm and consequently achieves sparsity of the component loadings.
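A small demonstration of the averaging effect and the role of the penalty weight, using scikit-learn's SparsePCA (its alpha parameter plays a role analogous to the λ1,j above; the simulated drift and all settings here are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(1)
n, p = 50, 80                                   # fewer observations than variables
trend = 0.1 * np.arange(n)[:, None]             # common drift: nonstationary columns
X = trend + rng.normal(scale=0.5, size=(n, p))
Xc = X - X.mean(axis=0)

# ordinary PCA: the first component loads on (averages) essentially all variables
pc1 = PCA(n_components=1).fit(Xc).components_[0]

# sparse PCA: the penalty weight controls how many loadings are driven to zero
sp1 = SparsePCA(n_components=1, alpha=2.0, random_state=0).fit(Xc).components_[0]

print(np.sum(np.abs(pc1) > 1e-8), "non-zero loadings under PCA")
print(np.sum(np.abs(sp1) > 1e-8), "non-zero loadings under sparse PCA")
```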

17.
The main topic of the paper is on-line filtering for non-Gaussian dynamic (state space) models by approximate computation of the first two posterior moments using efficient numerical integration. Based on approximating the prior of the state vector by a normal density, we prove that the posterior moments of the state vector are related to the posterior moments of the linear predictor in a simple way. For the linear predictor, Gauss-Hermite integration is carried out with automatic reparametrization based on an approximate posterior mode filter. We illustrate how further topics in applied state space modelling, such as estimating hyperparameters, computing model likelihoods and predictive residuals, are managed by integration-based Kalman filtering. The methodology derived in the paper is applied to on-line monitoring of ecological time series and filtering for small count data.
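To give a flavour of integration-based computation of posterior moments, here is a one-dimensional sketch: the posterior mean and variance of a linear predictor with a Gaussian prior and a Poisson count observation, computed by Gauss-Hermite quadrature. This ignores the reparametrization via an approximate posterior mode filter described in the paper, and the function name is ours.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import poisson

def poisson_posterior_moments(y, prior_mean, prior_var, n_nodes=20):
    """Posterior mean and variance of eta, where eta ~ N(prior_mean, prior_var)
    and y | eta ~ Poisson(exp(eta)), via Gauss-Hermite quadrature."""
    x, w = hermgauss(n_nodes)
    eta = prior_mean + np.sqrt(2.0 * prior_var) * x      # quadrature nodes for eta
    lik = poisson.pmf(y, np.exp(eta))                    # Poisson likelihood at the nodes
    norm = np.sum(w * lik)                               # normalizing constant
    post_mean = np.sum(w * lik * eta) / norm
    post_var = np.sum(w * lik * eta ** 2) / norm - post_mean ** 2
    return post_mean, post_var

# a single small count updating a standard normal prior on the linear predictor
print(poisson_posterior_moments(y=3, prior_mean=0.0, prior_var=1.0))
```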

18.
This paper discusses biplots of the between-set correlation matrix obtained by canonical correlation analysis. It is shown that these biplots can be enriched with the representation of the cases of the original data matrices. A representation of the cases that is optimal in the generalized least squares sense is obtained by the superposition of a scatterplot of the canonical variates on the biplot of the between-set correlation matrix. Goodness of fit statistics for all correlation and data matrices involved in canonical correlation analysis are discussed. It is shown that adequacy and redundancy coefficients are in fact statistics that express the goodness of fit of the original data matrices in the biplot. The within-set correlation matrix that is represented in standard coordinates always has a better goodness of fit than the within-set correlation matrix that is represented in principal coordinates. Given certain scalings, the scalar products between variable vectors approximate correlations better than the cosines of angles between variable vectors. Several data sets are used to illustrate the results.

19.
Communications in Statistics: Theory and Methods, 2012, 41(13-14): 2305-2320
We consider shrinkage and preliminary test estimation strategies for the matrix of regression parameters in the multivariate multiple regression model in the presence of a natural linear constraint. The goal of this article is to critically examine the relative performance of these estimators against the subspace-restricted and candidate-subspace-restricted type estimators. Our analytical and numerical results show that the proposed shrinkage and preliminary test estimators perform better than the benchmark estimator on the candidate subspace and beyond. The methods are also applied to a real data set for illustrative purposes.
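For concreteness, a hedged numpy sketch of estimators of this kind under a restriction R B = 0: the unrestricted least squares fit, the restricted fit, a preliminary-test estimator that switches between them, and a Stein-type shrinkage estimator. The Wald-type statistic and the shrinkage constant used here are generic textbook choices, not necessarily those analysed in the article.

```python
import numpy as np
from scipy.stats import chi2

def shrinkage_and_pretest(Y, X, R, alpha=0.05):
    """Estimators of B in Y = X B + E under the linear restriction R B = 0."""
    n, p = X.shape
    q, r = Y.shape[1], R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y                          # unrestricted estimator
    M = np.linalg.inv(R @ XtX_inv @ R.T)
    B_res = B_hat - XtX_inv @ R.T @ M @ R @ B_hat      # restricted estimator
    resid = Y - X @ B_hat
    Sigma_inv = np.linalg.inv(resid.T @ resid / (n - p))
    # Wald-type statistic for the restriction
    T = np.trace((R @ B_hat).T @ M @ (R @ B_hat) @ Sigma_inv)
    # preliminary test estimator: keep the restricted fit unless the test rejects
    B_pt = B_hat if T > chi2.ppf(1 - alpha, df=r * q) else B_res
    # Stein-type shrinkage towards the restricted estimator
    c = max(r * q - 2, 0)
    B_shrink = B_res + (1.0 - c / T) * (B_hat - B_res)
    return B_hat, B_res, B_pt, B_shrink
```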

20.
Markov chain Monte Carlo techniques have revolutionized the field of Bayesian statistics. Their power is so great that they can even accommodate situations in which the structure of the statistical model itself is uncertain. However, the analysis of such trans-dimensional (TD) models is not easy and available software may lack the flexibility required for dealing with the complexities of real data, often because it does not allow the TD model to be simply part of some bigger model. In this paper we describe a class of widely applicable TD models that can be represented by a generic graphical model, which may be incorporated into arbitrary other graphical structures without significantly affecting the mechanism of inference. We also present a decomposition of the reversible jump algorithm into abstract and problem-specific components, which provides infrastructure for applying the method to all models in the class considered. These developments represent a first step towards a context-free method for implementing TD models that will facilitate their use by applied scientists for the practical exploration of model uncertainty. Our approach makes use of the popular WinBUGS framework as a sampling engine and we illustrate its use via two simple examples in which model uncertainty is a key feature.
