Similar Articles (20 results)
1.
Projection techniques for nonlinear principal component analysis
Principal Components Analysis (PCA) is traditionally a linear technique for projecting multidimensional data onto lower dimensional subspaces with minimal loss of variance. However, there are several applications where the data lie in a lower dimensional subspace that is not linear; in these cases linear PCA is not the optimal method to recover this subspace and thus account for the largest proportion of variance in the data. Nonlinear PCA addresses the nonlinearity problem by relaxing the linear restrictions on standard PCA. We investigate both linear and nonlinear approaches to PCA, both exclusively and in combination. In particular, we introduce a combination of projection pursuit and nonlinear regression for nonlinear PCA. We compare the success of PCA techniques in variance recovery by applying linear, nonlinear and hybrid methods to some simulated and real data sets. We show that the best linear projection that captures the structure in the data (in the sense that the original data can be reconstructed from the projection) is not necessarily a (linear) principal component. We also show that the ability of certain nonlinear projections to capture data structure is affected by the choice of constraint in the eigendecomposition of a nonlinear transform of the data. Similar success in recovering data structure was observed for both linear and nonlinear projections.
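As a baseline for the comparisons described above, linear PCA and the reconstruction-from-projection idea can be sketched in a few lines of NumPy (a generic illustration with our own variable names, not the authors' code):

```python
import numpy as np

def linear_pca(X, k):
    """Project centered data onto the top-k principal subspace and
    reconstruct it, illustrating variance-preserving linear projection."""
    Xc = X - X.mean(axis=0)
    # Eigendecomposition of the sample covariance matrix
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1]          # sort eigenvalues descending
    W = vecs[:, order[:k]]                  # top-k principal directions
    scores = Xc @ W                         # low-dimensional projection
    recon = scores @ W.T + X.mean(axis=0)   # reconstruction from projection
    explained = vals[order[:k]].sum() / vals.sum()
    return scores, recon, explained
```

When the data really do lie near a k-dimensional linear subspace, the reconstruction is close to the original and the explained-variance ratio is near one; the nonlinear methods above target the cases where this fails.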

2.
Functional principal component analysis is one of the most commonly employed approaches in functional and longitudinal data analysis and we extend it to analyze functional/longitudinal data observed on a general d-dimensional domain. The computational issues emerging in the extension are fully addressed with our proposed solutions. The local linear smoothing technique is employed to perform estimation because of its capabilities of performing large-scale smoothing and of handling data with different sampling schemes (possibly on irregular domain) in addition to its nice theoretical properties. Besides taking the fast Fourier transform strategy in smoothing, the modern GPGPU (general-purpose computing on graphics processing units) architecture is applied to perform parallel computation to save computation time. To resolve the out-of-memory issue due to large-scale data, the random projection procedure is applied in the eigendecomposition step. We show that the proposed estimators can achieve the classical nonparametric rates for longitudinal data and the optimal convergence rates for functional data if the number of observations per sample is of the order \((n/ \log n)^{d/4}\). Finally, the performance of our approach is demonstrated with simulation studies and the fine particulate matter (PM2.5) data measured in Taiwan.
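The random-projection step in the eigendecomposition can be illustrated with a generic randomized range-finder in the spirit of randomized linear algebra (our own sketch with assumed defaults, not the paper's implementation):

```python
import numpy as np

def randomized_eig(A, k, oversample=10, seed=0):
    """Random-projection eigendecomposition sketch for a symmetric PSD
    matrix A: capture its range with a random sketch, then solve the
    small projected problem -- avoiding a full decomposition in memory."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Omega = rng.normal(size=(n, k + oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                # orthonormal basis for range(A)
    B = Q.T @ A @ Q                               # small projected problem
    vals, V = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    return vals[order], Q @ V[:, order]           # approximate top-k eigenpairs
```

For matrices with quickly decaying spectra, as arise in smoothed covariance estimation, the top eigenpairs are recovered to high accuracy from the small projected problem.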

3.
AStA Advances in Statistical Analysis - Dimension reduction, by means of Principal Component Analysis (PCA), is often employed to obtain a reduced set of components preserving the largest possible...

4.
This work is concerned with robustness in Principal Component Analysis (PCA). The approach we adopt here is to replace the least-squares criterion by another criterion based on a convex and sufficiently differentiable loss function ρ. Using this criterion we propose a robust estimate of the location vector and introduce an orthogonality with respect to (w.r.t.) ρ in order to define the different steps of a PCA. The influence functions of the mean vector and the principal vectors are developed in order to provide a method for obtaining a robust PCA. The practical procedure is based on an alternating-steps algorithm.

5.
In functional linear regression, one conventional approach is to first perform functional principal component analysis (FPCA) on the functional predictor and then use the first few leading functional principal component (FPC) scores to predict the response variable. The leading FPCs estimated by the conventional FPCA stand for the major source of variation of the functional predictor, but these leading FPCs may not be mostly correlated with the response variable, so the prediction accuracy of the functional linear regression model may not be optimal. In this paper, we propose a supervised version of FPCA by considering the correlation of the functional predictor and response variable. It can automatically estimate leading FPCs, which represent the major source of variation of the functional predictor and are simultaneously correlated with the response variable. Our supervised FPCA method is demonstrated to have a better prediction accuracy than the conventional FPCA method by using one real application on electroencephalography (EEG) data and three carefully designed simulation studies.

6.
In this paper, a new method for robust principal component analysis (PCA) is proposed. PCA is a widely used tool for dimension reduction without substantial loss of information. However, classical PCA is vulnerable to outliers due to its dependence on the empirical covariance matrix. To avoid this weakness, several alternative approaches based on robust scatter matrices have been suggested. A popular choice is ROBPCA, which combines projection pursuit ideas with robust covariance estimation via a variance maximization criterion. Our approach is based on the fact that PCA can be formulated as a regression-type optimization problem, which is the main difference from previous approaches. The proposed robust PCA is derived by replacing the squared loss function with a robust alternative, the Huber loss function. A practical algorithm for carrying out the optimization is proposed, and convergence properties of the algorithm are investigated. Results from a simulation study and a real data example demonstrate the promising empirical properties of the proposed method.
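A minimal sketch of the regression-type idea is iteratively reweighted PCA, where each observation's weight comes from the Huber function applied to its reconstruction residual (our own simplification for illustration; the paper's actual algorithm and tuning constants may differ):

```python
import numpy as np

def huber_weight(r, c=1.345):
    """Huber weight psi(r)/r: 1 inside [-c, c], c/|r| outside."""
    w = np.ones_like(r)
    big = np.abs(r) > c
    w[big] = c / np.abs(r[big])
    return w

def robust_pca_huber(X, k, c=1.345, n_iter=20):
    """Robust PCA sketch via iteratively reweighted PCA: observations
    with large reconstruction residuals are downweighted by Huber weights."""
    w = np.ones(X.shape[0])
    for _ in range(n_iter):
        mu = np.average(X, axis=0, weights=w)
        Xc = X - mu
        cov = (Xc * w[:, None]).T @ Xc / w.sum()   # weighted covariance
        vals, vecs = np.linalg.eigh(cov)
        W = vecs[:, np.argsort(vals)[::-1][:k]]    # top-k directions
        resid = np.linalg.norm(Xc - (Xc @ W) @ W.T, axis=1)
        scale = np.median(resid) / 0.6745 + 1e-12  # MAD-style robust scale
        w = huber_weight(resid / scale, c)
    return mu, W
```

Because outliers receive small weights, the fitted subspace tracks the bulk of the data rather than the contaminating points.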

7.
Principal component analysis (PCA) and functional principal component analysis are key tools in multivariate analysis, in particular for modelling yield curves, but little attention is given to questions of uncertainty, either in the components themselves or in any derived quantities such as scores. Actuaries using PCA to model yield curves to assess interest rate risk for insurance companies are required to show any uncertainty in their calculations. Asymptotic results based on assumptions of multivariate normality are unsatisfactory for modest samples, and the application of bootstrap methods is not straightforward, with the novel pitfalls of possible inversions in the order of sample components and reversals of their signs. We present methods for overcoming these difficulties and discuss other potential hazards that arise.
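The order-inversion and sign-reversal pitfalls can be handled by greedily matching each bootstrap component to the reference loadings and flipping signs to agree; a hypothetical helper for this alignment step (the paper's actual procedure may be more elaborate):

```python
import numpy as np

def align_components(V_ref, V_boot):
    """Match bootstrap loading vectors (columns of V_boot) to reference
    loadings (columns of V_ref) by |cosine| similarity, then flip signs,
    guarding against order inversions and sign reversals."""
    k = V_ref.shape[1]
    used, order, signs = set(), [], []
    for j in range(k):
        sims = np.abs(V_ref[:, j] @ V_boot)   # |cosine| with each bootstrap PC
        sims[list(used)] = -np.inf            # each bootstrap PC used once
        m = int(np.argmax(sims))
        used.add(m)
        order.append(m)
        signs.append(np.sign(V_ref[:, j] @ V_boot[:, m]) or 1.0)
    return V_boot[:, order] * np.array(signs)
```

Applying this to every bootstrap replicate before computing percentile intervals keeps the sampling distribution of each component coherent across replicates.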

8.
A central issue in principal component analysis (PCA) is choosing the appropriate number of principal components to be retained. Bishop (1999a) suggested a Bayesian approach to PCA that determines the effective dimensionality automatically on the basis of a probabilistic latent variable model. This paper extends that approach by using mixture priors, so that the choice of dimensionality and the estimation of the principal components are carried out simultaneously via an MCMC algorithm. The proposed method also provides a probabilistic measure of uncertainty on PCA, yielding posterior probabilities for all possible numbers of retained principal components.

9.
Advances in data collection and storage have tremendously increased the presence of functional data, whose graphical representations are curves, images or shapes. As a new area of statistics, functional data analysis extends existing methodologies and theories from the realms of functional analysis, generalized linear models, multivariate data analysis, nonparametric statistics, regression models and many others. From both methodological and practical viewpoints, this paper provides a review of functional principal component analysis and its use in exploratory analysis, modeling and forecasting, and classification of functional data.

10.
A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age classes. Analyzed as compositions, the relevant information consists of ratios between different cells of such a table. This is particularly useful when analyzing several compositional tables jointly, where the absolute numbers are in very different ranges, e.g. if unemployment data are considered from different countries. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle for exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, allowing us to investigate relationships between the factors. The link between orthonormal coordinates and clr coefficients makes it possible to apply rPCA, which would otherwise suffer from the singularity of clr coefficients.
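The clr coefficients that these coordinates are tied to are straightforward to compute for a single compositional table (a minimal illustration only; the paper's coordinate construction on the decomposed table is more involved):

```python
import numpy as np

def clr(table):
    """Centered logratio (clr) coefficients of a compositional table:
    the log of each cell minus the mean of all log cells, so that only
    ratios between cells carry information."""
    L = np.log(table)
    return L - L.mean()
```

Two properties are visible directly: clr coefficients sum to zero (the singularity mentioned above, which rules out naive PCA on them) and they are invariant to the overall scale of the table, so tables with counts in very different ranges become comparable.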

11.
Dynamic principal component analysis (DPCA), also known as frequency domain principal component analysis, was developed by Brillinger [Time Series: Data Analysis and Theory, Vol. 36, SIAM, 1981] to decompose multivariate time-series data into a few principal component series. A primary advantage of DPCA is its capability of extracting essential components from the data by reflecting their serial dependence. It is also used to estimate the common component in a dynamic factor model, which is frequently used in econometrics. However, this beneficial property cannot be exploited when missing values are present, and such values should not simply be ignored when estimating the spectral density matrix in the DPCA procedure. Based on a novel combination of conventional DPCA and the self-consistency concept, we propose a DPCA method for data with missing values. We demonstrate the advantage of the proposed method over several existing imputation methods through Monte Carlo experiments and real data analysis.

12.
This paper focuses on the analysis of spatially correlated functional data. We propose a parametric model for spatial correlation in which the between-curve correlation is modeled by correlating functional principal component scores of the functional data. Additionally, in the sparse observation framework, we propose a novel approach of spatial principal analysis by conditional expectation to explicitly estimate spatial correlations and reconstruct individual curves. Assuming spatial stationarity, empirical spatial correlations are calculated as the ratio of eigenvalues of the smoothed covariance surface Cov(X_i(s), X_i(t)) and the cross-covariance surface Cov(X_i(s), X_j(t)) at locations indexed by i and j. An anisotropic Matérn spatial correlation model is then fitted to the empirical correlations. Finally, principal component scores are estimated to reconstruct the sparsely observed curves. This framework can naturally accommodate arbitrary covariance structures, but there is an enormous reduction in computation if one can assume the separability of temporal and spatial components. We demonstrate the consistency of our estimates and propose hypothesis tests to examine the separability as well as the isotropy of the spatial correlation. Using simulation studies, we show that these methods have some clear advantages over existing methods of curve reconstruction and estimation of model parameters.

13.
Classification procedures are examined in the case where the dimensionality exceeds the sample size. Two particular suggestions are (i) principal component analysis and (ii) two-step discriminant analysis. Comparisons are made in the two-sample and several-sample cases. Extensions to the growth curve model are investigated using the two-step discriminant analysis.

14.
The problem of detecting influential observations in principal component analysis has been discussed by several authors. Radhakrishnan and Kshirsagar (1981), Critchley (1985) and Jolliffe (1986), among others, discussed this topic using the influence functions I(X; θ_s) and I(X; V_s) of eigenvalues and eigenvectors, which were derived under the assumption that the eigenvalues of interest were simple. In this paper we propose the influence functions I(X; Σ_{s=1}^q θ_s V_s V_s^T) and I(X; Σ_{s=1}^q V_s V_s^T) (q < p, where p is the number of variables) to investigate the influence on the subspace spanned by the principal components. These influence functions are applicable not only to the case where the eigenvalues of interest are all simple but also to the case where there are some multiple eigenvalues among those of interest.

15.
We consider the problem of clustering gamma-ray bursts (from the BATSE catalogue) through kernel principal component analysis, in which our proposed kernel outperforms other competing kernels in terms of clustering accuracy, and we obtain three physically interpretable groups of gamma-ray bursts. The effectiveness of the suggested kernel, in combination with kernel principal component analysis, in revealing natural clusters in noisy and nonlinear data while reducing the dimension of the data is also explored in two simulated data sets.
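Kernel PCA itself follows a standard recipe: center the kernel matrix in feature space and eigendecompose it. A sketch with a generic RBF kernel as a stand-in (the paper's proposed kernel is not reproduced here):

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_pca(X, k, gamma=1.0):
    """Kernel PCA sketch: double-center the kernel matrix in feature
    space, eigendecompose, and return the top-k nonlinear component scores."""
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ rbf_kernel(X, gamma) @ J        # feature-space centering
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.maximum(vals, 0))   # component scores
```

For well-separated groups, the leading kernel component scores separate the clusters even when no linear projection would, which is the property exploited for the gamma-ray burst data.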

16.
In principal component analysis (PCA), it is crucial to know how many principal components (PCs) should be retained in order to account for most of the data variability. A class of “objective” rules for finding this quantity is the class of cross-validation (CV) methods. In this work we compare three CV techniques showing how the performance of these methods depends on the covariance matrix structure. Finally we propose a rule for the choice of the “best” CV method and give an application to real data.
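One common flavour of CV for this choice holds out individual cells of the data matrix and imputes them by iterative PCA reconstruction, scoring each candidate number of components by held-out error. A rough sketch of that generic entry-wise scheme (not necessarily one of the three techniques compared in the paper):

```python
import numpy as np

def cv_choose_ncomp(X, ks, frac=0.1, n_iter=50, seed=0):
    """Entry-wise CV sketch for choosing the number of PCs: mask random
    cells, impute them by iterative rank-k PCA reconstruction, and pick
    the k with the smallest held-out squared error."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < frac          # held-out cells
    errors = []
    for k in ks:
        Z = X.copy()
        Z[mask] = X[~mask].mean()              # crude initial fill
        for _ in range(n_iter):
            mu = Z.mean(axis=0)
            U, s, Vt = np.linalg.svd(Z - mu, full_matrices=False)
            recon = U[:, :k] * s[:k] @ Vt[:k] + mu
            Z[mask] = recon[mask]              # refresh held-out cells only
        errors.append(np.mean((Z[mask] - X[mask]) ** 2))
    return ks[int(np.argmin(errors))], errors
```

Underfitting (too few PCs) shows up as a large held-out error, so the curve of errors over k is what such rules inspect.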

17.
DNA methylation is an epigenetic modification that plays an important role in many biological processes and diseases. Several statistical methods have been proposed to test for DNA methylation differences between conditions at individual cytosine sites, followed by a post hoc aggregation procedure to explore regional differences. While there are benefits to analyzing CpGs individually, there are both biological and statistical reasons to test entire genomic regions for differential methylation. Variability in methylation levels measured by Next-Generation Sequencing (NGS) is often observed across CpG sites in a genomic region. Evaluating meaningful changes in regional level methylation profiles between conditions over noisy site-level measurements is often difficult to implement with parametric models. To overcome these limitations, this study develops a nonparametric approach to detect predefined differentially methylated regions (DMR) based on functional principal component analysis (FPCA). The performance of this approach is compared with two alternative methods (GIFT and M3D), using real and simulated data.
KEYWORDS: Functional principal component, epigenetics, DNA methylation, next-generation sequencing

18.
A number of results have been derived recently concerning the influence of individual observations in a principal component analysis. Some of these results, particularly those based on the correlation matrix, are applied to data consisting of seven anatomical measurements on students. The data have a correlation structure which is fairly typical of many found in allometry. This case study shows that theoretical influence functions often provide good estimates of the actual changes observed when individual observations are deleted from a principal component analysis. Different observations may be influential for different aspects of the principal component analysis (coefficients, variances and scores of principal components); these differences, and the distinction between outlying and influential observations are discussed in the context of the case study. A number of other complications, such as switching and rotation of principal components when an observation is deleted, are also illustrated.
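For the covariance-matrix case, the influence function of the largest eigenvalue has the closed form IF(x) = (v1ᵀ(x − μ))² − λ1, and the comparison with actual leave-one-out changes described above can be sketched as follows (a generic illustration, not the study's own computations):

```python
import numpy as np

def eigenvalue_influence(X):
    """Theoretical influence of each observation on the largest
    covariance eigenvalue: IF(x) = (v1'(x - mu))^2 - lambda1."""
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    l1, v1 = vals[-1], vecs[:, -1]       # leading eigenpair
    return ((X - mu) @ v1) ** 2 - l1
```

Deleting observation i changes λ1 by roughly −IF(x_i)/n, so for a data set like the anatomical measurements the theoretical values should track the actual deletion changes closely.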

19.
While studying the results of a European Parliament election, the question of the suitability of principal component analysis (PCA) for this kind of data was raised. Since multiparty data should be seen as compositional data (CD), the application of PCA is inadvisable and may lead to unreliable results. This work points out the limitations of PCA for CD and presents a practical application to the results of the 2004 European Parliament election. We present a comparative study of the results of PCA, Crude PCA and Logcontrast PCA (Aitchison in Biometrika 70:57–61, 1983; Kucera and Malmgren in Marine Micropaleontology 34:117–120, 1998). As a conclusion of this study, and for the data set in question, the approach that produced the clearest results was the Logcontrast PCA. Moreover, Crude PCA led to misleading results, since nonlinear relations were present between the variables, and linear PCA proved, once again, to be inappropriate for analysing data which can be seen as CD.

20.
Principal component regression (PCR) has two steps: estimating the principal components and performing the regression using these components. These steps are generally performed sequentially. In PCR, a crucial issue is the selection of the principal components to be included in the regression. In this paper, we build a hierarchical probabilistic PCR model with a dynamic component selection procedure. A latent variable is introduced to select promising subsets of components based upon the significance of the relationship between the response variable and the principal components in the regression step. We illustrate this model using real and simulated examples. The simulations demonstrate that our approach outperforms some existing methods in terms of the root mean squared error of the regression coefficients.
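The two sequential steps can be sketched generically as ordinary PCR, i.e. the baseline that the hierarchical probabilistic model extends (not the authors' model itself):

```python
import numpy as np

def pcr(X, y, k):
    """Principal component regression sketch: PCA on X, then least
    squares of y on the first k component scores; coefficients are
    mapped back to the original predictor space."""
    mx, my = X.mean(axis=0), y.mean()
    Xc = X - mx
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                       # top-k loadings
    T = Xc @ W                         # component scores
    gamma, *_ = np.linalg.lstsq(T, y - my, rcond=None)
    beta = W @ gamma                   # coefficients for original predictors
    intercept = my - mx @ beta
    return beta, intercept
```

With k equal to the number of predictors this reduces to ordinary least squares; the selection question the paper addresses is which of the k component columns of T deserve to enter the regression at all.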


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号