Similar articles
20 similar articles found (search time: 899 ms)
1.
Probabilistic Principal Component Analysis (Total citations: 2; self-citations: 0; citations by others: 2)
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
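The latent-variable formulation above also admits a closed-form maximum-likelihood solution alongside the EM algorithm: the noise variance is the average of the discarded eigenvalues, and the loading matrix is built from the top eigenvectors of the sample covariance. A minimal numpy sketch of that closed form (variable names and the test setup are illustrative, not from the paper):

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML estimates for probabilistic PCA.

    X: (n, d) data matrix; q: latent dimension.
    Returns the loading matrix W (d, q) and the isotropic noise variance sigma2.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)                 # sample covariance (ML normalization)
    evals, evecs = np.linalg.eigh(S)       # eigh returns ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    sigma2 = evals[q:].mean()              # average of the discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(0)
# 2 latent factors observed in 5 dimensions with small isotropic noise
Z = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 5))
X = Z @ A + 0.1 * rng.normal(size=(500, 5))
W, sigma2 = ppca_ml(X, q=2)
# the model covariance W W^T + sigma2 I should approximate the sample covariance
```

The fitted model covariance reproduces the retained eigen-structure exactly and replaces the discarded eigenvalues by their mean, so for near-low-rank data the approximation error is small.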

2.
We conducted confirmatory factor analysis (CFA) of responses (N=803) to a self-reported measure of optimism, using full-information estimation via adaptive quadrature (AQ), an alternative estimation method for ordinal data. We evaluated AQ results in terms of the number of iterations required to achieve convergence, model fit, parameter estimates, standard errors (SEs), and statistical significance, across four link functions (logit, probit, log-log, complementary log-log) using 3–10 and 20 quadrature points. We compared AQ results with those obtained using maximum likelihood, robust maximum likelihood, and robust diagonally weighted least-squares estimation. Compared with the other two link functions, logit and probit produced fit statistics, parameter estimates, SEs, and levels of significance that varied less across numbers of quadrature points; they also fitted the data better and provided larger completely standardised loadings than did maximum likelihood and diagonally weighted least-squares. Our findings demonstrate the viability of using full-information AQ to estimate CFA models with real-world ordinal data.

3.
Wang Binhui. Statistical Research (《统计研究》), 2007, 24(8): 72-76
Traditional multivariate statistical methods, such as principal component analysis and factor analysis, share a common feature: they compute the sample mean vector and covariance matrix, and derive all other statistics from these two quantities. When the sample contains no outliers, these methods give excellent results. When outliers are present, however, the results are easily distorted, because the classical mean vector and covariance matrix are not robust statistics. This paper studies the algorithm of the currently popular FAST-MCD method, constructs a robust mean vector and a robust covariance matrix, applies them to principal component analysis, and proposes improvements to address the method's shortcomings. Simulation and empirical results show that the improved method and the new robust estimators do resist outliers effectively, greatly reducing their influence on the results.

4.
Sparse principal components analysis (SPCA) is a technique for finding principal components with a small number of non-zero loadings. Our contribution to this methodology is twofold. First, we derive the sparse solutions that minimise the least squares criterion subject to sparsity requirements. Second, recognising that sparsity is not the only requirement for achieving simplicity, we suggest a backward elimination algorithm that computes sparse solutions with large loadings. This algorithm can be run without specifying the number of non-zero loadings in advance. It is also possible to impose the requirement that a minimum amount of variance be explained by the components. We give thorough comparisons with existing SPCA methods and present several examples using real datasets.
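The backward-elimination idea can be illustrated with a minimal sketch (this is an illustrative simplification, not the paper's exact algorithm): starting from the full leading eigenvector, repeatedly drop the active variable with the smallest absolute loading and re-solve the eigenproblem on the remaining support, until the desired cardinality is reached.

```python
import numpy as np

def sparse_pc_backward(S, k):
    """Backward elimination for one sparse leading component (a sketch, not
    the paper's exact algorithm): repeatedly drop the active variable with
    the smallest |loading| and re-solve on the remaining support.

    S: (d, d) covariance matrix; k: target number of non-zero loadings.
    """
    d = S.shape[0]
    active = list(range(d))
    while len(active) > k:
        sub = S[np.ix_(active, active)]
        v = np.linalg.eigh(sub)[1][:, -1]      # leading eigenvector on support
        active.pop(int(np.argmin(np.abs(v))))  # eliminate the weakest variable
    w = np.zeros(d)
    w[active] = np.linalg.eigh(S[np.ix_(active, active)])[1][:, -1]
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
X[:, 0] += 2 * X[:, 1]                 # variables 0 and 1 carry the structure
S = np.cov(X, rowvar=False)
w = sparse_pc_backward(S, k=2)         # recovers support {0, 1}
```

As the abstract notes, a stopping rule based on explained variance could replace the fixed cardinality `k` here.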

5.
To estimate the high-dimensional covariance matrix, row sparsity is often assumed such that each row has a small number of nonzero elements. However, in some applications, such as factor modeling, there may be many nonzero loadings of the common factors. The corresponding variables are also correlated to one another and the rows are non-sparse or dense. This paper has three main aims. First, a detection method is proposed to identify the rows that may be non-sparse, or at least dense with many nonzero elements. These rows are called dense rows and the corresponding variables are called pivotal variables. Second, to determine the number of rows, a ridge ratio method is suggested, which can be regarded as a sure screening procedure. Third, to handle the estimation of high-dimensional factor models, a two-step procedure is suggested with the above screening as the first step. Simulations are conducted to examine the performance of the new method and a real dataset is analyzed for illustration.
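To convey the flavour of a ridge-ratio screening rule, here is a generic eigenvalue-ratio sketch (the paper's statistic is defined differently; the constant `c` and the `kmax` cap are assumptions of this toy version): the number of strong factors is taken as the position of the largest ratio of consecutive eigenvalues, with a small ridge in the denominator to stabilise ratios of near-zero eigenvalues.

```python
import numpy as np

def ridge_ratio_k(X, c=0.05, kmax=10):
    """Estimate the number of strong factors by a ridge-type eigenvalue ratio
    (a generic sketch in the spirit of ratio screening, not the paper's exact
    statistic): k = argmax_i lam_i / (lam_{i+1} + ridge)."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    ridge = c * lam[0]                  # small ridge keeps the ratios stable
    ratios = lam[:kmax] / (lam[1:kmax + 1] + ridge)
    return int(np.argmax(ratios)) + 1

rng = np.random.default_rng(2)
n, p, k_true = 400, 30, 3
F = rng.normal(size=(n, k_true))
L = 2.0 * rng.normal(size=(k_true, p))   # strong, dense factor loadings
X = F @ L + rng.normal(size=(n, p))      # factor model plus unit noise
k_hat = ridge_ratio_k(X)                 # detects the 3 strong factors
```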

6.
We compare partial least squares (PLS) and principal component analysis (PCA) in a general setting in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator.
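The near-equivalence of the two leading directions is easy to see numerically: when the response is driven by the first principal direction, the first PLS weight vector (proportional to X'y) almost coincides with the first PCA eigenvector. A small sketch under that assumed data-generating setup:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 8
X = rng.normal(size=(n, p)) @ np.diag(np.linspace(3.0, 0.5, p))
Xc = X - X.mean(axis=0)

# first PCA direction: leading eigenvector of X'X
v_pca = np.linalg.eigh(Xc.T @ Xc)[1][:, -1]

# response generated along the first principal direction plus small noise
y = Xc @ v_pca + 0.1 * rng.normal(size=n)
yc = y - y.mean()

# first PLS weight vector: w proportional to X'y
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)

# the two leading directions nearly coincide (cosine close to 1)
cos = abs(v_pca @ w_pls)
```

When the response loads on several principal directions, the two methods diverge, which is where the paper's first-order analysis applies.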

7.
While studying the results of a European Parliament election, the question arose of whether principal component analysis (PCA) is suitable for this kind of data. Since multiparty data should be seen as compositional data (CD), applying PCA directly is inadvisable and may lead to unreliable results. This work points out the limitations of PCA for CD and presents a practical application to the results of the 2004 European Parliament election. We present a comparative study of the results of PCA, Crude PCA and Logcontrast PCA (Aitchison in Biometrika 70:57-61, 1983; Kucera, Malmgren in Marine Micropaleontology 34:117-120, 1998). For the data set considered, the approach that produced the clearest results was Logcontrast PCA. Moreover, Crude PCA led to misleading results, since nonlinear relations were present between the variables, and linear PCA proved, once again, to be inappropriate for analysing data that can be seen as CD.
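Log-contrast approaches for compositional data are closely related to Aitchison's centred log-ratio (clr) transform: taking logs and centring each composition removes the unit-sum constraint before PCA is applied. A minimal sketch (the gamma-generated shares stand in for party vote shares and are purely illustrative):

```python
import numpy as np

def clr(P):
    """Centred log-ratio transform for compositional rows (Aitchison).
    Each row of P must be strictly positive and is treated as a composition."""
    logP = np.log(P)
    return logP - logP.mean(axis=1, keepdims=True)

rng = np.random.default_rng(4)
raw = rng.gamma(shape=2.0, scale=1.0, size=(100, 5))
P = raw / raw.sum(axis=1, keepdims=True)   # rows sum to 1 (vote shares, say)

Z = clr(P)                                 # each clr row sums to zero
evals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
# PCA is now applied to Z rather than to the raw, sum-constrained shares;
# the zero-sum constraint shows up as one exactly-zero eigenvalue
```

The zero eigenvalue reflects the one degree of freedom lost to the compositional constraint, which raw-share ("crude") PCA ignores.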

8.
We develop a new principal component analysis (PCA) type dimension-reduction method for binary data. Unlike standard PCA, which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced into the principal component (PC) loading vectors for enhanced interpretability and more stable extraction of the principal components. Our sparse PCA is formulated as an optimization problem whose criterion function is motivated by a penalized Bernoulli likelihood. A majorization-minimization algorithm is developed to solve the optimization problem efficiently. The effectiveness of the proposed sparse logistic PCA method is illustrated by application to a single-nucleotide polymorphism data set and a simulation study.

9.
A central issue in principal component analysis (PCA) is choosing the appropriate number of principal components to retain. Bishop (1999a) suggested a Bayesian approach to PCA that determines the effective dimensionality automatically on the basis of a probabilistic latent variable model. This paper extends that approach by using mixture priors, so that the choice of dimensionality and the estimation of the principal components are carried out simultaneously via an MCMC algorithm. The proposed method also provides a probabilistic measure of uncertainty for PCA, yielding posterior probabilities for all possible numbers of principal components.

10.
The effect of nonstationarity in the time-series columns of input data on principal components analysis is examined. Nonstationarity is very common among economic indicators collected over time, which are subsequently summarized into fewer indices for monitoring purposes. Because nonstationary time series drift simultaneously, usually owing to a trend, the first component averages all the variables without necessarily reducing dimensionality. Sparse principal components analysis can be used instead, but attaining sparsity among the loadings (and hence dimension reduction) depends on the choice of the tuning parameters λ1,j. Simulated data with more variables than observations and with different patterns of cross-correlation and autocorrelation were used to illustrate the advantages of sparse principal components analysis over ordinary principal components analysis. Sparse component loadings for nonstationary time series data can be achieved provided that appropriate values of λ1,j are used. We provide the range of values of λ1,j that ensures convergence of the sparse principal components algorithm and consequently achieves sparsity of the component loadings.
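The "first component averages everything" phenomenon is easy to reproduce: give several series one shared stochastic trend and the leading eigenvector puts near-equal, same-sign weight on all of them while absorbing most of the variance. A small simulation sketch (the shared-trend setup is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
T, p = 300, 10
trend = np.cumsum(rng.normal(size=T))            # one shared random-walk trend
X = trend[:, None] + rng.normal(size=(T, p))     # every series drifts together

S = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(S)                 # ascending order
v1 = evecs[:, -1]                                # loadings of the first PC

share = evals[-1] / evals.sum()                  # PC1 absorbs the common drift
same_sign = bool(np.all(v1 > 0) or np.all(v1 < 0))
```

Here `share` is close to 1 and all loadings carry the same sign, so the first component is essentially an average of the series rather than a lower-dimensional summary.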

11.
We investigate the effect of measurement error on principal component analysis in the high‐dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly in terms of increased variability and not bias. In practice, the error‐induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute to a large variability in component loadings, relative to the loading values, such that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.
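A quick way to see the variance-not-bias effect on the spectrum: adding uncorrelated noise of variance sigma^2 to every coordinate shifts the covariance eigenvalues up by about sigma^2 on average, a small change relative to the large leading eigenvalues. A simulation sketch (the low-rank-plus-noise setup is an illustrative assumption, not the paper's microarray data):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 50
# low-rank signal plus small intrinsic noise
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, p)) \
    + 0.2 * rng.normal(size=(n, p))

sigma = 0.5                                     # measurement-error std dev
E = sigma * rng.normal(size=(n, p))             # uncorrelated additive error

lam_clean = np.linalg.eigvalsh(np.cov(X, rowvar=False))
lam_noisy = np.linalg.eigvalsh(np.cov(X + E, rowvar=False))

# on average the eigenvalues shift up by roughly sigma^2
mean_shift = lam_noisy.mean() - lam_clean.mean()
```

Since the leading eigenvalues of the signal are far larger than sigma^2, the relative change in the dominant component scores is small, consistent with the abstract's conclusion.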

12.
Most linear statistical methods deal with data lying in a Euclidean space. There are, however, many examples, such as the topological structures of DNA molecules, in which the initial or the transformed data lie in a non-Euclidean space. To obtain a measure of variability in these situations, principal component analysis (PCA) is usually performed on a Euclidean tangent space, as it cannot be implemented directly on a non-Euclidean space. Principal geodesic analysis (PGA), by contrast, is a newer tool that provides a measure of variability for nonlinear statistics. In this paper, the performance of this tool is compared with that of PCA using a real data set representing a DNA molecular structure. It is shown that, owing to the nonlinearity of the space, PGA explains more of the variability in the data than PCA.

13.
In this article, we consider clustering based on principal component analysis (PCA) for high-dimensional mixture models. We present theoretical reasons why PCA is effective for clustering high-dimensional data. First, we derive a geometric representation of high-dimension, low-sample-size (HDLSS) data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We develop ideas of the geometric representation and provide geometric consistency properties for multiclass mixture models. We show that PCA can cluster HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering using gene expression datasets.
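The two-class HDLSS case can be sketched in a few lines: with far more variables than observations and a sufficiently strong mean shift, the sign of the first principal component score recovers the class labels almost perfectly. A toy simulation (the mean-shift magnitude and dimensions are illustrative assumptions, not the paper's conditions):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 500                               # HDLSS: p much larger than n
labels = np.repeat([0, 1], n // 2)
mu = np.zeros(p)
mu[:25] = 1.5                                # class-mean difference direction
X = rng.normal(size=(n, p)) + np.where(labels[:, None] == 1, mu, -mu)

Xc = X - X.mean(axis=0)
# first PC direction via SVD (cheap when n << p)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]                          # first PC scores

# cluster by the sign of the PC1 score (labels known only for checking)
pred = (scores > 0).astype(int)
acc = max((pred == labels).mean(), ((1 - pred) == labels).mean())
```

The `max` over the prediction and its complement handles the arbitrary sign of the eigenvector; `acc` is near 1 when the between-class signal dominates the HDLSS noise spectrum.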

14.
A Principal Component Analysis Method for Comprehensive Quality Evaluation of Scientific Journals, and Its Improvement (Total citations: 1; self-citations: 0; citations by others: 1)
Principal component analysis is applied to the comprehensive quality evaluation of science and engineering university journals and industrial/comprehensive scientific journals. The effective number of principal components and their weights are determined from the cumulative contribution of the components, eliminating the bias caused by correlations among indicators and the drawbacks of subjectively assigned weights, so that the evaluation is more objective, fair, and accurate. The effects of the number of evaluation indicators, the number of journal types, and other factors on the results are studied, yielding a reasonable set of indicators and reliable, valid evaluation results. Of the original 18 indicators, 14 effective ones were retained on the principle of keeping variables that play an important role; all of them act positively on journal quality. Among these, five indicators, including the citing-journal count and the subject diffusion indicator, are the most important, while the impact factor is the least important.

15.
In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular strong tailing and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of the classical PCA. The suggested method has the distinct advantage of dealing with heavy-tail-distributed data, whose covariances may be non-existent (positively infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and employs the robust and non-linear properties via a bounded and non-linear kernel function. The merits of the new method are illustrated by some statistical properties, including the upper bound of the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of simulations, we demonstrate the benefits of our approach over the classical PCA. Finally, using data on protein expression in mice of various genotypes in a biological study, we apply the novel robust PCA to categorise the mice and find that our approach is more effective at identifying abnormal mice than the classical PCA.
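To illustrate why bounding each observation's influence robustifies PCA, here is a simple spatial-sign PCA sketch (a standard robust alternative, not the paper's characteristic-function transformation): scaling every centred observation to unit norm caps the leverage of any single point, so one gross outlier cannot hijack the leading direction.

```python
import numpy as np

def sign_pca_direction(X):
    """Leading eigenvector of the spatial-sign covariance matrix - a simple
    robust PCA surrogate (not the paper's characteristic-function method):
    each observation is scaled to unit norm, so outliers cannot dominate."""
    Xc = X - np.median(X, axis=0)                   # robust centring
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    U = Xc / np.where(norms == 0, 1.0, norms)       # spatial signs
    S = U.T @ U / len(U)                            # sign covariance
    return np.linalg.eigh(S)[1][:, -1]

rng = np.random.default_rng(8)
n, p = 200, 5
X = 0.3 * rng.normal(size=(n, p))
X[:, 0] += 3 * rng.normal(size=n)                   # true variation on axis 0
X[0, 1] = 1000.0                                    # one gross outlier on axis 1

v_robust = sign_pca_direction(X)
v_classic = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]
e0 = np.eye(p)[0]
# v_robust stays aligned with axis 0; v_classic chases the outlier
```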

16.
The confirmatory factor analysis (CFA) model is a useful multivariate statistical tool for interpreting relationships between latent variables and manifest variables. Statistical results based on a single CFA are often seriously distorted when the data set exhibits heterogeneity. To address the heterogeneity arising in the multivariate responses, we propose Bayesian semiparametric modeling for CFA. The approach relies on a prior over the space of mixing distributions with finitely many components. A blocked Gibbs sampler is implemented for the posterior analysis. Results from a simulation study and a real data set are presented to illustrate the methodology.

17.
We consider the case in which the factor model does not account for all the covariances of the observed variables. It is shown that principal components representing covariances not accounted for by the factor model can have a nonzero correlation with the common factors of the factor model. These substantial correlations between common factors and components representing variance unaccounted for by the factor model are demonstrated in a simulation study that includes model error. Based on these results, a new version of Harman's factor score predictor is proposed that minimizes the correlation with the residual components.

18.
In classical principal component analysis (PCA), the empirical influence function for the sensitivity coefficient ρ is used to detect observations that are influential on the subspace spanned by the dominant principal components. In this article, we derive the influence function of ρ in the case where the reweighted minimum covariance determinant (MCD1) estimator is used as the estimator of multivariate location and scatter. Our aim is to confirm the robustness of MCD1 via the approach based on the influence function of the sensitivity coefficient.

19.
Zhang Bo, Liu Xiaoqian. Statistical Research (《统计研究》), 2019, 36(4): 119-128
This paper studies a sparse principal component analysis method based on the fused penalty, suited to data in which adjacent variables are highly correlated or even identical. First, from a regression standpoint, we propose a simple route to sparse principal components, giving a generalized sparse principal component model (GSPCA) and its algorithm, and prove that when the penalty is the 1-norm, its solution coincides with that of the existing sparse principal component model (SPC). Second, we propose combining the fused penalty with principal component analysis to obtain a fused sparse principal component analysis, formulated in two ways: as penalized matrix decomposition and as regression. We prove that the two formulations yield identical solutions, and therefore refer to them jointly as the FSPCA model. Simulations show that FSPCA performs well on data sets in which adjacent variables are highly correlated or identical. Finally, applying FSPCA to handwritten-digit recognition, we find that, compared with SPC, the principal components extracted by FSPCA are more interpretable, which makes the model more useful in practice.

20.
A crucial issue for principal components analysis (PCA) is determining the number of principal components needed to capture the variability of usually high-dimensional data. In this article, dimension detection for PCA is formulated as a variable selection problem for regressions, and the adaptive LASSO is used for the variable selection. Simulations demonstrate that this approach is more accurate than existing methods in some cases and competitive in others. The performance of the model is also illustrated using a real example.
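The selection-by-shrinkage idea can be caricatured in a few lines (a toy surrogate for the paper's regression formulation, with the penalty level `lam` chosen by hand rather than by any data-driven rule): apply adaptive-lasso-style soft-thresholding to the covariance eigenvalues, with weights inversely proportional to each eigenvalue, and count the survivors.

```python
import numpy as np

def n_components_adaptive(X, lam=1.0):
    """Pick the number of PCs by adaptive-lasso-style soft-thresholding of
    the covariance eigenvalues (a toy surrogate for the paper's regression
    formulation): with weight 1/lam_i on eigenvalue lam_i, small eigenvalues
    are shrunk to zero and the count of survivors is the chosen dimension."""
    lam_i = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    shrunk = np.maximum(lam_i - lam / lam_i, 0.0)  # adaptive penalty 1/lam_i
    return int((shrunk > 0).sum())

rng = np.random.default_rng(9)
n, p, k = 500, 20, 4
X = rng.normal(size=(n, k)) @ (2.0 * rng.normal(size=(k, p)))
X += 0.1 * rng.normal(size=(n, p))          # 4 strong factors, tiny noise
k_hat = n_components_adaptive(X, lam=1.0)   # recovers the 4 components
```

Because the penalty weight shrinks large eigenvalues only negligibly while annihilating small ones, the estimate is insensitive to `lam` over a wide range when the spectrum has a clear gap.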

