Similar articles
20 similar articles found (search time: 899 ms)
1.
Probabilistic Principal Component Analysis (Total citations: 2; self-citations: 0; citations by others: 2)
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
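The latent-variable formulation above also admits a closed-form maximum-likelihood solution alongside the EM algorithm: the noise variance is the average of the discarded eigenvalues, and the loading matrix is built from the top eigenvectors of the sample covariance. A minimal numpy sketch of that closed form (variable names and the test setup are illustrative, not from the paper):

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML estimates for probabilistic PCA.

    X: (n, d) data matrix; q: latent dimension.
    Returns the loading matrix W (d, q) and the isotropic noise variance sigma2.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)                 # sample covariance (ML normalization)
    evals, evecs = np.linalg.eigh(S)       # eigh returns ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    sigma2 = evals[q:].mean()              # average of the discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(0)
# 2 latent factors observed in 5 dimensions with small isotropic noise
Z = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 5))
X = Z @ A + 0.1 * rng.normal(size=(500, 5))
W, sigma2 = ppca_ml(X, q=2)
# the model covariance W W^T + sigma2 I should approximate the sample covariance
```

The fitted model covariance reproduces the retained eigen-structure exactly and replaces the discarded eigenvalues by their mean, so for near-low-rank data the approximation error is small.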

2.
We conducted confirmatory factor analysis (CFA) of responses (N=803) to a self-reported measure of optimism, using full-information estimation via adaptive quadrature (AQ), an alternative estimation method for ordinal data. We evaluated AQ results in terms of the number of iterations required to achieve convergence, model fit, parameter estimates, standard errors (SEs), and statistical significance, across four link functions (logit, probit, log-log, complementary log-log) using 3–10 and 20 quadrature points. We compared AQ results with those obtained using maximum likelihood, robust maximum likelihood, and robust diagonally weighted least-squares estimation. Compared with the other two link functions, logit and probit produced fit statistics, parameter estimates, SEs, and levels of significance that varied less across numbers of quadrature points; they also fitted the data better and provided larger completely standardised loadings than did maximum likelihood and diagonally weighted least-squares. Our findings demonstrate the viability of using full-information AQ to estimate CFA models with real-world ordinal data.

3.
Wang Binhui. Statistical Research (《统计研究》), 2007, 24(8): 72-76
Traditional multivariate statistical methods, such as principal component analysis and factor analysis, share a common feature: they compute the sample mean vector and covariance matrix, and derive all other statistics from these two quantities. When the sample contains no outliers, these methods give excellent results. When outliers are present, however, the results are easily distorted, because the classical mean vector and covariance matrix are not robust statistics. This paper studies the algorithm of the currently popular FAST-MCD method, constructs a robust mean vector and a robust covariance matrix, applies them to principal component analysis, and proposes improvements to address the method's shortcomings. Simulation and empirical results show that the improved method and the new robust estimators do resist outliers effectively, greatly reducing their influence on the results.

4.
Sparse principal components analysis (SPCA) is a technique for finding principal components with a small number of non-zero loadings. Our contribution to this methodology is twofold. First, we derive the sparse solutions that minimise the least squares criterion subject to sparsity requirements. Second, recognising that sparsity is not the only requirement for achieving simplicity, we suggest a backward elimination algorithm that computes sparse solutions with large loadings. This algorithm can be run without specifying the number of non-zero loadings in advance. It is also possible to impose the requirement that a minimum amount of variance be explained by the components. We give thorough comparisons with existing SPCA methods and present several examples using real datasets.
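The backward-elimination idea can be illustrated with a minimal sketch (this is an illustrative simplification, not the paper's exact algorithm): starting from the full leading eigenvector, repeatedly drop the active variable with the smallest absolute loading and re-solve the eigenproblem on the remaining support, until the desired cardinality is reached.

```python
import numpy as np

def sparse_pc_backward(S, k):
    """Backward elimination for one sparse leading component (a sketch, not
    the paper's exact algorithm): repeatedly drop the active variable with
    the smallest |loading| and re-solve on the remaining support.

    S: (d, d) covariance matrix; k: target number of non-zero loadings.
    """
    d = S.shape[0]
    active = list(range(d))
    while len(active) > k:
        sub = S[np.ix_(active, active)]
        v = np.linalg.eigh(sub)[1][:, -1]      # leading eigenvector on support
        active.pop(int(np.argmin(np.abs(v))))  # eliminate the weakest variable
    w = np.zeros(d)
    w[active] = np.linalg.eigh(S[np.ix_(active, active)])[1][:, -1]
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
X[:, 0] += 2 * X[:, 1]                 # variables 0 and 1 carry the structure
S = np.cov(X, rowvar=False)
w = sparse_pc_backward(S, k=2)         # recovers support {0, 1}
```

As the abstract notes, a stopping rule based on explained variance could replace the fixed cardinality `k` here.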

5.
To estimate the high-dimensional covariance matrix, row sparsity is often assumed such that each row has a small number of nonzero elements. However, in some applications, such as factor modeling, there may be many nonzero loadings of the common factors. The corresponding variables are also correlated to one another and the rows are non-sparse or dense. This paper has three main aims. First, a detection method is proposed to identify the rows that may be non-sparse, or at least dense with many nonzero elements. These rows are called dense rows and the corresponding variables are called pivotal variables. Second, to determine the number of rows, a ridge ratio method is suggested, which can be regarded as a sure screening procedure. Third, to handle the estimation of high-dimensional factor models, a two-step procedure is suggested with the above screening as the first step. Simulations are conducted to examine the performance of the new method and a real dataset is analyzed for illustration.
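To convey the flavour of a ridge-ratio screening rule, here is a generic eigenvalue-ratio sketch (the paper's statistic is defined differently; the constant `c` and the `kmax` cap are assumptions of this toy version): the number of strong factors is taken as the position of the largest ratio of consecutive eigenvalues, with a small ridge in the denominator to stabilise ratios of near-zero eigenvalues.

```python
import numpy as np

def ridge_ratio_k(X, c=0.05, kmax=10):
    """Estimate the number of strong factors by a ridge-type eigenvalue ratio
    (a generic sketch in the spirit of ratio screening, not the paper's exact
    statistic): k = argmax_i lam_i / (lam_{i+1} + ridge)."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    ridge = c * lam[0]                  # small ridge keeps the ratios stable
    ratios = lam[:kmax] / (lam[1:kmax + 1] + ridge)
    return int(np.argmax(ratios)) + 1

rng = np.random.default_rng(2)
n, p, k_true = 400, 30, 3
F = rng.normal(size=(n, k_true))
L = 2.0 * rng.normal(size=(k_true, p))   # strong, dense factor loadings
X = F @ L + rng.normal(size=(n, p))      # factor model plus unit noise
k_hat = ridge_ratio_k(X)                 # detects the 3 strong factors
```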

6.
We compare partial least squares (PLS) and principal component analysis (PCA) in a general setting in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator.
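The near-equivalence of the two leading directions is easy to see numerically: when the response is driven by the first principal direction, the first PLS weight vector (proportional to X'y) almost coincides with the first PCA eigenvector. A small sketch under that assumed data-generating setup:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 8
X = rng.normal(size=(n, p)) @ np.diag(np.linspace(3.0, 0.5, p))
Xc = X - X.mean(axis=0)

# first PCA direction: leading eigenvector of X'X
v_pca = np.linalg.eigh(Xc.T @ Xc)[1][:, -1]

# response generated along the first principal direction plus small noise
y = Xc @ v_pca + 0.1 * rng.normal(size=n)
yc = y - y.mean()

# first PLS weight vector: w proportional to X'y
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)

# the two leading directions nearly coincide (cosine close to 1)
cos = abs(v_pca @ w_pls)
```

When the response loads on several principal directions, the two methods diverge, which is where the paper's first-order analysis applies.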

7.
While studying the results of a European Parliament election, the question arose of whether principal component analysis (PCA) is suitable for this kind of data. Since multiparty data should be seen as compositional data (CD), applying PCA directly is inadvisable and may lead to unreliable results. This work points out the limitations of PCA for CD and presents a practical application to the results of the 2004 European Parliament election. We present a comparative study of the results of PCA, Crude PCA and Logcontrast PCA (Aitchison in Biometrika 70:57-61, 1983; Kucera, Malmgren in Marine Micropaleontology 34:117-120, 1998). For the data set considered, the approach that produced the clearest results was Logcontrast PCA. Moreover, Crude PCA led to misleading results, since nonlinear relations were present between the variables, and linear PCA proved, once again, to be inappropriate for analysing data that can be seen as CD.
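Log-contrast approaches for compositional data are closely related to Aitchison's centred log-ratio (clr) transform: taking logs and centring each composition removes the unit-sum constraint before PCA is applied. A minimal sketch (the gamma-generated shares stand in for party vote shares and are purely illustrative):

```python
import numpy as np

def clr(P):
    """Centred log-ratio transform for compositional rows (Aitchison).
    Each row of P must be strictly positive and is treated as a composition."""
    logP = np.log(P)
    return logP - logP.mean(axis=1, keepdims=True)

rng = np.random.default_rng(4)
raw = rng.gamma(shape=2.0, scale=1.0, size=(100, 5))
P = raw / raw.sum(axis=1, keepdims=True)   # rows sum to 1 (vote shares, say)

Z = clr(P)                                 # each clr row sums to zero
evals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
# PCA is now applied to Z rather than to the raw, sum-constrained shares;
# the zero-sum constraint shows up as one exactly-zero eigenvalue
```

The zero eigenvalue reflects the one degree of freedom lost to the compositional constraint, which raw-share ("crude") PCA ignores.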

8.
We develop a new principal component analysis (PCA) type dimension-reduction method for binary data. Unlike standard PCA, which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced into the principal component (PC) loading vectors for enhanced interpretability and more stable extraction of the principal components. Our sparse PCA is formulated as an optimization problem whose criterion function is motivated by a penalized Bernoulli likelihood. A majorization-minimization algorithm is developed to solve the optimization problem efficiently. The effectiveness of the proposed sparse logistic PCA method is illustrated by application to a single-nucleotide polymorphism data set and a simulation study.

9.
A central issue in principal component analysis (PCA) is choosing the appropriate number of principal components to retain. Bishop (1999a) suggested a Bayesian approach to PCA that determines the effective dimensionality automatically on the basis of a probabilistic latent variable model. This paper extends that approach by using mixture priors, so that the choice of dimensionality and the estimation of the principal components are carried out simultaneously via an MCMC algorithm. The proposed method also provides a probabilistic measure of uncertainty for PCA, yielding posterior probabilities for all possible numbers of principal components.

10.
The effect of nonstationarity in the time-series columns of input data on principal components analysis is examined. Nonstationarity is very common among economic indicators collected over time, which are subsequently summarized into fewer indices for monitoring purposes. Because nonstationary time series drift simultaneously, usually owing to a trend, the first component averages all the variables without necessarily reducing dimensionality. Sparse principal components analysis can be used instead, but attaining sparsity among the loadings (and hence dimension reduction) depends on the choice of the tuning parameters λ1,j. Simulated data with more variables than observations and with different patterns of cross-correlation and autocorrelation were used to illustrate the advantages of sparse principal components analysis over ordinary principal components analysis. Sparse component loadings for nonstationary time series data can be achieved provided that appropriate values of λ1,j are used. We provide the range of values of λ1,j that ensures convergence of the sparse principal components algorithm and consequently achieves sparsity of the component loadings.
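The "first component averages everything" phenomenon is easy to reproduce: give several series one shared stochastic trend and the leading eigenvector puts near-equal, same-sign weight on all of them while absorbing most of the variance. A small simulation sketch (the shared-trend setup is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
T, p = 300, 10
trend = np.cumsum(rng.normal(size=T))            # one shared random-walk trend
X = trend[:, None] + rng.normal(size=(T, p))     # every series drifts together

S = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(S)                 # ascending order
v1 = evecs[:, -1]                                # loadings of the first PC

share = evals[-1] / evals.sum()                  # PC1 absorbs the common drift
same_sign = bool(np.all(v1 > 0) or np.all(v1 < 0))
```

Here `share` is close to 1 and all loadings carry the same sign, so the first component is essentially an average of the series rather than a lower-dimensional summary.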

11.
We investigate the effect of measurement error on principal component analysis in the high‐dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly in terms of increased variability and not bias. In practice, the error‐induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute to a large variability in component loadings, relative to the loading values, such that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.
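A quick way to see the variance-not-bias effect on the spectrum: adding uncorrelated noise of variance sigma^2 to every coordinate shifts the covariance eigenvalues up by about sigma^2 on average, a small change relative to the large leading eigenvalues. A simulation sketch (the low-rank-plus-noise setup is an illustrative assumption, not the paper's microarray data):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 50
# low-rank signal plus small intrinsic noise
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, p)) \
    + 0.2 * rng.normal(size=(n, p))

sigma = 0.5                                     # measurement-error std dev
E = sigma * rng.normal(size=(n, p))             # uncorrelated additive error

lam_clean = np.linalg.eigvalsh(np.cov(X, rowvar=False))
lam_noisy = np.linalg.eigvalsh(np.cov(X + E, rowvar=False))

# on average the eigenvalues shift up by roughly sigma^2
mean_shift = lam_noisy.mean() - lam_clean.mean()
```

Since the leading eigenvalues of the signal are far larger than sigma^2, the relative change in the dominant component scores is small, consistent with the abstract's conclusion.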

12.
Most linear statistical methods deal with data lying in a Euclidean space. There are, however, many examples, such as the topological structures of DNA molecules, in which the initial or the transformed data lie in a non-Euclidean space. To obtain a measure of variability in these situations, principal component analysis (PCA) is usually performed on a Euclidean tangent space, as it cannot be implemented directly on a non-Euclidean space. Principal geodesic analysis (PGA), by contrast, is a newer tool that provides a measure of variability for nonlinear statistics. In this paper, the performance of this tool is compared with that of PCA using a real data set representing a DNA molecular structure. It is shown that, owing to the nonlinearity of the space, PGA explains more of the variability in the data than PCA.

13.
In this article, we consider clustering based on principal component analysis (PCA) for high-dimensional mixture models. We present theoretical reasons why PCA is effective for clustering high-dimensional data. First, we derive a geometric representation of high-dimension, low-sample-size (HDLSS) data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We develop ideas of the geometric representation and provide geometric consistency properties for multiclass mixture models. We show that PCA can cluster HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering using gene expression datasets.
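The two-class HDLSS case can be sketched in a few lines: with far more variables than observations and a sufficiently strong mean shift, the sign of the first principal component score recovers the class labels almost perfectly. A toy simulation (the mean-shift magnitude and dimensions are illustrative assumptions, not the paper's conditions):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 500                               # HDLSS: p much larger than n
labels = np.repeat([0, 1], n // 2)
mu = np.zeros(p)
mu[:25] = 1.5                                # class-mean difference direction
X = rng.normal(size=(n, p)) + np.where(labels[:, None] == 1, mu, -mu)

Xc = X - X.mean(axis=0)
# first PC direction via SVD (cheap when n << p)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]                          # first PC scores

# cluster by the sign of the PC1 score (labels known only for checking)
pred = (scores > 0).astype(int)
acc = max((pred == labels).mean(), ((1 - pred) == labels).mean())
```

The `max` over the prediction and its complement handles the arbitrary sign of the eigenvector; `acc` is near 1 when the between-class signal dominates the HDLSS noise spectrum.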

14.
A Principal Component Analysis Method for Comprehensive Quality Evaluation of Scientific Journals, and Its Improvement (Total citations: 1; self-citations: 0; citations by others: 1)
Principal component analysis is applied to the comprehensive quality evaluation of science and engineering university journals and industrial/comprehensive scientific journals. The effective number of principal components and their weights are determined from the cumulative contribution of the components, eliminating the bias caused by correlations among indicators and the drawbacks of subjectively assigned weights, so that the evaluation is more objective, fair, and accurate. The effects of the number of evaluation indicators, the number of journal types, and other factors on the results are studied, yielding a reasonable set of indicators and reliable, valid evaluation results. Of the original 18 indicators, 14 effective ones were retained on the principle of keeping variables that play an important role; all of them act positively on journal quality. Among these, five indicators, including the citing-journal count and the subject diffusion indicator, are the most important, while the impact factor is the least important.

15.
In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular strong tailing and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of the classical PCA. The suggested method has the distinct advantage of dealing with heavy-tail-distributed data, whose covariances may be non-existent (positively infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and employs the robust and non-linear properties via a bounded and non-linear kernel function. The merits of the new method are illustrated by some statistical properties, including the upper bound of the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of simulations, we demonstrate the benefits of our approach over the classical PCA. Finally, using data on protein expression in mice of various genotypes in a biological study, we apply the novel robust PCA to categorise the mice and find that our approach is more effective at identifying abnormal mice than the classical PCA.
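To illustrate why bounding each observation's influence robustifies PCA, here is a simple spatial-sign PCA sketch (a standard robust alternative, not the paper's characteristic-function transformation): scaling every centred observation to unit norm caps the leverage of any single point, so one gross outlier cannot hijack the leading direction.

```python
import numpy as np

def sign_pca_direction(X):
    """Leading eigenvector of the spatial-sign covariance matrix - a simple
    robust PCA surrogate (not the paper's characteristic-function method):
    each observation is scaled to unit norm, so outliers cannot dominate."""
    Xc = X - np.median(X, axis=0)                   # robust centring
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    U = Xc / np.where(norms == 0, 1.0, norms)       # spatial signs
    S = U.T @ U / len(U)                            # sign covariance
    return np.linalg.eigh(S)[1][:, -1]

rng = np.random.default_rng(8)
n, p = 200, 5
X = 0.3 * rng.normal(size=(n, p))
X[:, 0] += 3 * rng.normal(size=n)                   # true variation on axis 0
X[0, 1] = 1000.0                                    # one gross outlier on axis 1

v_robust = sign_pca_direction(X)
v_classic = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]
e0 = np.eye(p)[0]
# v_robust stays aligned with axis 0; v_classic chases the outlier
```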

16.
The confirmatory factor analysis (CFA) model is a useful multivariate statistical tool for interpreting relationships between latent variables and manifest variables. Statistical results based on a single CFA are often seriously distorted when the data set exhibits heterogeneity. To address the heterogeneity arising in the multivariate responses, we propose Bayesian semiparametric modeling for CFA. The approach relies on a prior over the space of mixing distributions with finitely many components. A blocked Gibbs sampler is implemented for the posterior analysis. Results from a simulation study and a real data set are presented to illustrate the methodology.

17.
We consider the case in which the factor model does not account for all the covariances of the observed variables. It is shown that principal components representing covariances not accounted for by the factor model can have a nonzero correlation with the common factors of the factor model. These substantial correlations between common factors and components representing variance unaccounted for by the factor model are demonstrated in a simulation study that includes model error. Based on these results, a new version of Harman's factor score predictor is proposed that minimizes the correlation with the residual components.

18.
In classical principal component analysis (PCA), the empirical influence function for the sensitivity coefficient ρ is used to detect observations that are influential on the subspace spanned by the dominant principal components. In this article, we derive the influence function of ρ in the case where the reweighted minimum covariance determinant (MCD1) estimator is used as the estimator of multivariate location and scatter. Our aim is to confirm the robustness of MCD1 via the approach based on the influence function of the sensitivity coefficient.

19.
Zhang Bo, Liu Xiaoqian. Statistical Research (《统计研究》), 2019, 36(4): 119-128
This paper studies a sparse principal component analysis method based on the fused penalty, suited to data in which adjacent variables are highly correlated or even identical. First, from a regression standpoint, we propose a simple route to sparse principal components, giving a generalized sparse principal component model (GSPCA) and its algorithm, and prove that when the penalty is the 1-norm, its solution coincides with that of the existing sparse principal component model (SPC). Second, we propose combining the fused penalty with principal component analysis to obtain a fused sparse principal component analysis, formulated in two ways: as penalized matrix decomposition and as regression. We prove that the two formulations yield identical solutions, and therefore refer to them jointly as the FSPCA model. Simulations show that FSPCA performs well on data sets in which adjacent variables are highly correlated or identical. Finally, applying FSPCA to handwritten-digit recognition, we find that, compared with SPC, the principal components extracted by FSPCA are more interpretable, which makes the model more useful in practice.

20.
A crucial issue for principal components analysis (PCA) is determining the number of principal components needed to capture the variability of usually high-dimensional data. In this article, dimension detection for PCA is formulated as a variable selection problem for regressions, and the adaptive LASSO is used for the variable selection. Simulations demonstrate that this approach is more accurate than existing methods in some cases and competitive in others. The performance of the model is also illustrated using a real example.
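The selection-by-shrinkage idea can be caricatured in a few lines (a toy surrogate for the paper's regression formulation, with the penalty level `lam` chosen by hand rather than by any data-driven rule): apply adaptive-lasso-style soft-thresholding to the covariance eigenvalues, with weights inversely proportional to each eigenvalue, and count the survivors.

```python
import numpy as np

def n_components_adaptive(X, lam=1.0):
    """Pick the number of PCs by adaptive-lasso-style soft-thresholding of
    the covariance eigenvalues (a toy surrogate for the paper's regression
    formulation): with weight 1/lam_i on eigenvalue lam_i, small eigenvalues
    are shrunk to zero and the count of survivors is the chosen dimension."""
    lam_i = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    shrunk = np.maximum(lam_i - lam / lam_i, 0.0)  # adaptive penalty 1/lam_i
    return int((shrunk > 0).sum())

rng = np.random.default_rng(9)
n, p, k = 500, 20, 4
X = rng.normal(size=(n, k)) @ (2.0 * rng.normal(size=(k, p)))
X += 0.1 * rng.normal(size=(n, p))          # 4 strong factors, tiny noise
k_hat = n_components_adaptive(X, lam=1.0)   # recovers the 4 components
```

Because the penalty weight shrinks large eigenvalues only negligibly while annihilating small ones, the estimate is insensitive to `lam` over a wide range when the spectrum has a clear gap.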

