期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Robust principal component analysis for compositional tables

J. de Sousa K. Hron K. Fa evicov P. Filzmoser 《Journal of applied statistics》2021,48(2):214

A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age classes. Analyzed as compositions, the relevant information consists of ratios between different cells of such a table. This is particularly useful when analyzing several compositional tables jointly, where the absolute numbers are in very different ranges, e.g. if unemployment data are considered from different countries. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle for exploring the relationships between the given factors. Here we propose a special choice of coordinates with direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, allowing to investigate relationships between the factors. The link between orthonormal coordinates and clr coefficients enables to apply rPCA, which would otherwise suffer from the singularity of clr coefficients. 相似文献

2.

Multiple imputation for continuous variables using a Bayesian principal component analysis

《Journal of Statistical Computation and Simulation》2012,82(11):2140-2156

ABSTRACT

We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Contrary to the others, the proposed method can be easily used on data sets where the number of individuals is less than the number of variables and when the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the widths of the confidence intervals built for the quantities of interest are often smaller whilst ensuring a valid coverage. 相似文献

3.

Sensitivity analysis in principal component analysis:influence on the subspace spanned by principal components.

Yukata Tanaka 《统计学通讯:理论与方法》2013,42(9):3157-3175

The problem of detecting influential observations in principalcomponent analysis was discussed by several authors. Radhakrishnan and kshirsagar ( 1981 ), Critchley ( 1985 ), jolliffe ( 1986 )among others discussed this topicby using the influence functions I(X;θ_s)and I(X;V_s)of eigenvalues and eigenvectors, which wwere derived under the assumption that the eigenvalues of interest were simple. In this paper we propose the influence functionsI(X;∑^q _s=1θ_sV_sV_s ^T)and I(x;∑^q _s=1V_sV_s ^t)(q<p;p:number of variables) to investigate the influence onthe subspace spanned by principal components. These influence functions are applicable not only to the case where the edigenvalues of interst are all simple but also to the case where there are some multiple eigenvalues among those of interest. 相似文献

4.

Missing data in principal component analysis of questionnaire data: a comparison of methods

《Journal of Statistical Computation and Simulation》2012,82(11):2298-2315

Principal component analysis (PCA) is a widely used statistical technique for determining subscales in questionnaire data. As in any other statistical technique, missing data may both complicate its execution and interpretation. In this study, six methods for dealing with missing data in the context of PCA are reviewed and compared: listwise deletion (LD), pairwise deletion, the missing data passive approach, regularized PCA, the expectation-maximization algorithm, and multiple imputation. Simulations show that except for LD, all methods give about equally good results for realistic percentages of missing data. Therefore, the choice of a procedure can be based on the ease of application or purely the convenience of availability of a technique. 相似文献

5.

A comparison of different procedures for principal component analysis in the presence of outliers

B. Bariş Alkan Cemal Atakan Nesrin Alkan 《Journal of applied statistics》2015,42(8):1716-1722

Principal component analysis (PCA) is a popular technique that is useful for dimensionality reduction but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has caused the development of new approaches. Effects of using estimates obtained by expectation–maximization – EM and multiple imputation – MI instead of outliers were examined on the artificial and a real data set. Furthermore, robust PCA based on minimum covariance determinant (MCD), PCA based on estimates obtained by EM instead of outliers and PCA based on estimates obtained by MI instead of outliers were compared with the results of CPCA. In this study, we tried to show the effects of using estimates obtained by MI and EM instead of outliers, depending on the ratio of outliers in data set. Finally, when the ratio of outliers exceeds 20%, we suggest the use of estimates obtained by MI and EM instead of outliers as an alternative approach. 相似文献

6.

Measures of fit in principal component and canonical variate analyses

Sugnet Gardner-Lubbe John C. Gowers 《Journal of applied statistics》2008,35(9):947-965

Treating principal component analysis (PCA) and canonical variate analysis (CVA) as methods for approximating tables, we develop measures, collectively termed predictivity, that assess the quality of fit independently for each variable and for all dimensionalities. We illustrate their use with data from aircraft development, the African timber industry and copper froth measurements from the mining industry. Similar measures are described for assessing the predictivity associated with the individual samples (in the case of PCA and CVA) or group means (in the case of CVA). For these measures to be meaningful, certain essential orthogonality conditions must hold that are shown to be satisfied by predictivity. 相似文献

7.

Minimum average partial correlation and parallel analysis: The influence of oblique structures

P.-O. Caron 《统计学通讯:模拟与计算》2013,42(7):2110-2117

ABSTRACT

Parallel analysis (Horn 1965) and the minimum average partial correlation (MAP; Velicer 1976) have been widely spread as optimal solutions to identify the correct number of axes in principal component analysis. Previous results showed, however, that they become inefficient when variables belonging to different components strongly correlate. Simulations are used to assess their power to detect the dimensionality of data sets with oblique structures. Overall, MAP had the best performances as it was more powerful and accurate than PA when the component structure was modestly oblique. However, both stopping rules performed poorly in the presence of highly oblique factors. 相似文献

8.

A permutation procedure for testing the equality of pattern hypotheses across groups involving correlation or covariance matrices

Shipley Bill 《Statistics and Computing》2000,10(3):253-257

This paper describes a permutation procedure to test for the equality of selected elements of a covariance or correlation matrix across groups. It involves either centring or standardising each variable within each group before randomly permuting observations between groups. Since the assumption of exchangeability of observations between groups does not strictly hold following such transformations, Monte Carlo simulations were used to compare expected and empirical rejection levels as a function of group size, the number of groups and distribution type (Normal, mixtures of Normals and Gamma with various values of the shape parameter). The Monte Carlo study showed that the estimated probability levels are close to those that would be obtained with an exact test except at very small sample sizes (5 or 10 observations per group). The test appears robust against non-normal data, different numbers of groups or variables per group and unequal sample sizes per group. Power was increased with increasing sample size, effect size and the number of elements in the matrix and power was decreased with increasingly unequal numbers of observations per group. 相似文献

9.

Dihedral angles principal geodesic analysis using nonlinear statistics

A. Nodehi A. Heydari 《Journal of applied statistics》2015,42(9):1962-1972

Statistics, as one of the applied sciences, has great impacts in vast area of other sciences. Prediction of protein structures with great emphasize on their geometrical features using dihedral angles has invoked the new branch of statistics, known as directional statistics. One of the available biological techniques to predict is molecular dynamics simulations producing high-dimensional molecular structure data. Hence, it is expected that the principal component analysis (PCA) can response some related statistical problems particulary to reduce dimensions of the involved variables. Since the dihedral angles are variables on non-Euclidean space (their locus is the torus), it is expected that direct implementation of PCA does not provide great information in this case. The principal geodesic analysis is one of the recent methods to reduce the dimensions in the non-Euclidean case. A procedure to utilize this technique for reducing the dimension of a set of dihedral angles is highlighted in this paper. We further propose an extension of this tool, implemented in such way the torus is approximated by the product of two unit circle and evaluate its application in studying a real data set. A comparison of this technique with some previous methods is also undertaken. 相似文献

10.

Jointly modelling multiple transplant outcomes by a competing risk model via functional principal component analysis

Jianghu Dong Haolun Shi Liangliang Wang Ying Zhang Jiguo Cao 《Journal of applied statistics》2023,50(1):43

In many clinical studies, longitudinal biomarkers are often used to monitor the progression of a disease. For example, in a kidney transplant study, the glomerular filtration rate (GFR) is used as a longitudinal biomarker to monitor the progression of the kidney function and the patient''s state of survival that is characterized by multiple time-to-event outcomes, such as kidney transplant failure and death. It is known that the joint modelling of longitudinal and survival data leads to a more accurate and comprehensive estimation of the covariates'' effect. While most joint models use the longitudinal outcome as a covariate for predicting survival, very few models consider the further decomposition of the variation within the longitudinal trajectories and its effect on survival. We develop a joint model that uses functional principal component analysis (FPCA) to extract useful features from the longitudinal trajectories and adopt the competing risk model to handle multiple time-to-event outcomes. The longitudinal trajectories and the multiple time-to-event outcomes are linked via the shared functional features. The application of our model on a real kidney transplant data set reveals the significance of these functional features, and a simulation study is carried out to validate the accurateness of the estimation method. 相似文献

11.

Some comments on escoufier's RV- coefficient as a sensitivity measure in principal component analysis

Eduardo Castaão-Tostado Yutaka Tanaka 《统计学通讯:理论与方法》2013,42(12):4619-4626

The RV-coefficient (Escoufier, 1973; Robert and Escoufier, 1976) is studied as a sensitivity coefficient of the subspace spanned by dominant eigenvectors in principal component analysis. We use the perturbation expansion up to second order term of the corresponding projection matrix. The relationship with the measures by Benasseni (1990) and Krzanowski (1979) is also discussed. 相似文献

12.

Functional principal component analyses of biomedical images as outcome measures

Emma O'Connor Nick Fieller rew Holmes John C. Waterton Edward Ainscow 《Journal of the Royal Statistical Society. Series C, Applied statistics》2010,59(1):57-76

相似文献

13.

Exploring the variability of DNA molecules via principal geodesic analysis on the shape space

H. Fotouhi 《Journal of applied statistics》2012,39(10):2199-2207

Most of the linear statistics deal with data lying in a Euclidean space. However, there are many examples, such as DNA molecule topological structures, in which the initial or the transformed data lie in a non-Euclidean space. To get a measure of variability in these situations, the principal component analysis (PCA) is usually performed on a Euclidean tangent space as it cannot be directly implemented on a non-Euclidean space. Instead, principal geodesic analysis (PGA) is a new tool that provides a measure of variability for nonlinear statistics. In this paper, the performance of this new tool is compared with that of the PCA using a real data set representing a DNA molecular structure. It is shown that due to the nonlinearity of space, the PGA explains more variability of the data than the PCA. 相似文献

14.

Comparison of some goodness of fit tests for a single non-isotropic hypothetical principal component

James R. Schott 《统计学通讯:理论与方法》2013,42(5):1201-1215

Kshirsagar (1961) proposed a t e s t criterion for the null hypothesis that a covariance matrix with known smaller latent root of mu1tip1icity p?1 has its single non-isotropic principal component in a specified direction. It is shown that the power function of this criterion lacks some desirable properties. Another test criterion is proposed. The case in which the covariance matrix has an unknown smaller latent root of multi-plicity p?1 is also investigated. 相似文献

15.

Hierarchical clustering of variables: a comparison among strategies of analysis

Gabriele Soffritti 《统计学通讯:模拟与计算》2013,42(4):977-999

In this paper some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that the use of multivariate association measures between two sets of variables can overcome the drawbacks of the usually employed bivariate correlation coefficient, but the resulting methods are generally not monotonic. Thus a new multivariate association measure is proposed, based on the links existing between canonical correlation analysis and principal component analysis, which can be more suitably used for the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and theoretically discussed. 相似文献

16.

科技期刊质量综合评价的主成分分析法及其改进 总被引：1，自引：0，他引：1

楼文高吴雷鸣《统计教育》2010,(5):57-62

应用主成分分析进行理工大学、工业综合类科技期刊质量综合评价,根据主成分累计贡献值确定主成分的有效维数和权重,消除由于指标间的相关性带来的偏差和人为确定指标权重引起的缺陷,使评价结果更客观、公正和准确。研究了评价指标数、期刊种类数等对评价结果的影响,从而确定了合理的评价指标,得到了可靠、有效的评价结果。在18个指标中,根据保留具有重要作用变量的原则,最终选定14个有效评价指标,并且对期刊质量都具有正向作用,其中引用刊教、学科扩散指标等5个指标最重要,而影响因子的重要性最低。相似文献

17.

A Monte Carlo examination of the broken-stick distribution to identify components to retain in principal component analysis

《Journal of Statistical Computation and Simulation》2012,82(12):2405-2410

ABSTRACT

The broken-stick (BS) is a popular stopping rule in ecology to determine the number of meaningful components of principal component analysis. However, its properties have not been systematically investigated. The purpose of the current study is to evaluate its ability to detect the correct dimensionality in a data set and whether it tends to over- or underestimate it. A Monte Carlo protocol was carried out. Two main correlation matrices deemed usual in practice were used with three levels of correlation (0, 0.10 and 0.30) between components (generating oblique structure) and with different sample sizes. Analyses of the population correlation matrices indicated that, for extremely large sample sizes, the BS method could be correct for only one of the six simulated structure. It actually failed to identify the correct dimensionality half the time with orthogonal structures and did even worse with some oblique ones. In harder conditions, results show that the power of the BS decreases as sample size increases: weakening its usefulness in practice. Since the BS method seems unlikely to identify the underlying dimensionality of the data, and given that better stopping rules exist it appears as a poor choice when carrying principal component analysis. 相似文献

18.

Functional principal component analysis via regularized Gaussian basis expansions and its application to unbalanced data

Mitsunori Kayano Sadanori Konishi 《Journal of statistical planning and inference》2009

This paper introduces regularized functional principal component analysis for multidimensional functional data sets, utilizing Gaussian basis functions. An essential point in a functional approach via basis expansions is the evaluation of the matrix for the integral of the product of any two bases (cross-product matrix). Advantages of the use of the Gaussian type of basis functions in the functional approach are that its cross-product matrix can be easily calculated, and it creates a much more flexible instrument for transforming each individual's observation into a functional form. The proposed method is applied to the analysis of three-dimensional (3D) protein structural data that can be referred to as unbalanced data. It is shown that our method extracts useful information from unbalanced data through the application. Numerical experiments are conducted to investigate the effectiveness of our method via Gaussian basis functions, compared to the method based on B-splines. On performing regularized functional principal component analysis with B-splines, we also derive the exact form of its cross-product matrix. The numerical results show that our methodology is superior to the method based on B-splines for unbalanced data. 相似文献

19.

Asymptotic expansions for the distributions of statistics based on a correlation matrix

Sadanori Konishi 《Revue canadienne de statistique》1978,6(1):49-56

An asymptotic expansion is given for the distribution of the α-th largest latent root of a correlation matrix, when the observations are from a multivariate normal distribution. An asymptotic expansion for the distribution of a test statistic based on a correlation matrix, which is useful in dimensionality reduction in principal component analysis, is also given. These expansions hold when the corresponding latent root of the population correlation matrix is simple. The approach here is based on a perturbation method. 相似文献

20.

Smoothed functional canonical correlation analysis of humidity and temperature data

Istem Koymen Keser Ipek Deveci Kocakoç 《Journal of applied statistics》2015,42(10):2126-2140

This paper focuses on smoothed functional canonical correlation analysis (SFCCA) to investigate the relationships and changes in large, seasonal and long-term data sets. The aim of this study is to introduce a guideline for SFCCA for functional data and to give some insights on the fine tuning of the methodology for long-term periodical data. The guidelines are applied on temperature and humidity data for 11 years between 2000 and 2010 and the results are interpreted. Seasonal changes or periodical shifts are visually studied by yearly comparisons. The effects of the ‘number of basis functions’ and the ‘selection of smoothing parameter’ on the general variability structure and on correlations between the curves are examined. It is concluded that the number of time points (knots), number of basis functions and the time span of evaluation (monthly, daily, etc.) should all be chosen harmoniously. It is found that changing the smoothing parameter does not have a significant effect on the structure of curves and correlations. The number of basis functions is found to be the main effector on both individual and correlation weight functions. 相似文献