20 similar documents found (search time: 15 ms)
1.
Principal component analysis (PCA) and functional principal component analysis are key tools in multivariate analysis, in particular for modelling yield curves, but little attention is given to questions of uncertainty, either in the components themselves or in derived quantities such as scores. Actuaries using PCA to model yield curves to assess interest rate risk for insurance companies are required to show any uncertainty in their calculations. Asymptotic results based on assumptions of multivariate normality are unsatisfactory for modest samples, and application of bootstrap methods is not straightforward, with the novel pitfalls of possible inversions in the order of sample components and reversals of their signs. We present methods for overcoming these difficulties and discuss other potential hazards that arise.
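To make the order/sign pitfall concrete, here is a minimal hedged sketch (not the authors' method) of a PCA bootstrap that re-matches and re-signs each bootstrap solution against the full-sample loadings; all function names and defaults are illustrative.

```python
# Hedged sketch: bootstrapping PCA loadings while guarding against the two
# pitfalls noted above -- reordering of sample components and sign reversals.
import numpy as np

def aligned_bootstrap_loadings(X, n_components=3, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    ref = Vt[:n_components].T                     # p x k full-sample loadings
    boot = np.empty((n_boot,) + ref.shape)
    n = X.shape[0]
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, n)]             # resample rows
        Xb = Xb - Xb.mean(axis=0)
        Vb = np.linalg.svd(Xb, full_matrices=False)[2][:n_components].T
        # Match each reference component to its most-correlated bootstrap
        # component (greedy; a one-to-one assignment would be more careful)...
        order = np.argmax(np.abs(ref.T @ Vb), axis=1)
        Vb = Vb[:, order]
        # ...and flip signs so matched pairs point the same way.
        Vb = Vb * np.sign(np.sum(ref * Vb, axis=0))
        boot[b] = Vb
    return boot  # e.g. np.percentile(boot, [2.5, 97.5], axis=0) for intervals
```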
2.
Dynamic principal component analysis (DPCA), also known as frequency domain principal component analysis, was developed by Brillinger [Time Series: Data Analysis and Theory, Vol. 36, SIAM, 1981] to decompose multivariate time-series data into a few principal component series. A primary advantage of DPCA is its capability of extracting essential components from the data by reflecting their serial dependence. It is also used to estimate the common component in a dynamic factor model, which is frequently used in econometrics. However, this beneficial property cannot be exploited when missing values are present, and they should not simply be ignored when estimating the spectral density matrix in the DPCA procedure. Based on a novel combination of conventional DPCA and the self-consistency concept, we propose a DPCA method for data with missing values. We demonstrate the advantage of the proposed method over some existing imputation methods through Monte Carlo experiments and real data analysis.
3.
A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age classes. Analyzed as compositions, the relevant information consists of ratios between different cells of such a table. This is particularly useful when analyzing several compositional tables jointly, where the absolute numbers are in very different ranges, e.g. if unemployment data are considered from different countries. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle when exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, allowing the relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply rPCA, which would otherwise suffer from the singularity of the clr coefficients.
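As a concrete illustration of the clr coefficients mentioned above, a small hedged sketch follows; the numbers and the plain-SVD step are illustrative stand-ins, not the authors' coordinate construction or rPCA pipeline.

```python
# Hedged sketch of the clr step: centred logratio coefficients for flattened
# compositional tables, followed by an ordinary SVD as a stand-in for rPCA.
import numpy as np

def clr(x):
    """Centred logratio transform of one composition (strictly positive)."""
    logx = np.log(x)
    return logx - logx.mean()

# Each row: one compositional table flattened to its cells (e.g. gender x age).
tables = np.array([[120.,  80., 60., 40.],
                   [300., 150., 90., 60.],
                   [ 50.,  45., 30., 25.]])
Z = np.apply_along_axis(clr, 1, tables)   # clr coefficients; rows sum to 0

# clr coefficients are singular (rank p-1), which is why the paper links them
# to orthonormal coordinates before applying rPCA; a plain SVD on the centred
# clr data illustrates the dimension-reduction step.
Zc = Z - Z.mean(axis=0)
U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
scores = U * s
```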
4.
We consider the problem of clustering gamma-ray bursts (from the "BATSE" catalogue) through kernel principal component analysis, in which our proposed kernel outperforms other competing kernels in terms of clustering accuracy, and we obtain three physically interpretable groups of gamma-ray bursts. The effectiveness of the suggested kernel, in combination with kernel principal component analysis, in revealing natural clusters in noisy and nonlinear data while reducing the dimension of the data is also explored in two simulated data sets.
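The general technique, kernel PCA followed by clustering, can be sketched with an off-the-shelf RBF kernel standing in for the paper's proposed kernel (which is not reproduced here); the synthetic data and parameter values are illustrative.

```python
# Illustrative only: kernel PCA on synthetic nonlinear clusters, with a
# standard RBF kernel standing in for the paper's proposed kernel.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
Z = kpca.fit_transform(X)                 # nonlinear dimension reduction
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
# Clustering accuracy up to label switching:
print(max((labels == y).mean(), (labels != y).mean()))
```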
5.
Journal of Statistical Computation and Simulation, 2012, 82(11): 2140-2156
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Unlike the others, the proposed method can easily be used on data sets where the number of individuals is less than the number of variables and where the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the widths of the confidence intervals built for the quantities of interest are often smaller whilst ensuring valid coverage.
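A minimal sketch of the EM-style iterative-PCA imputation that underlies such methods may help; note it performs a single imputation only, whereas the paper's Bayesian treatment draws multiple imputed data sets to reflect parameter uncertainty. The function name and defaults are illustrative.

```python
# Hedged sketch: iterative low-rank (PCA) imputation of missing continuous
# data -- a simplified single-imputation stand-in for the paper's method.
import numpy as np

def iterative_pca_impute(X, rank=2, n_iter=100, tol=1e-6):
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    Xf = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        Xhat = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]  # low-rank fit
        Xnew = np.where(mask, Xhat, X)                    # refill the gaps
        if np.max(np.abs(Xnew - Xf)) < tol:
            break
        Xf = Xnew
    return Xf
```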
6.
Yutaka Tanaka, Communications in Statistics - Theory and Methods, 2013, 42(9): 3157-3175
The problem of detecting influential observations in principal component analysis has been discussed by several authors. Radhakrishnan and Kshirsagar (1981), Critchley (1985), and Jolliffe (1986), among others, discussed this topic using the influence functions $I(X;\theta_s)$ and $I(X;v_s)$ of eigenvalues and eigenvectors, which were derived under the assumption that the eigenvalues of interest were simple. In this paper we propose the influence functions $I(X;\sum_{s=1}^{q}\theta_s v_s v_s^{T})$ and $I(X;\sum_{s=1}^{q} v_s v_s^{T})$ ($q<p$; $p$: number of variables) to investigate the influence on the subspace spanned by principal components. These influence functions are applicable not only to the case where the eigenvalues of interest are all simple but also to the case where there are multiple eigenvalues among those of interest.
7.
Journal of Statistical Computation and Simulation, 2012, 82(11): 2298-2315
Principal component analysis (PCA) is a widely used statistical technique for determining subscales in questionnaire data. As in any other statistical technique, missing data may complicate both its execution and its interpretation. In this study, six methods for dealing with missing data in the context of PCA are reviewed and compared: listwise deletion (LD), pairwise deletion, the missing data passive approach, regularized PCA, the expectation-maximization algorithm, and multiple imputation. Simulations show that, except for LD, all methods give about equally good results for realistic percentages of missing data. The choice of a procedure can therefore be based on ease of application or simply on which technique is available.
8.
This paper reviews various treatments of non-metric variables in partial least squares (PLS) and principal component analysis (PCA) algorithms. The performance of the different treatments is compared in an extensive simulation study under several typical data generating processes, and associated recommendations are made. Moreover, we find that PLS-based methods are preferable in practice, since, independently of the data generating process, PLS performs either as well as PCA or significantly outperforms it. As an application of PLS and PCA algorithms with non-metric variables, we consider the construction of a wealth index to predict household expenditures. Consistent with our simulation study, we find that a PLS-based wealth index with dummy coding outperforms PCA-based ones.
9.
Principal component analysis (PCA) is a popular technique for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has motivated the development of new approaches. The effects of replacing outliers with estimates obtained by expectation-maximization (EM) and by multiple imputation (MI) were examined on an artificial and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on EM estimates in place of outliers, and PCA based on MI estimates in place of outliers were compared with the results of CPCA. In this study, we show the effects of replacing outliers with EM and MI estimates, depending on the ratio of outliers in the data set. Finally, when the ratio of outliers exceeds 20%, we suggest replacing outliers with MI or EM estimates as an alternative approach.
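The MCD-based robust PCA mentioned above can be sketched as follows; this assumes scikit-learn's MinCovDet as the MCD estimator and is a generic illustration, not the study's exact procedure.

```python
# Hedged sketch: robust PCA by eigen-decomposing the MCD covariance estimate
# instead of the outlier-sensitive sample covariance.
import numpy as np
from sklearn.covariance import MinCovDet

def mcd_pca(X, n_components=2, random_state=0):
    mcd = MinCovDet(random_state=random_state).fit(X)
    # Robust covariance replaces the classical one.
    vals, vecs = np.linalg.eigh(mcd.covariance_)
    order = np.argsort(vals)[::-1][:n_components]
    loadings = vecs[:, order]
    scores = (X - mcd.location_) @ loadings   # centre at the robust location
    return scores, loadings, vals[order]
```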
10.
Treating principal component analysis (PCA) and canonical variate analysis (CVA) as methods for approximating tables, we develop measures, collectively termed predictivity, that assess the quality of fit independently for each variable and for all dimensionalities. We illustrate their use with data from aircraft development, the African timber industry and copper froth measurements from the mining industry. Similar measures are described for assessing the predictivity associated with the individual samples (in the case of PCA and CVA) or group means (in the case of CVA). For these measures to be meaningful, certain essential orthogonality conditions must hold that are shown to be satisfied by predictivity.
11.
P.-O. Caron, Communications in Statistics - Simulation and Computation, 2013, 42(7): 2110-2117
Parallel analysis (PA; Horn 1965) and the minimum average partial correlation (MAP; Velicer 1976) have been widely adopted as solutions for identifying the correct number of axes in principal component analysis. Previous results showed, however, that they become inefficient when variables belonging to different components correlate strongly. Simulations are used to assess their power to detect the dimensionality of data sets with oblique structures. Overall, MAP performed best, as it was more powerful and accurate than PA when the component structure was modestly oblique. However, both stopping rules performed poorly in the presence of highly oblique factors.
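For reference, Horn's (1965) parallel analysis itself fits in a few lines; this sketches the stopping rule only, not Caron's simulation design, and the defaults are illustrative.

```python
# Parallel analysis: retain the components whose observed eigenvalues exceed
# the corresponding quantile of eigenvalues from random (uncorrelated) data.
import numpy as np

def parallel_analysis(X, n_sim=200, quantile=0.95, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    sims = np.empty((n_sim, p))
    for i in range(n_sim):
        R = rng.standard_normal((n, p))          # same shape, no structure
        sims[i] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
    thresh = np.quantile(sims, quantile, axis=0)
    return int(np.sum(obs > thresh))             # components to retain
```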
12.
This paper describes a permutation procedure to test for the equality of selected elements of a covariance or correlation matrix across groups. It involves either centring or standardising each variable within each group before randomly permuting observations between groups. Since the assumption of exchangeability of observations between groups does not strictly hold following such transformations, Monte Carlo simulations were used to compare expected and empirical rejection levels as a function of group size, the number of groups and distribution type (Normal, mixtures of Normals, and Gamma with various values of the shape parameter). The Monte Carlo study showed that the estimated probability levels are close to those that would be obtained with an exact test except at very small sample sizes (5 or 10 observations per group). The test appears robust against non-normal data, different numbers of groups or variables per group, and unequal sample sizes per group. Power increased with sample size, effect size and the number of elements in the matrix, and decreased with increasingly unequal numbers of observations per group.
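A hedged sketch of such a permutation test, reduced to two groups and a single selected correlation, follows; the standardising step mirrors the within-group transformation described above, and all names and defaults are illustrative.

```python
# Hedged sketch: permutation test for equality of one correlation across two
# groups, with within-group standardisation before permuting.
import numpy as np

def perm_test_corr(x1, y1, x2, y2, n_perm=5000, seed=0):
    rng = np.random.default_rng(seed)
    z = lambda v: (v - v.mean()) / v.std(ddof=1)   # within-group standardising
    a = np.column_stack([z(x1), z(y1)])
    b = np.column_stack([z(x2), z(y2)])
    stat = (np.corrcoef(a.T)[0, 1] - np.corrcoef(b.T)[0, 1]) ** 2
    pooled = np.vstack([a, b])
    n1, count = len(a), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))         # permute between groups
        pa, pb = pooled[idx[:n1]], pooled[idx[n1:]]
        s = (np.corrcoef(pa.T)[0, 1] - np.corrcoef(pb.T)[0, 1]) ** 2
        count += s >= stat
    return count / n_perm                          # permutation p-value
```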
13.
Statistics, as an applied science, has a great impact on a vast range of other sciences. The prediction of protein structures, with emphasis on their geometrical features as described by dihedral angles, has invoked a branch of statistics known as directional statistics. One of the available biological techniques for such prediction is molecular dynamics simulation, which produces high-dimensional molecular structure data. Hence, it is expected that principal component analysis (PCA) can address some of the related statistical problems, particularly reducing the dimension of the involved variables. Since the dihedral angles are variables on a non-Euclidean space (their locus is the torus), direct implementation of PCA cannot be expected to provide much information in this case. Principal geodesic analysis is one of the recent methods for reducing dimension in the non-Euclidean case. A procedure to utilize this technique for reducing the dimension of a set of dihedral angles is highlighted in this paper. We further propose an extension of this tool, implemented in such a way that the torus is approximated by the product of two unit circles, and evaluate its application in studying a real data set. A comparison of this technique with some previous methods is also undertaken.
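The "product of two unit circles" idea can be sketched by embedding each dihedral-angle pair in R^4 and running a linear PCA on the embedding; this illustrates the approximation only, not the principal geodesic analysis itself, and the synthetic angles are made up for the example.

```python
# Hedged sketch: angles on the torus embedded via two unit circles, then PCA.
import numpy as np

def embed_torus(phi, psi):
    """Map angle pairs (radians) on the torus into R^4 via two unit circles."""
    return np.column_stack([np.cos(phi), np.sin(phi),
                            np.cos(psi), np.sin(psi)])

rng = np.random.default_rng(0)
phi = rng.vonmises(mu=-1.0, kappa=8.0, size=300)  # synthetic dihedral angles
psi = rng.vonmises(mu=2.0, kappa=8.0, size=300)
E = embed_torus(phi, psi)
Ec = E - E.mean(axis=0)
_, s, _ = np.linalg.svd(Ec, full_matrices=False)
print("variance explained:", (s**2 / np.sum(s**2)).round(3))
```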
14.
Jianghu Dong, Haolun Shi, Liangliang Wang, Ying Zhang, Jiguo Cao, Journal of Applied Statistics, 2023, 50(1): 43
In many clinical studies, longitudinal biomarkers are often used to monitor the progression of a disease. For example, in a kidney transplant study, the glomerular filtration rate (GFR) is used as a longitudinal biomarker to monitor the progression of kidney function and the patient's state of survival, which is characterized by multiple time-to-event outcomes such as kidney transplant failure and death. It is known that the joint modelling of longitudinal and survival data leads to a more accurate and comprehensive estimation of the covariates' effects. While most joint models use the longitudinal outcome as a covariate for predicting survival, very few consider the further decomposition of the variation within the longitudinal trajectories and its effect on survival. We develop a joint model that uses functional principal component analysis (FPCA) to extract useful features from the longitudinal trajectories and adopts a competing risk model to handle multiple time-to-event outcomes. The longitudinal trajectories and the multiple time-to-event outcomes are linked via the shared functional features. The application of our model to a real kidney transplant data set reveals the significance of these functional features, and a simulation study is carried out to validate the accuracy of the estimation method.
15.
The RV-coefficient (Escoufier, 1973; Robert and Escoufier, 1976) is studied as a sensitivity coefficient of the subspace spanned by the dominant eigenvectors in principal component analysis. We use the perturbation expansion, up to the second-order term, of the corresponding projection matrix. The relationship with the measures of Benasseni (1990) and Krzanowski (1979) is also discussed.
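For reference, the RV-coefficient of Robert and Escoufier (1976) is simple to compute; the paper's perturbation expansion of the projection matrix is not reproduced in this sketch.

```python
# The RV-coefficient between two data configurations on the same n samples:
# tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)) for column-centred X and Y.
import numpy as np

def rv_coefficient(X, Y):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sx, Sy = X @ X.T, Y @ Y.T                     # n x n configuration matrices
    return np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))
```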
16.
In this study, classical and robust principal component analyses are used to evaluate the socioeconomic development of the regions served by development agencies, which were established to reduce development differences among regions in Turkey. Because of the large differences between the development levels of the regions, an outlier problem occurs; hence robust statistical methods are used. Classical and robust statistical methods are also used to investigate whether there are any outliers in the data set. In classical principal component analysis, the number of observations must be larger than the number of variables; otherwise the determinant of the covariance matrix is zero. With ROBPCA, a robust approach to principal component analysis for high-dimensional data, principal components are obtained even if the number of variables is larger than the number of observations. In this paper, 26 development agencies are first evaluated on 19 variables using principal component analysis based on classical and robust scatter matrices, and then the same 26 development agencies are evaluated on 46 variables using the ROBPCA method.
17.
18.
H. Fotouhi, Journal of Applied Statistics, 2012, 39(10): 2199-2207
Most linear statistical methods deal with data lying in a Euclidean space. However, there are many examples, such as the topological structures of DNA molecules, in which the initial or the transformed data lie in a non-Euclidean space. To obtain a measure of variability in these situations, principal component analysis (PCA) is usually performed on a Euclidean tangent space, as it cannot be implemented directly on a non-Euclidean space. Principal geodesic analysis (PGA), by contrast, is a newer tool that provides a measure of variability for nonlinear statistics. In this paper, the performance of this tool is compared with that of PCA using a real data set representing a DNA molecular structure. It is shown that, due to the nonlinearity of the space, PGA explains more of the variability of the data than PCA.
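The Euclidean tangent-space PCA that serves as the baseline here can be sketched for angular data as follows; the assumption is that deviations are wrapped around per-variable circular means (the usual tangent approximation), and the PGA itself is not reproduced.

```python
# Hedged sketch: tangent-space PCA for a matrix of angular variables.
import numpy as np

def circular_mean(theta):
    return np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())

def tangent_pca(angles):
    """angles: n x d matrix of angular variables in radians."""
    mu = np.apply_along_axis(circular_mean, 0, angles)
    # Deviations from the circular means, wrapped into (-pi, pi]:
    dev = np.angle(np.exp(1j * (angles - mu)))
    U, s, Vt = np.linalg.svd(dev - dev.mean(axis=0), full_matrices=False)
    return U * s, Vt, s**2 / (len(angles) - 1)   # scores, axes, variances
```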
19.
Comparison of some goodness of fit tests for a single non-isotropic hypothetical principal component
James R. Schott, Communications in Statistics - Theory and Methods, 2013, 42(5): 1201-1215
Kshirsagar (1961) proposed a test criterion for the null hypothesis that a covariance matrix with known smaller latent root of multiplicity p−1 has its single non-isotropic principal component in a specified direction. It is shown that the power function of this criterion lacks some desirable properties. Another test criterion is proposed. The case in which the covariance matrix has an unknown smaller latent root of multiplicity p−1 is also investigated.
20.
Giovanni Maria Merola, Journal of Applied Statistics, 2020, 47(8): 1325
We propose an algorithmic framework, called SIMPCA, for computing sparse components from rotated principal components. This methodology is useful for replacing the unreliable practice of ignoring small coefficients of rotated components when interpreting them. The algorithm computes genuinely sparse components by projecting rotated principal components onto subsets of variables. The simplified components obtained in this way are highly correlated with the corresponding original components. By choosing different simplification strategies, different sparse solutions can be obtained and used to compare alternative interpretations of the principal components. We give some examples of how effective simplified solutions can be achieved with SIMPCA using publicly available data sets.
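A loose sketch of the projection idea, not Merola's actual SIMPCA algorithm and omitting the rotation step entirely, selects each component's largest loadings and regresses its scores on just those variables; all names and defaults are illustrative.

```python
# Hedged sketch: sparse components by projecting component scores onto small
# variable subsets (rotation omitted; a simplified stand-in for SIMPCA).
import numpy as np

def sparse_components(X, n_components=2, keep=3):
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    W = np.zeros((X.shape[1], n_components))
    for k in range(n_components):
        # Variables with the largest absolute loadings on component k ...
        idx = np.argsort(np.abs(Vt[k]))[::-1][:keep]
        # ... and a least-squares projection of its scores onto just them.
        coef, *_ = np.linalg.lstsq(Xc[:, idx], scores[:, k], rcond=None)
        W[idx, k] = coef
    return W   # genuinely sparse weights; Xc @ W approximates the scores
```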