Similar literature
 Found 20 similar documents; search took 46 ms
1.
Local influence on the eigenvalues of sample covariance matrices in principal components analysis is examined for a reasonable modification of Shi's (1997) perturbation scheme. The modification is suggested for samples from populations with both unknown mean vector and covariance matrix. While Shi's (1997) detection indexes consist of only quadratic terms, the modified perturbation scheme leads to detection indexes constituted by both linear and quadratic terms associated with centralized observations. These linear and quadratic terms reflect local influences on the first two sample moments. Examples are investigated based on the two detection indexes.

2.
The RV-coefficient (Escoufier, 1973; Robert and Escoufier, 1976) is studied as a sensitivity coefficient of the subspace spanned by dominant eigenvectors in principal component analysis. We use the perturbation expansion of the corresponding projection matrix up to the second-order term. The relationship with the measures by Benasseni (1990) and Krzanowski (1979) is also discussed.

3.
Jeanne Fine. Statistics, 2013, 47(3): 401-414
Perturbation methods and Taylor expansions are very often used to obtain approximations of test statistics in multivariate analysis (especially in principal component and canonical analyses). These approximations are then used to obtain formal Edgeworth expansions of the distribution functions of the statistics. Bhattacharya and Ghosh (1978) have justified these practices under suitable assumptions. In this paper a non-classical perturbation problem is solved in order to obtain almost sure expansions of test statistics

4.
Functional logistic regression is becoming more popular as there are many situations where we are interested in the relation between functional covariates (as input) and a binary response (as output). Several approaches have been advocated, and this paper goes into detail about three of them: dimension reduction via functional principal component analysis, penalized functional regression, and wavelet expansions in combination with Least Absolute Shrinkage and Selection Operator penalization. We discuss the performance of the three methods on simulated data and also apply the methods to data regarding lameness detection for horses. Emphasis is on classification performance, but we also discuss estimation of the unknown parameter function.

5.
This paper introduces regularized functional principal component analysis for multidimensional functional data sets, utilizing Gaussian basis functions. An essential point in a functional approach via basis expansions is the evaluation of the matrix for the integral of the product of any two bases (cross-product matrix). Advantages of the use of the Gaussian type of basis functions in the functional approach are that its cross-product matrix can be easily calculated, and it creates a much more flexible instrument for transforming each individual's observation into a functional form. The proposed method is applied to the analysis of three-dimensional (3D) protein structural data that can be referred to as unbalanced data. It is shown that our method extracts useful information from unbalanced data through the application. Numerical experiments are conducted to investigate the effectiveness of our method via Gaussian basis functions, compared to the method based on B-splines. On performing regularized functional principal component analysis with B-splines, we also derive the exact form of its cross-product matrix. The numerical results show that our methodology is superior to the method based on B-splines for unbalanced data.

6.
We consider the problem of clustering gamma-ray bursts (from the “BATSE” catalogue) through kernel principal component analysis. Our proposed kernel outperforms other competing kernels in terms of clustering accuracy, and we obtain three physically interpretable groups of gamma-ray bursts. The effectiveness of the suggested kernel, in combination with kernel principal component analysis, in revealing natural clusters in noisy and nonlinear data while reducing the dimension of the data is also explored in two simulated data sets.

7.
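A comprehensive evaluation method based on nonlinear principal components and cluster analysis   Total citations: 1, self-citations: 0, citations by others: 1
To address the shortcomings of traditional principal component analysis in handling nonlinear problems, this paper describes the drawbacks of the "centered standardization" that the traditional method uses to make data dimensionless, as well as its limitations when processing "linear" data. It presents improved methods for making data dimensionless and for handling "nonlinear" data, and establishes a new comprehensive evaluation method based on "log-centered" nonlinear principal component analysis and cluster analysis. Experiments show that the method handles nonlinear data effectively.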
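
The abstract above names a log-centering transformation but does not define it. One plausible reading, sketched below in plain Python, is to take the natural logarithm of each strictly positive indicator and subtract the column mean of the logs. The function name `log_center` and the toy data are hypothetical, and this is an assumption about the general idea, not the paper's actual procedure.

```python
import math

def log_center(rows):
    """Log-transform strictly positive data, then subtract each column's mean log.

    Assumed reading of "log-centering"; not taken from the paper itself.
    """
    logs = [[math.log(x) for x in row] for row in rows]
    n = len(logs)
    # Mean of the logged values in each column (variable).
    col_means = [sum(col) / n for col in zip(*logs)]
    return [[v - m for v, m in zip(row, col_means)] for row in logs]

# Toy data: 3 observations, 2 indicators on very different scales.
data = [[1.0, 10.0], [2.0, 100.0], [4.0, 1000.0]]
centered = log_center(data)
```

After the transformation every column has mean zero on the log scale, so indicators measured in different units become comparable without dividing by a standard deviation.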

8.
An asymptotic expansion is given for the distribution of the α-th largest latent root of a correlation matrix, when the observations are from a multivariate normal distribution. An asymptotic expansion for the distribution of a test statistic based on a correlation matrix, which is useful in dimensionality reduction in principal component analysis, is also given. These expansions hold when the corresponding latent root of the population correlation matrix is simple. The approach here is based on a perturbation method.

9.
Influence functions are commonly used as diagnostic tools in order to investigate sensitivity aspects in principal component analysis. This paper suggests a practical alternative for the eigenvalues by introducing a sensitivity measure derived from the classical Lorenz curve and associated Gini index. The results are illustrated by analysing an example.

10.
We investigate the effect of measurement error on principal component analysis in the high‐dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly in terms of increased variability and not bias. In practice, the error‐induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute to a large variability in component loadings, relative to the loading values, such that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.

11.
Principal components are a well-established tool in dimension reduction. The extension to principal curves allows for general smooth curves which pass through the middle of a multidimensional data cloud. In this paper local principal curves are introduced, which are based on the localization of principal component analysis. The proposed algorithm is able to identify closed curves as well as multiple curves which may or may not be connected. For the evaluation of the performance of principal curves as a tool for data reduction, a measure of coverage is suggested. Using simulated and real data sets, the approach is compared to various alternative concepts of principal curves.

12.
Univariate time series often take the form of a collection of curves observed sequentially over time. Examples of these include hourly ground-level ozone concentration curves. These curves can be viewed as a time series of functions observed at equally spaced intervals over a dense grid. Since functional time series may contain various types of outliers, we introduce a robust functional time series forecasting method to down-weight the influence of outliers in forecasting. Through a robust principal component analysis based on projection pursuit, a time series of functions can be decomposed into a set of robust dynamic functional principal components and their associated scores. Conditioning on the estimated functional principal components, the crux of the curve-forecasting problem lies in modelling and forecasting principal component scores, through a robust vector autoregressive forecasting method. Via a simulation study and an empirical study on forecasting ground-level ozone concentration, the robust method demonstrates the superior forecast accuracy that dynamic functional principal component regression entails. The robust method also shows the superior estimation accuracy of the parameters in the vector autoregressive models for modelling and forecasting principal component scores, and thus improves curve forecast accuracy.

13.
Dynamic principal component analysis (DPCA), also known as frequency domain principal component analysis, was developed by Brillinger [Time Series: Data Analysis and Theory, Vol. 36, SIAM, 1981] to decompose multivariate time-series data into a few principal component series. A primary advantage of DPCA is its capability of extracting essential components from the data by reflecting their serial dependence. It is also used to estimate the common component in a dynamic factor model, which is frequently used in econometrics. However, its beneficial property cannot be utilized when missing values are present, which should not be simply ignored when estimating the spectral density matrix in the DPCA procedure. Based on a novel combination of conventional DPCA and the self-consistency concept, we propose a DPCA method for use when missing values are present. We demonstrate the advantage of the proposed method over some existing imputation methods through Monte Carlo experiments and real data analysis.

14.
Principal component analysis is a popular dimension reduction technique often used to visualize high‐dimensional data structures. In genomics, this can involve millions of variables, but only tens to hundreds of observations. Theoretically, such extreme high dimensionality will cause biased or inconsistent eigenvector estimates, but in practice, the principal component scores are used for visualization with great success. In this paper, we explore when and why the classical principal component scores can be used to visualize structures in high‐dimensional data, even when there are few observations compared with the number of variables. Our argument is twofold: First, we argue that eigenvectors related to pervasive signals will have eigenvalues scaling linearly with the number of variables. Second, we prove that for linearly increasing eigenvalues, the sample component scores will be scaled and rotated versions of the population scores, asymptotically. Thus, the visual information of the sample scores will be unchanged, even though the sample eigenvectors are biased. In the case of pervasive signals, the principal component scores can be used to visualize the population structures, even in extreme high‐dimensional situations.

15.
This work is devoted to robust principal component analysis (PCA). We give a comparison between some multivariate estimators of location and scatter by computing the influence functions of the sensitivity coefficient ρ corresponding to these estimators, and the mean squared error (MSE) of estimators of ρ. The coefficient ρ measures the closeness between the subspaces spanned by the initial eigenvectors and their corresponding version derived from an infinitesimal perturbation of the data distribution.

16.
Principal component regression (PCR) has two steps: estimating the principal components and performing the regression using these components. These steps generally are performed sequentially. In PCR, a crucial issue is the selection of the principal components to be included in regression. In this paper, we build a hierarchical probabilistic PCR model with a dynamic component selection procedure. A latent variable is introduced to select promising subsets of components based upon the significance of the relationship between the response variable and principal components in the regression step. We illustrate this model using real and simulated examples. The simulations demonstrate that our approach outperforms some existing methods in terms of root mean squared error of the regression coefficient.

17.
In this study, classical and robust principal component analyses are used to evaluate the socioeconomic development of the regions served by development agencies, which work to reduce development disparities among regions in Turkey. Because development levels differ greatly between regions, an outlier problem occurs, and hence robust statistical methods are used. Classical and robust statistical methods are also used to investigate whether there are any outliers in the data set. In classical principal component analysis, the number of observations must be larger than the number of variables; otherwise the determinant of the covariance matrix is zero. With the Robust method for Principal Component Analysis (ROBPCA), a robust approach to principal component analysis in high-dimensional data, principal components are obtained even if the number of variables is larger than the number of observations. In this paper, 26 development agencies are first evaluated with 19 variables by using principal component analysis based on classical and robust scatter matrices, and then these 26 development agencies are evaluated with 46 variables by using the ROBPCA method.
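
The abstract above states that classical PCA needs more observations than variables, because otherwise the determinant of the covariance matrix is zero. A minimal sketch of that fact in plain Python follows. The helper names `sample_covariance` and `determinant` and the toy data are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of rows (observations) by columns (variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def determinant(matrix, tol=1e-12):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for k in range(size):
        # Pick the row with the largest entry in column k as the pivot row.
        pivot = max(range(k, size), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < tol:
            return 0.0  # numerically singular
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det  # a row swap flips the sign
        det *= a[k][k]
        for r in range(k + 1, size):
            factor = a[r][k] / a[k][k]
            for c in range(k, size):
                a[r][c] -= factor * a[k][c]
    return det

# 3 observations, 4 variables: fewer observations than variables.
wide = [[1.0, 2.0, 0.0, 5.0], [2.0, 1.0, 3.0, 1.0], [4.0, 0.0, 1.0, 2.0]]
det_wide = determinant(sample_covariance(wide))

# 5 observations, 2 variables: more observations than variables.
tall = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]]
det_tall = determinant(sample_covariance(tall))
```

Centering removes one degree of freedom, so the covariance matrix of n observations has rank at most n - 1. With 3 observations and 4 variables the 4 × 4 matrix is therefore singular, while the 2 × 2 matrix from 5 observations has a clearly positive determinant.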

18.
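Starting from the development of informatization, this paper explains the importance of informatization construction and of evaluating regional informatization levels. Using principal component analysis and principal component regression from multivariate statistics, it designs a composite index to assess the informatization level of each type of region and its position nationally, and to guide regional informatization construction.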

19.
When functional data are not homogeneous, for example, when there are multiple classes of functional curves in the dataset, traditional estimation methods may fail. In this article, we propose a new estimation procedure for the mixture of Gaussian processes, to incorporate both functional and inhomogeneous properties of the data. Our method can be viewed as a natural extension of high-dimensional normal mixtures. However, the key difference is that smoothed structures are imposed for both the mean and covariance functions. The model is shown to be identifiable, and can be estimated efficiently by a combination of ideas from the expectation-maximization (EM) algorithm, kernel regression, and functional principal component analysis. Our methodology is empirically justified by Monte Carlo simulations and illustrated by an analysis of a supermarket dataset.

20.
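A comparison of principal component analysis and factor analysis in multi-indicator comprehensive evaluation   Total citations: 8, self-citations: 0, citations by others: 8
By comparing principal component analysis and factor analysis in terms of research purpose, analytical principle, and implementation in SPSS, the article points out four issues to watch when applying the two methods in multi-indicator comprehensive evaluation, so that empirical research is carried out correctly.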
