Similar articles
 20 similar articles found (search time: 25 ms)
1.
Statistics, as one of the applied sciences, has a great impact on a wide range of other sciences. Predicting protein structures, with strong emphasis on their geometrical features described by dihedral angles, has stimulated a new branch of statistics known as directional statistics. One of the available techniques for such prediction is molecular dynamics simulation, which produces high-dimensional molecular structure data. Principal component analysis (PCA) is therefore expected to address some of the related statistical problems, particularly reducing the dimension of the variables involved. Since dihedral angles are variables on a non-Euclidean space (their locus is the torus), direct application of PCA is not expected to be very informative in this case. Principal geodesic analysis is one of the recent methods for dimension reduction in the non-Euclidean case. This paper highlights a procedure for using this technique to reduce the dimension of a set of dihedral angles. We further propose an extension of this tool, in which the torus is approximated by the product of two unit circles, and evaluate its application to a real data set. A comparison of this technique with some previous methods is also undertaken.

2.
Principal components are a well-established tool in dimension reduction. The extension to principal curves allows for general smooth curves which pass through the middle of a multidimensional data cloud. In this paper local principal curves are introduced, which are based on the localization of principal component analysis. The proposed algorithm is able to identify closed curves as well as multiple curves which may or may not be connected. To evaluate the performance of principal curves as a tool for data reduction, a measure of coverage is suggested. Using simulated and real data sets, the approach is compared to various alternative concepts of principal curves.

3.
Empirical Bayes procedures have been developed extensively in the literature, under the assumption that the underlying parameter space (or the sample space) is Euclidean in nature. However, almost no research has addressed the case where the data come from a different space. We develop empirical Bayes techniques to estimate the mean direction of the Fisher-von Mises distribution. In this case, the underlying space is non-Euclidean. The special case when the data are angles on the unit circle is illustrated with an example.
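
To make the notion of a mean direction on the unit circle concrete, here is a minimal sketch of the ordinary sample mean direction. It is not the empirical Bayes estimator developed in the paper, whose details the abstract does not give; the function names and the sample angles are illustrative assumptions.

```python
import math


def mean_direction(angles):
    """Sample mean direction (in radians) of angles on the unit circle.

    Each angle is mapped to the unit vector (cos a, sin a); the mean
    direction is the angle of the summed vector.
    """
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    if math.hypot(c, s) < 1e-12:
        raise ValueError("mean direction is undefined: resultant length is zero")
    return math.atan2(s, c)


def mean_resultant_length(angles):
    """Length of the average unit vector; values near 1 mean tight clustering."""
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.hypot(c, s) / len(angles)


# Hypothetical sample: angles clustered around 0 that straddle the 0 / 2*pi wrap.
sample = [math.radians(d) for d in (350, 355, 5, 10)]
theta_bar = mean_direction(sample)
r_bar = mean_resultant_length(sample)
```

The arithmetic mean of these four angles in degrees would be 180, which points in exactly the wrong direction; averaging unit vectors avoids that wrap-around problem, which is one reason Euclidean procedures do not carry over directly.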

4.
A crucial issue for principal components analysis (PCA) is to determine the number of principal components needed to capture the variability of usually high-dimensional data. In this article the dimension detection for PCA is formulated as a variable selection problem for regressions. The adaptive LASSO is used for the variable selection in this application. Simulations demonstrate that this approach is more accurate than existing methods in some cases and competitive in others. The performance of this model is also illustrated using a real example.

5.
In this paper, a new method for robust principal component analysis (PCA) is proposed. PCA is a widely used tool for dimension reduction without substantial loss of information. However, classical PCA is vulnerable to outliers because it depends on the empirical covariance matrix. To avoid this weakness, several alternative approaches based on robust scatter matrices have been suggested. A popular choice is ROBPCA, which combines projection pursuit ideas with robust covariance estimation via a variance maximization criterion. Our approach is based on the fact that PCA can be formulated as a regression-type optimization problem, which is the main difference from the previous approaches. The proposed robust PCA is derived by replacing the squared loss function with a robust penalty function, the Huber loss function. A practical algorithm is proposed to carry out the optimization, and its convergence properties are investigated. Results from a simulation study and a real data example demonstrate the promising empirical properties of the proposed method.
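
The substitution of the squared loss by the Huber loss can be illustrated on a single residual. The sketch below shows only the two loss functions, not the paper's optimization algorithm; the threshold name `delta` and its default of 1.345 (a commonly used tuning constant) are assumptions, not values taken from the abstract.

```python
def squared_loss(r):
    """Classical loss: grows quadratically, so large residuals dominate."""
    return 0.5 * r * r


def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it.

    `delta` is an assumed tuning threshold; it is not specified in the source.
    """
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


# A small residual is penalized identically; an outlying one far less.
small = (squared_loss(1.0), huber_loss(1.0, delta=1.0))
large = (squared_loss(10.0), huber_loss(10.0, delta=1.0))
```

Because the penalty grows only linearly for large residuals, a single outlying observation has a bounded pull on the fitted components, which is the motivation for the replacement described above.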

6.
Exact influence measures are applied in the evaluation of a principal component decomposition for high dimensional data. Some data used for classifying samples of rice from their near infra-red transmission profiles, following a preliminary principal component analysis, are examined in detail. A normalization of eigenvalue influence statistics is proposed which ensures that measures reflect the relative orientations of observations, rather than their overall Euclidean distance from the sample mean. Thus, the analyst obtains more information from an analysis of eigenvalues than from approximate approaches to eigenvalue influence. This is particularly important for high dimensional data where a complete investigation of eigenvector perturbations may be cumbersome. The results are used to suggest a new class of influence measures based on ratios of Euclidean distances in orthogonal spaces.

7.
This paper reviews various treatments of non-metric variables in partial least squares (PLS) and principal component analysis (PCA) algorithms. The performance of different treatments is compared in an extensive simulation study under several typical data generating processes, and associated recommendations are made. Moreover, we find that PLS-based methods are preferable in practice, since, independent of the data generating process, PLS either performs as well as PCA or significantly outperforms it. As an application of PLS and PCA algorithms with non-metric variables, we consider the construction of a wealth index to predict household expenditures. Consistent with our simulation study, we find that a PLS-based wealth index with dummy coding outperforms PCA-based ones.

8.
One important topic in morphometry that has recently received much attention is the longitudinal analysis of shape variation. Under Kendall's definition of shape, the shape of an object lies in a non-Euclidean space, which makes the longitudinal study of configurations somewhat difficult. To simplify this task, this paper triangulates the objects and then constructs a non-parametric regression-type model on the unit sphere. Configurations at given time instances are predicted using both the properties of the triangulation and the size of great baselines. Moreover, minimizing a Euclidean risk function is proposed for selecting feasible weights when constructing smoother functions in a non-parametric manner. These provide suitable shape growth models for analyzing objects that vary in time. The proposed models are applied to the analysis of two real-life data sets.

9.
This paper presents a study on the symmetry of repeated bi-phased data signals, in particular on quantifying the deviation between the two parts of the signal. Three symmetry scores are defined using functional data techniques such as smoothing and registration. One score is related to the L2-distance between the two parts of the signal, whereas the other two are constructed specifically to measure differences in amplitude and phase. Moreover, symmetry scores based on functional principal component analysis (PCA) are examined. The scores are applied to acceleration signals from a study on equine gait. The scores turn out to be highly associated with lameness, and their applicability for lameness quantification and detection is investigated. Four classification approaches give similar results. The scores describing amplitude and phase variation outperform the PCA scores in the classification of lameness.

10.
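Social auditing helps improve corporate governance, and companies with better governance also operate more robustly. Using A-share companies listed on the Shanghai and Shenzhen stock exchanges from 2003 to 2014 as the sample, we use principal component analysis to construct a composite index of robust business operation, build a proxy variable for the functioning of social auditing on the basis of audit opinions, and examine the effect of social auditing on robust business operation. We find that social auditing helps improve the robustness of business operations. The results remain unchanged when the regressions are re-estimated with alternative indicators and a different estimation method.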

11.
Treating principal component analysis (PCA) and canonical variate analysis (CVA) as methods for approximating tables, we develop measures, collectively termed predictivity, that assess the quality of fit independently for each variable and for all dimensionalities. We illustrate their use with data from aircraft development, the African timber industry and copper froth measurements from the mining industry. Similar measures are described for assessing the predictivity associated with the individual samples (in the case of PCA and CVA) or group means (in the case of CVA). For these measures to be meaningful, certain essential orthogonality conditions must hold that are shown to be satisfied by predictivity.

12.
Differential analysis techniques are commonly used to offer scientists a dimension reduction procedure and an interpretable gateway to variable selection, especially when confronting high-dimensional genomic data. Huang et al. used a gene expression profile of breast cancer cell lines to identify genomic markers that are highly correlated with in vitro sensitivity to the drug Dasatinib. They considered three statistical methods to identify differentially expressed genes and finally used the intersection of the results. However, the statistical methods used in that paper are not sufficient to select the genomic markers. In this paper we use three alternative statistical methods to select a combined list of genomic markers and compare it with the genes proposed by Huang et al. We then propose using sparse principal component analysis (Sparse PCA) to identify a final list of genomic markers. Sparse PCA takes the correlation among genes into account and helps in the successful discovery of genomic markers. We present a new, small set of genomic markers that effectively separates the groups of patients who are sensitive to the drug Dasatinib. The analysis procedure should also help scientists identify genomic markers that separate two groups.

13.
In principal component analysis (PCA), it is crucial to know how many principal components (PCs) should be retained in order to account for most of the data variability. A class of “objective” rules for finding this quantity is the class of cross-validation (CV) methods. In this work we compare three CV techniques showing how the performance of these methods depends on the covariance matrix structure. Finally we propose a rule for the choice of the “best” CV method and give an application to real data.

14.
Principal component analysis (PCA) is a popular technique for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has led to the development of new approaches. The effects of replacing outliers with estimates obtained by expectation–maximization (EM) and multiple imputation (MI) were examined on an artificial and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on EM estimates in place of outliers, and PCA based on MI estimates in place of outliers were compared with the results of CPCA. In this study, we tried to show how the effects of replacing outliers with MI and EM estimates depend on the ratio of outliers in the data set. Finally, when the ratio of outliers exceeds 20%, we suggest replacing outliers with MI and EM estimates as an alternative approach.

15.
In this paper some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that the use of multivariate association measures between two sets of variables can overcome the drawbacks of the usually employed bivariate correlation coefficient, but the resulting methods are generally not monotonic. Thus a new multivariate association measure is proposed, based on the links existing between canonical correlation analysis and principal component analysis, which can be more suitably used for the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and theoretically discussed.

16.
Abstract. The first goal of this article is to consider influence analysis of principal Hessian directions (pHd) and highlight how such an analysis can provide valuable insight into its behaviour. Such insight includes reasons as to why pHd can sometimes return informative results when it is not expected to do so, and why many prefer a residuals-based pHd method over its response-based counterpart. The secondary goal of this article is to introduce a new influence measure applicable to many dimension reduction methods based on average squared canonical correlations. A general form of this measure is also given, allowing for application to dimension reduction methods other than pHd. A sample version of the measure is considered, with respect to pHd, with two example data sets.

17.
In this paper we establish a new matrix inequality that fills a gap in the literature, where no matrix Euclidean norm version of the Wielandt inequality has yet been given. This inequality can be used to obtain an upper bound for a new measure of association which plays a very important role in statistics, especially in multivariate analysis. A new alternative, based on the Euclidean norm, for the relative gain of the covariance adjusted estimator of parameters is provided.

18.
Principal component analysis (PCA) and functional principal analysis are key tools in multivariate analysis, in particular for modelling yield curves, but little attention is given to questions of uncertainty, either in the components themselves or in derived quantities such as scores. Actuaries using PCA to model yield curves to assess interest rate risk for insurance companies are required to show any uncertainty in their calculations. Asymptotic results based on assumptions of multivariate normality are unsatisfactory for modest samples, and application of bootstrap methods is not straightforward, with the novel pitfalls of possible inversions in the order of sample components and reversals of signs. We present methods for overcoming these difficulties and discuss other potential hazards that arise.
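
The sign-reversal pitfall arises because a principal component is only defined up to sign, so the same direction can come back as v in one bootstrap replicate and as -v in the next. The sketch below shows one common remedy, flipping each replicate to agree with a reference component. It is an illustrative assumption rather than the method of the paper, which the abstract does not describe, and it does not address inversions in component order; the function name and the example vectors are hypothetical.

```python
def align_sign(component, reference):
    """Return `component`, negated if it points away from `reference`.

    Both arguments are equal-length sequences of loadings. The sign is
    chosen so that the inner product with the reference is non-negative.
    """
    dot = sum(c * r for c, r in zip(component, reference))
    if dot < 0:
        return [-c for c in component]
    return list(component)


# Hypothetical loadings: the full-sample component and two bootstrap
# replicates of it, one of which came back with its sign reversed.
reference = [0.6, 0.8]
replicates = [[0.5, 0.75], [-0.6, -0.8]]
aligned = [align_sign(v, reference) for v in replicates]
```

Without such an alignment, averaging the replicates would let v and -v cancel, and the bootstrap spread of each loading would be badly overstated.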

19.
A Bayes linear space is a linear space of equivalence classes of proportional σ-finite measures, including probability measures. Measures are identified with their density functions. Addition is given by Bayes' rule and subtraction by Radon–Nikodym derivatives. The present contribution shows the subspace of square-log-integrable densities to be a Hilbert space, which can include probability and infinite measures, measures on the whole real line or discrete measures. It extends the ideas from the Hilbert space of densities on a finite support towards Hilbert spaces on general measure spaces. It is also a generalisation of the Euclidean structure of the simplex, the sample space of random compositions. In this framework, basic notions of mathematical statistics get a simple algebraic interpretation. A key tool is the centred-log-ratio transformation, a generalisation of that used in compositional data analysis, which maps the Hilbert space of measures into a subspace of square-integrable functions. As a consequence of this structure, distances between densities, orthonormal bases, and Fourier series representing measures become available. As an application, Fourier series of normal distributions and distances between them are derived, and an example related to grain size distributions is presented. The geometry of the sample space of random compositions, known as the Aitchison geometry of the simplex, is obtained as a particular case of the Hilbert space when the measures have discrete and finite support.
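
For the discrete, finite-support case mentioned at the end, the centred-log-ratio transformation and the distance it induces can be written in a few lines. This is a minimal sketch of the standard compositional-data version only, not of the general measure-theoretic construction in the paper; the function names and example vectors are assumptions.

```python
import math


def clr(x):
    """Centred log-ratio transform of a vector of strictly positive parts.

    Each part's log is centred by the mean log, so the result sums to zero.
    """
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr images of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Proportional vectors represent the same composition, so their distance is 0.
d_same = aitchison_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
d_diff = aitchison_distance([0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
```

The zero distance between proportional vectors mirrors the statement that the space consists of equivalence classes of proportional measures: rescaling a vector shifts every log by the same constant, which the centring removes.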

20.
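A comprehensive evaluation method based on nonlinear principal component analysis and cluster analysis   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the shortcomings of traditional principal component analysis in handling nonlinear problems, this paper describes the drawbacks of "centered standardization" for making data dimensionless and the deficiencies of the traditional method in processing "linear" data, gives improved methods for making data dimensionless and for processing "nonlinear" data, and establishes a new comprehensive evaluation method based on "log-centered" nonlinear principal component analysis and cluster analysis. Experiments show that the method handles nonlinear data effectively.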
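
The abstract does not give the formula behind the log-centering step, so the following is only one plausible reading, stated as an assumption: take the natural logarithm of each strictly positive entry and then subtract each variable's mean log. The function names and the toy data are hypothetical.

```python
import math


def log_center(rows):
    """Log-transform strictly positive data, then centre each column.

    `rows` is a list of observations, each a list of positive values.
    This is an assumed reading of "log-centering", not the paper's formula.
    """
    logs = [[math.log(v) for v in row] for row in rows]
    n = len(logs)
    means = [sum(col) / n for col in zip(*logs)]
    return [[v - m for v, m in zip(row, means)] for row in logs]


def correlation_of_centered(xs, ys):
    """Correlation coefficient of two already-centred sequences."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data with a nonlinear (power-law) relation: second variable = first ** 2.
data = [[x, x ** 2] for x in (1.0, 2.0, 4.0, 8.0)]
z = log_center(data)
r = correlation_of_centered([row[0] for row in z], [row[1] for row in z])
```

In this toy case the power-law relation becomes exactly linear after the log transform, so a linear method such as PCA applied to `z` can capture it with a single component. Unlike dividing by the standard deviation, centering alone also leaves the relative spread of the variables intact.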
