Similar Articles
20 similar articles found.
1.
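A Cluster Analysis Method Based on Weighted Principal Component Distance   Total citations: 1, Self-citations: 0, Citations by others: 1
Lü Yanwei (吕岩威), Li Ping (李平). Statistical Research (统计研究), 2016, 33(11): 102-108
High correlation among indicators and differences in their importance often prevent traditional cluster analysis methods from producing good classifications. After examining the limitations of traditional cluster analysis and its various refinements, this paper uses mathematical methods to reconstruct the notion of distance used in defining classes. By taking an adaptively weighted principal component distance as the classification statistic, it proposes a new, improved principal component clustering method: weighted principal component distance cluster analysis. Theoretical analysis shows that the method systematically integrates the strengths of existing cluster analysis methods and rests on a sound theoretical basis. Simulation results show that it effectively resolves the distortion that existing methods suffer in particular situations and yields better classifications.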
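The abstract does not spell out the adaptive weights. As a rough illustration only, the Python/NumPy sketch below assumes one common form of weighted principal component distance. It standardizes the indicators and keeps the first m components of the correlation matrix. It then rescales each component score to unit variance and weights the k-th squared score difference by the variance-contribution ratio w_k = λ_k / Σ_j λ_j. The function name `weighted_pc_distance` and this choice of weights are assumptions, not the authors' definition.

```python
import numpy as np


def weighted_pc_distance(X, m):
    """Pairwise weighted principal component distances between the rows of X.

    Assumed form (hypothetical, not taken from the paper):
        d(i, j) = sqrt( sum_{k<=m} w_k * (F_ik - F_jk)^2 ),  w_k = lambda_k / sum(lambda)
    where F are unit-variance scores on the first m components of the correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    # Standardize indicators so that differences in units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigen-decomposition of the correlation matrix, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Component scores, each rescaled to unit variance.
    F = (Z @ eigvecs[:, :m]) / np.sqrt(eigvals[:m])
    # Variance-contribution weights relative to the total variance of all components.
    w = eigvals[:m] / eigvals.sum()
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
D = weighted_pc_distance(X, m=2)   # 8 x 8 distance matrix
```

The matrix `D` can be handed to any clustering routine that accepts precomputed distances. With these particular weights, the λ_k in w_k cancels the 1/λ_k introduced by rescaling the scores. When m equals the number of indicators p, the distance therefore reduces to Euclidean distance on the standardized data divided by √p. Other weighting schemes do not collapse in this way.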

2.
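This article discusses general problems in the main steps of principal-component-based comprehensive evaluation, offers feasible solutions, and analyzes them theoretically. Building on a review of key literature on principal component cluster analysis, it constructs an objectively weighted principal component distance as the clustering statistic. This effectively resolves the inability of existing clustering models to handle collinearity among indicators and large disparities in indicator importance. Comparing the classification efficiency of the extended model with that of similar models shows that the objective rationality built into weighted principal component clustering is the fundamental source of its advantage.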

3.
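To address the shortcomings of traditional principal component analysis on nonlinear problems, the article describes an improved data-processing approach based on kernel principal component analysis. It also introduces a comprehensive evaluation method based on weighted cluster analysis of kernel principal components. Experiments show that the method improves on traditional comprehensive evaluation methods.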
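Kernel PCA replaces the covariance matrix with a kernel (Gram) matrix centered in feature space. Below is a minimal NumPy sketch. It assumes a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma`; the abstract does not state the article's kernel, parameters, or weighting scheme. The variance-contribution weights returned at the end are only one possible input to a weighted clustering step.

```python
import numpy as np


def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA scores of the training rows of X under an RBF kernel (assumed kernel choice)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # RBF kernel: K_ij = exp(-gamma * ||x_i - x_j||^2)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = K.shape[0]
    J = np.full((n, n), 1.0 / n)
    # Center the data implicitly in feature space.
    Kc = K - J @ K - K @ J + J @ K @ J
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, alphas = eigvals[order], eigvecs[:, order]
    # With alpha_k a unit-norm eigenvector of Kc, the projection of training
    # point i on component k is sqrt(lambda_k) * alpha_ik.
    scores = alphas * np.sqrt(np.clip(eigvals, 0.0, None))
    # Hypothetical clustering weights: each retained component's share of retained variance.
    weights = eigvals / eigvals.sum()
    return scores, weights


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
scores, weights = kernel_pca(X, n_components=2, gamma=0.5)
```

With a linear kernel (K = XXᵀ) the same steps give ordinary PCA scores up to sign. The RBF kernel lets the components follow nonlinear structure in the data.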

4.
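A Case-Based Teaching Method for Principal Component Cluster Analysis   Total citations: 6, Self-citations: 0, Citations by others: 6
Zhang Hu (张虎). Statistics and Decision (统计与决策), 2007, (20): 163-164
Principal component analysis and cluster analysis are difficult topics to teach in the course Multivariate Statistical Analysis. This paper uses case-based teaching to explain both methods and performs hierarchical clustering of the samples on their principal component scores. The samples are then re-ranked by their first principal component scores, and this ranking is compared with the traditional ranking by composite score.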
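The comparison described here can be sketched as follows. The sketch assumes the usual textbook composite score, which weights the first m component scores by their share of the retained variance: F = Σ_k (λ_k / Σ_{j≤m} λ_j) F_k. The function name and the sign convention for the eigenvectors are illustrative choices, not taken from the paper. The score matrix `F` is what a hierarchical clustering routine would receive.

```python
import numpy as np


def pc_rankings(X, m):
    """Return PC scores, composite scores, and rankings by PC1 and by composite score."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; flip each so its loadings sum to a
    # non-negative value, so a higher score means "more" of the indicators overall.
    eigvecs = eigvecs * np.where(eigvecs.sum(axis=0) < 0, -1.0, 1.0)
    F = Z @ eigvecs[:, :m]                           # component scores (input for clustering)
    composite = F @ (eigvals[:m] / eigvals[:m].sum())
    rank_pc1 = np.argsort(-F[:, 0])                  # sample indices, highest score first
    rank_composite = np.argsort(-composite)
    return F, composite, rank_pc1, rank_composite


rng = np.random.default_rng(2)
X = rng.normal(size=(10, 5))
F, composite, rank_pc1, rank_composite = pc_rankings(X, m=3)
```

With m = 1 the two rankings coincide by construction. They can diverge as later components receive more weight.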

5.
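Amid the global climate crisis, developing low-carbon economies at tourist attractions is the direction in which the world economy is moving. The article first analyzes and constructs evaluation indicators for low-carbon tourist attractions. It then combines two multivariate techniques, principal component analysis and fuzzy cluster analysis. PCA reduces the dimensionality of the indicators, and fuzzy clustering of the principal components classifies the attractions under evaluation. Classes are ranked by their mean first principal component score, and attractions within a class are ranked by their first principal component score, which yields a ranking of all the low-carbon attractions. Finally, a case study illustrates the feasibility and soundness of the method.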
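The classify-then-rank procedure can be sketched with the standard fuzzy c-means algorithm (Bezdek's formulation with fuzzifier `m`). Fuzzy c-means is assumed here because the abstract does not name the fuzzy clustering variant. `F` is a matrix of principal component scores whose first column is the first component. Each sample's hard class is taken as the class in which its membership is largest.

```python
import numpy as np


def fuzzy_c_means(F, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means: returns membership matrix U (n x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((F.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # memberships of each sample sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ F) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # avoid division by zero at a center
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # u_ik proportional to d_ik^(-2/(m-1))
    return U, centers


def rank_by_first_pc(F, U):
    """Order classes by mean PC1 score, then samples within each class by PC1 score."""
    labels = U.argmax(axis=1)
    classes = sorted(set(labels.tolist()), key=lambda k: -F[labels == k, 0].mean())
    order = []
    for k in classes:
        idx = np.flatnonzero(labels == k)
        order.extend(idx[np.argsort(-F[idx, 0])].tolist())
    return order


F = np.array([[5.0, 0.0], [5.1, 0.1], [4.9, -0.1], [0.0, 0.0], [0.1, 0.1], [-0.1, 0.0]])
U, centers = fuzzy_c_means(F, c=2)
order = rank_by_first_pc(F, U)
```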

6.
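Statistical Analysis and Evaluation of the Degree of Social Harmony   Total citations: 1, Self-citations: 0, Citations by others: 1
This paper selects a set of indicators on qualitative grounds. Taking one year's data as an example, it analyzes the degree of social harmony quantitatively using cluster analysis and principal component analysis. PCA yields a quantitative evaluation function, together with a statistical procedure for using that function to measure the degree of social harmony in each region.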

7.
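This paper uses the 1994 category price indices of commodities for each province and municipality in China as indicators. It applies cluster analysis to the regional price indices and principal component analysis to the regional retail price indices. The analysis identifies eight main factors behind regional price differences, which the paper then analyzes, compares, and evaluates comprehensively.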

8.
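Ordered cluster analysis of panel data is an emerging research area in multivariate statistics. Principal component analysis reduces the panel data along the time dimension while minimizing the loss of variation, so that the scores fairly accurately reflect each sample's overall level of change in each period. Fisher's optimal partition algorithm is then applied to cluster the principal component scores in order, which offers an approach to studying similarity relationships in ordered panel data. Applied to global climate change, the clustering characterizes global and regional climate change over fifty years. Comparison with conclusions from studies abroad shows good applicability.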
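Fisher's optimal partition for ordered samples chooses split points that minimize the total within-segment sum of squares. It does this by dynamic programming over contiguous segments. Below is a compact NumPy sketch. It assumes the input rows are principal component scores already in time order (1-D values also work). It runs in O(k·n²) after an O(n³) precomputation of segment costs, which is acceptable for short series.

```python
import numpy as np


def fisher_optimal_partition(x, k):
    """Split ordered samples x (n,) or (n, d) into k contiguous segments minimizing the
    total within-segment sum of squares. Returns (start indices of segments 2..k, loss)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = len(x)
    # D[i, j]: sum of squared deviations from the segment mean for samples i..j inclusive
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            seg = x[i:j + 1]
            D[i, j] = ((seg - seg.mean(axis=0)) ** 2).sum()
    # cost[g, j]: minimal loss for splitting x[0..j] into g + 1 segments
    # split[g, j]: start index of the last of those segments
    cost = np.full((k, n), np.inf)
    split = np.zeros((k, n), dtype=int)
    cost[0] = D[0]
    for g in range(1, k):
        for j in range(g, n):
            for s in range(g, j + 1):              # last segment is x[s..j]
                c = cost[g - 1, s - 1] + D[s, j]
                if c < cost[g, j]:
                    cost[g, j], split[g, j] = c, s
    # Backtrack the segment start points from the end of the series.
    starts, j = [], n - 1
    for g in range(k - 1, 0, -1):
        s = int(split[g, j])
        starts.append(s)
        j = s - 1
    return sorted(starts), float(cost[k - 1, n - 1])


starts, loss = fisher_optimal_partition([1, 1, 1, 5, 5, 5, 9, 9], k=3)
```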

9.
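Analysis of urban investment potential is an objective, comprehensive evaluation of a city's investment and development. An indicator system for urban investment potential is built as the evaluation standard. On the basis of principal component analysis, two clustering methods are compared: single linkage (the nearest-neighbor method) and average linkage. The results reflect the investment potential of 21 major cities in Guangdong and identify groups of cities with similar investment drivers, together with their investment hotspots. Using the extracted principal components as indicators for R-type cluster analysis also overcomes a limitation of earlier studies, which only produced rankings. The results provide a reference for government development strategy and for investors deciding where to invest.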

10.
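Economic growth requires not only speed but, above all, efficiency and quality. This paper uses principal component analysis to study the 2005 economic growth quality of six provincial capitals in central China, extracting four principal components. The six cities are then ranked on each of the four component scores and on the composite score. Cluster analysis is used to check whether the rankings are reasonable, and the paper closes with conclusions and recommendations.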

11.
Principal component analysis is a popular dimension reduction technique often used to visualize high‐dimensional data structures. In genomics, this can involve millions of variables, but only tens to hundreds of observations. Theoretically, such extreme high dimensionality will cause biased or inconsistent eigenvector estimates, but in practice, the principal component scores are used for visualization with great success. In this paper, we explore when and why the classical principal component scores can be used to visualize structures in high‐dimensional data, even when there are few observations compared with the number of variables. Our argument is twofold: First, we argue that eigenvectors related to pervasive signals will have eigenvalues scaling linearly with the number of variables. Second, we prove that for linearly increasing eigenvalues, the sample component scores will be scaled and rotated versions of the population scores, asymptotically. Thus, the visual information of the sample scores will be unchanged, even though the sample eigenvectors are biased. In the case of pervasive signals, the principal component scores can be used to visualize the population structures, even in extreme high‐dimensional situations.
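A toy simulation can check, informally, the claim that sample scores are asymptotically a scaled and rotated version of the population scores. The sketch below assumes a Gaussian factor model with r pervasive factors whose loadings are drawn independently for every variable, so the population eigenvalues grow linearly in p. It fits the best linear map from population to sample scores and reports the share of variance that map explains. All sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np


def score_recovery_r2(n=40, p=5000, r=2, seed=0):
    """R^2 of the best linear map from population scores to sample PC scores (toy model)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, r))              # population scores
    B = rng.standard_normal((p, r))              # pervasive loadings: signal variance grows with p
    X = Z @ B.T + rng.standard_normal((n, p))    # add unit-variance noise
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    sample_scores = U[:, :r] * S[:r]
    Zc = Z - Z.mean(axis=0)
    # A general linear map M absorbs any scaling and rotation between the two score sets.
    M, *_ = np.linalg.lstsq(Zc, sample_scores, rcond=None)
    resid = sample_scores - Zc @ M
    return 1.0 - (resid ** 2).sum() / (sample_scores ** 2).sum()


print(score_recovery_r2())
```

Because M is an arbitrary linear map, a high R² shows that the configuration of points survives up to a linear transformation. It does not by itself show that M is exactly a scaled rotation.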

12.
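When the variables in a clustering problem are correlated, the traditional remedies are mainly the Mahalanobis distance and principal component clustering, but these are hard to apply or hard to interpret. This paper therefore proposes a class of model-based clustering methods that first model the correlation structure among the variables (as auxiliary information) and then perform cluster analysis. The method has three main advantages. First, its scope is wider: it handles not only (linear) correlation but also data generated by other complex structures among the variables. Second, the model's regression coefficients show the importance of each variable. Third, it is more robust and practical than the Mahalanobis distance, easier to interpret than principal component clustering, and algorithmically simpler and more efficient.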

13.
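A Comprehensive Evaluation Method Based on Nonlinear Principal Components and Cluster Analysis   Total citations: 1, Self-citations: 0, Citations by others: 1
Traditional principal components handle nonlinear problems poorly. Addressing this weakness, the paper describes two drawbacks of the traditional approach: "center-standardization" as a way of making data dimensionless, and the treatment of data as "linear". It gives improved methods for making data dimensionless and for handling "nonlinear" data. It then builds a new comprehensive evaluation method that combines nonlinear principal component analysis based on "log-centering" with cluster analysis. Experiments show that the method handles nonlinear data effectively.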
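The abstract does not give the transformation explicitly. One definition used for "log-centering" is x*_ij = ln x_ij - (1/n) Σ_i ln x_ij, that is, take logarithms and subtract each indicator's mean log value. The sketch below assumes that definition and then runs PCA on the covariance matrix of the transformed data. Rescaling an indicator by a positive constant only adds a constant to its logarithm, and centering removes that constant. The transform therefore makes the data dimensionless without dividing by a standard deviation.

```python
import numpy as np


def log_centered_pca(X):
    """PCA after log-centering (assumed definition): x*_ij = ln x_ij - mean_i(ln x_ij)."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("log-centering requires strictly positive indicator values")
    L = np.log(X)
    Lc = L - L.mean(axis=0)                      # subtract each indicator's mean log value
    cov = Lc.T @ Lc / (X.shape[0] - 1)           # covariance matrix of transformed data
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return Lc @ eigvecs[:, order], eigvals[order]


rng = np.random.default_rng(3)
X = np.exp(rng.normal(size=(12, 3)))             # strictly positive example data
scores, eigvals = log_centered_pca(X)
```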

14.
In this article, we consider clustering based on principal component analysis (PCA) for high-dimensional mixture models. We present theoretical reasons why PCA is effective for clustering high-dimensional data. First, we derive a geometric representation of high-dimension, low-sample-size (HDLSS) data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We develop ideas of the geometric representation and provide geometric consistency properties for multiclass mixture models. We show that PCA can cluster HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering using gene expression datasets.
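Below is a toy version of PCA-based clustering in the HDLSS setting. It draws a balanced two-class Gaussian mixture with far more variables than observations. It then assigns each sample by the sign of its first principal component score. The mixture parameters, sizes, and seed are assumptions chosen for illustration. The sign rule is one simple PCA-based rule, not the authors' full procedure.

```python
import numpy as np


def first_pc_sign_clusters(X):
    """Label each row 0/1 by the sign of its first principal component score."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return (U[:, 0] * S[0] > 0).astype(int)


rng = np.random.default_rng(1)
n, p = 30, 2000
labels = np.repeat([0, 1], n // 2)
means = np.zeros((2, p))
means[1] = 1.0                                   # class means differ by 1 in every coordinate
X = means[labels] + rng.standard_normal((n, p))
pred = first_pc_sign_clusters(X)
# Cluster labels are only defined up to swapping, so score both assignments.
accuracy = max((pred == labels).mean(), (pred != labels).mean())
```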

15.
Most linear statistical methods deal with data lying in a Euclidean space. However, there are many examples, such as DNA molecule topological structures, in which the initial or the transformed data lie in a non-Euclidean space. To obtain a measure of variability in these situations, principal component analysis (PCA) is usually performed on a Euclidean tangent space, since it cannot be implemented directly on a non-Euclidean space. By contrast, principal geodesic analysis (PGA) is a new tool that provides a measure of variability for nonlinear statistics. In this paper, the performance of this new tool is compared with that of PCA using a real data set representing a DNA molecular structure. It is shown that, owing to the nonlinearity of the space, PGA explains more of the variability in the data than PCA.

16.
Principal component and correspondence analysis can both be used as exploratory methods for representing multivariate data in two dimensions. Circumstances under which the, possibly inappropriate, application of principal components to untransformed compositional data approximates to a correspondence analysis of the raw data are noted. Aitchison (1986) has proposed a method for the principal component analysis of compositional data involving transformation of the raw data. It is shown how this can be approximated by a correspondence analysis of appropriately transformed data. The latter approach may be preferable when there are zeroes in the data.
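Aitchison's approach works on log-ratios rather than raw proportions. Below is a minimal sketch of one standard form: the centered log-ratio (clr) transform followed by ordinary PCA. The function names are illustrative. The code refuses zero parts, and this is exactly the limitation that makes a correspondence-analysis route attractive when the data contain zeros.

```python
import numpy as np


def clr(X):
    """Centered log-ratio transform: log of each part divided by the row's geometric mean."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("clr requires strictly positive parts; zeros need separate treatment")
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)


def logratio_pca(X):
    """PCA of clr-transformed compositions via SVD; returns scores and component directions."""
    C = clr(X)
    Cc = C - C.mean(axis=0)
    U, S, Vt = np.linalg.svd(Cc, full_matrices=False)
    return U * S, Vt


comps = np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.3, 0.2, 0.5]])
scores, directions = logratio_pca(comps)
```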

17.
Functional logistic regression is becoming more popular as there are many situations where we are interested in the relation between functional covariates (as input) and a binary response (as output). Several approaches have been advocated, and this paper goes into detail about three of them: dimension reduction via functional principal component analysis, penalized functional regression, and wavelet expansions in combination with Least Absolute Shrinkage and Selection Operator penalization. We discuss the performance of the three methods on simulated data and also apply the methods to data regarding lameness detection for horses. Emphasis is on classification performance, but we also discuss estimation of the unknown parameter function.

18.
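A Discussion of Methods for Aligning Indicator Directions in Principal Component and Factor Analysis   Total citations: 9, Self-citations: 0, Citations by others: 9
Sample principal component analysis and sample factor analysis have become among the most important comprehensive evaluation methods. Aligning the direction of the indicator variables (同趋势化), that is, transforming them so that all move in the same direction with respect to the evaluation goal, is an important step in applying these methods. The article summarizes the specific alignment methods used in principal component and factor analysis, discusses their effects on comprehensive evaluation, and states the conditions under which each applies.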
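Two transforms are often used to turn a "smaller-is-better" indicator into a "larger-is-better" one: the reciprocal 1/x and the difference max(x) - x. The small sketch below uses `negative_cols`, a hypothetical parameter naming the indicators to flip, and does not reproduce the article's full list of methods. The comments note the effect most relevant to PCA. The difference transform is linear and only reverses the sign of that indicator's correlations. The reciprocal is nonlinear and can change the correlation structure from which the components are built.

```python
import numpy as np


def align_directions(X, negative_cols, method="difference"):
    """Return a copy of X in which the listed 'smaller-is-better' columns point the same
    way as the others."""
    X = np.asarray(X, dtype=float).copy()
    for j in negative_cols:
        if method == "difference":
            # Linear: only flips the sign of this column's correlations with the others.
            X[:, j] = X[:, j].max() - X[:, j]
        elif method == "reciprocal":
            # Nonlinear: needs nonzero values and can change correlations and PCA loadings.
            if np.any(X[:, j] == 0):
                raise ValueError("reciprocal transform is undefined for zero values")
            X[:, j] = 1.0 / X[:, j]
        else:
            raise ValueError(f"unknown method: {method}")
    return X


X = np.array([[1.0, 10.0], [2.0, 8.0], [4.0, 5.0], [3.0, 7.0]])
X_diff = align_directions(X, negative_cols=[1], method="difference")
X_recip = align_directions(X, negative_cols=[1], method="reciprocal")
```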

19.
Common factor analysis (CFA) and principal component analysis (PCA) are widely used multivariate techniques. Using simulations, we compared CFA with PCA loadings for distortions of a perfect cluster configuration. Results showed that nonzero PCA loadings were higher and more stable than nonzero CFA loadings. Compared to CFA loadings, PCA loadings correlated weakly with the true factor loadings for underextraction, overextraction, and heterogeneous loadings within factors. The pattern of differences between CFA and PCA was consistent across sample sizes, levels of loadings, principal axis factoring versus maximum likelihood factor analysis, and blind versus target rotation.

20.
A set of \(n\)-principal points of a \(p\)-dimensional distribution is an optimal \(n\)-point approximation of the distribution in terms of a squared error loss. It is in general difficult to derive an explicit expression for principal points. Hence, we may have to search the whole space \(R^p\) for \(n\)-principal points. Many efforts have been devoted to establishing results that specify a linear subspace in which principal points lie. However, previous studies focused on elliptically symmetric distributions and location mixtures of spherically symmetric distributions, which may not be suitable for many practical situations. In this paper, we deal with a mixture of elliptically symmetric distributions that form an allometric extension model, which has been widely used in the context of principal component analysis. We give conditions under which principal points lie in the linear subspace spanned by the first several principal components.
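A concrete illustration of the definition: n-principal points minimize the expected squared distance from the distribution to the nearest point. Applying k-means (Lloyd's algorithm) to a large sample therefore gives a Monte Carlo approximation. In the code, k plays the role of n. The sketch below is a plain Lloyd iteration. As a check, the two principal points of the standard normal distribution are known to be ±√(2/π) ≈ ±0.798. Sample size, iteration count, and seed are arbitrary.

```python
import numpy as np


def approx_principal_points(sample, k, n_iter=100, seed=0):
    """Approximate k principal points of a distribution by Lloyd's k-means on a sample.

    sample: array of shape (N, p) drawn from the distribution.
    """
    rng = np.random.default_rng(seed)
    centers = rng.choice(sample, size=k, replace=False)    # k distinct sample rows
    for _ in range(n_iter):
        # Assign each sample point to its nearest center (squared Euclidean distance).
        d2 = ((sample[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points (keep it if none assigned).
        centers = np.array([
            sample[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers


rng = np.random.default_rng(4)
sample = rng.standard_normal((100_000, 1))
points = np.sort(approx_principal_points(sample, k=2).ravel())
```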
