Similar literature
 19 similar documents found (search time: 139 ms)
1.
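Wang Xing (王星), Ma Xuan (马璇). Statistical Research (《统计研究》), 2015, 32(10): 74-81
This article studies a change-point estimation model for airfare price series shaped by the airline industry's dynamic pricing mechanism. It analyzes the structural characteristics of airfare price series data and proposes a multi-stage change-point estimation framework for step-like series with clearly multiple change points in high-noise data. The framework cascades several mature data analysis methods: the DBSCAN algorithm, EM Gaussian mixture model clustering, agglomerative hierarchical clustering, and change-point estimation based on the product partition model. An empirical analysis of flights on the "北京-昆明" (Beijing-Kunming) route verifies the framework's effectiveness and general applicability.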

2.
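Outlier detection methods used to screen out cheaters in crowdsourcing contests, namely the gold-standard data method and the clustering-based outlier detection algorithms K-means- and DBSCAN, depend on parameters given in advance and are unsuitable for large-scale data sets. To address this, an outlier detection algorithm based on a sample connectivity graph is proposed. First, given the parameters, an outlier detection algorithm is called repeatedly to identify the outliers and clusters in the data. Second, the number of connections and the connection strength between every two samples are computed and, given a lower bound δ on connection strength, a connectivity graph among the samples is constructed from those strengths. Finally, samples are labeled as cluster nodes or outliers according to their connectivity. Experimental results show that, with a relaxed range of parameter settings, the algorithm narrows the fluctuation in the number of outliers and improves outlier identification accuracy, outperforming the comparison algorithms and the classic gold-standard data method.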
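
The three steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `run_labels` is a hypothetical stand-in for the base detector (single-linkage grouping at radius `eps` on 1-D points, where the paper would use something like DBSCAN), and "connection strength" is assumed here to be the fraction of runs in which two samples land in the same cluster.

```python
from itertools import combinations


def run_labels(points, eps):
    """Hypothetical base detector: single-linkage components at radius eps.

    Returns one label per point; -1 marks a point left on its own in this run.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if abs(points[i] - points[j]) <= eps:
            parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    return [r if roots.count(r) > 1 else -1 for r in roots]


def label_by_connectivity(points, eps_values, delta):
    """Label each sample 'cluster' or 'outlier' from repeated detector runs."""
    n = len(points)
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    # Step 1: call the detector repeatedly and count co-clustered pairs.
    for eps in eps_values:
        labels = run_labels(points, eps)
        for i, j in counts:
            if labels[i] != -1 and labels[i] == labels[j]:
                counts[(i, j)] += 1
    # Step 2: keep an edge when the assumed strength (count / runs) reaches delta.
    connected = set()
    for (i, j), count in counts.items():
        if count / len(eps_values) >= delta:
            connected.update((i, j))
    # Step 3: samples with no edge in the connectivity graph are outliers.
    return ["cluster" if i in connected else "outlier" for i in range(n)]
```

For example, `label_by_connectivity([0.0, 0.1, 0.2, 5.0, 5.1, 9.0], [0.15, 0.3, 4.0], 0.6)` marks the last point as an outlier, because it joins a cluster in only one of the three runs.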

3.
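To address the tendency of the traditional fuzzy C-means clustering method (fuzzy C-means, FCM) to fall into local optima because of its sensitivity to initial values, and its sensitivity to noise, this article proposes a variation-weighted fuzzy C-means clustering algorithm based on breadth-first search. The algorithm combines an improved breadth-first search (Breadth First Search, BFS), which has global search capability, with an effective cluster evaluation function to determine initial cluster centers close to the true ones while also removing noisy data. On this basis, to account for the effect of attribute noise on the clustering result, the coefficient-of-variation weighting method is introduced to modify the FCM objective function, further improving the noise resistance of FCM. Experimental results show that the algorithm effectively overcomes the shortcomings of traditional FCM and, compared with other clustering algorithms, converges faster, clusters more accurately, and is more robust to noise.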
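
The abstract does not give the weighting formula, so the sketch below assumes the common form of coefficient-of-variation weighting: each attribute's weight is its coefficient of variation (standard deviation divided by the absolute mean) normalized so the weights sum to 1. The function names and the weighted squared distance are illustrative assumptions about how such weights might enter an FCM objective.

```python
from statistics import mean, pstdev


def cv_weights(rows):
    """Return one weight per attribute, proportional to its coefficient of variation.

    Assumes every attribute has a nonzero mean and at least one attribute varies.
    """
    cvs = []
    for column in zip(*rows):
        cvs.append(pstdev(column) / abs(mean(column)))
    total = sum(cvs)
    return [cv / total for cv in cvs]


def weighted_sq_distance(x, center, weights):
    """Attribute-weighted squared Euclidean distance between a sample and a center."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))
```

With these assumptions, a constant attribute receives weight 0 and so does not contribute to the distance at all.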

4.
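Clustering based on partitioning by data distribution density is one of the main families of clustering methods in data mining. Because traditional density-partition clustering algorithms are computationally complex and run inefficiently, a multi-partition clustering algorithm based on high-dimensional stepwise projection is designed. Using the high-dimensional projected distribution density as the criterion, the data set is partitioned multiple times to produce a space of sub-clusters, which are then merged into the final clustering result. Experiments with the algorithm show that it is computationally simple and runs efficiently.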

5.
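This article reviews the classic K-means algorithm and analyzes its two prominent shortcomings: it cannot determine the number of clusters k by itself, and it is highly sensitive to the initial cluster centers. Inspired by how an electron beam travels through a reverse electric field in the photoelectric effect experiment, it proposes a K-means algorithm that adaptively determines the number of clusters by capturing floating trial centers. The algorithm simulates an electron beam passing through an oppositely charged electron cloud: clusters of data points capture the floating trial cluster centers, which eliminates redundant initial centers and thereby addresses these shortcomings of K-means. Experiments show that the algorithm is strongly able to determine the number of clusters by itself and greatly reduces sensitivity to the choice of initial cluster centers.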

6.
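The traditional K-Prototypes clustering algorithm clusters mixed data by partitioning, but as the dimensionality of the mixed data grows, the dissimilarities between objects become almost equal, which makes the algorithm hard to apply. To address this, the article proposes an improved K-Prototypes clustering algorithm that removes the irrelevant dimensions of each cluster before clustering, so that high-dimensional mixed data are projected to a lower dimension and then clustered. A worked example on the Heart Disease Databases verifies the algorithm's effectiveness.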
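
For orientation, the usual K-Prototypes dissimilarity adds a squared Euclidean term on the numeric attributes to a mismatch count on the categorical attributes, scaled by a weight `gamma`. The sketch below restricts that measure to a chosen subset of dimensions. How the paper selects the relevant dimensions is not stated in the abstract, so `keep_num` and `keep_cat` are hypothetical inputs.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """K-Prototypes style dissimilarity between an object and a prototype."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


def projected_dissimilarity(x_num, x_cat, proto_num, proto_cat,
                            keep_num, keep_cat, gamma=1.0):
    """Same measure, computed only on the retained (assumed relevant) dimensions."""
    return mixed_dissimilarity(
        [x_num[i] for i in keep_num],
        [x_cat[i] for i in keep_cat],
        [proto_num[i] for i in keep_num],
        [proto_cat[i] for i in keep_cat],
        gamma,
    )
```

Dropping a dimension simply removes its term from the sum, which is one way the near-equal dissimilarities of high-dimensional data could be pulled apart.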

7.
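Clustering by partitioning on data distribution density is one of the main families of clustering methods in data mining. Because traditional density-partition clustering algorithms are computationally complex and run inefficiently, a multi-partition clustering algorithm based on high-dimensional stepwise projection is designed. Using the high-dimensional projected distribution density as the criterion, the data set is partitioned multiple times to produce a space of sub-clusters, and the sub-clusters are merged to form the final clustering result. Experiments with the algorithm show that it is computationally simple and runs efficiently.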

8.
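Support vector machines were developed for two-class classification, and extending two-class results to multi-class classification is an important problem for them. Building on clustering-based classification and the binary tree idea, this article proposes a new clustering algorithm for multi-class classification. The method exploits the simplicity of two-way splits in a binary tree and reduces the clustering of multiple classes to the clustering of points, which avoids the problem in earlier clustering methods that points of the same class may end up in different classes. Combined with an algorithm from the location problem for siting a fixed number of distribution points, the original problem is simplified, yielding a new clustering algorithm for multi-class problems.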

9.
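Projection pursuit clustering based on a genetic algorithm   Total citations: 1, self-citations: 0, citations by others: 1
The traditional projection pursuit clustering algorithm PROCLUS is effective for clustering high-dimensional data, but it uses hill climbing (Hill climbing) to iterate over the cluster center points and select the best ones. Because hill climbing is a local search (local search) method, the optimum it finds may be only a local one. To address this, an improved projection pursuit clustering algorithm is proposed that uses a genetic algorithm (Genetic Algorithm) to iterate over the cluster center points in search of the global optimum. Simulation results demonstrate the feasibility and effectiveness of the new algorithm.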

10.
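Drawing on the basic idea of correspondence analysis, this article improves the Q-type factor analysis algorithm and obtains a new clustering method capable of handling massive data. Algorithm analysis shows that the method's time complexity is linear in the sample size, which demonstrates its advantage in algorithmic efficiency. Finally, the method is applied to a sector analysis of listed companies, with good results.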

11.
Spectral clustering uses eigenvectors of the Laplacian of the similarity matrix. It is convenient for solving binary clustering problems. When applied to multi-way clustering, either binary spectral clustering is applied recursively, or the points are embedded in spectral space and clustered with another method such as K-means. Here we propose and study a K-way clustering algorithm, spectral modular transformation, based on the fact that the graph Laplacian has an equivalent representation with a diagonal modular structure. The method first transforms the original similarity matrix into a new one that is nearly disconnected and clearly reveals the cluster structure, and then applies a linearized cluster assignment algorithm to split the clusters. In this way, we can find some samples for each cluster recursively using the divide and conquer method. To get the overall clustering results, we use the cluster assignment obtained in the previous step to initialize the multiplicative update method for spectral clustering. Examples show that our method outperforms spectral clustering using other initializations.

12.
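Gao Haiyan (高海燕) et al. Statistical Research (《统计研究》), 2020, 37(8): 91-103
Functional clustering algorithms involve two basic elements, projection and clustering. The optimal projection does not necessarily preserve class information effectively, which degrades the subsequent clustering. This article therefore reviews the components and workflow of functional clustering. Drawing on the clustering property of non-negative matrix factorization, it proposes a functional clustering algorithm based on non-negative matrix factorization, builds a framework in which projection and clustering proceed in parallel, solves it by alternating iterative updates, and analyzes the algorithm's computational time complexity. Validation on randomly simulated data and a case test on speech recognition data show that the algorithm helps improve clustering performance. An application to hourly nitrogen dioxide (NO2) concentration data for Beijing shows that the algorithm's classification of air quality monitoring station types fully identifies the spatial pattern of the station layout and has good practical value.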

13.

We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion which requires a single cycle (or few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. Under simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool “Zodiac.”


14.
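This article studies the clustering of time series. Most real-world time series are nonlinear, whereas most existing time series clustering methods are based on linear time series models, so this article proposes a clustering method that can be used for nonlinear time series. The similarity between the two-dimensional kernel density estimates of the time series serves as the distance measure for nonlinear time series. This distance is nonparametric, takes into account differences in the autocorrelation structure of the series, and can roughly identify similarity in the shape and dynamic dependence structure of the series. Consistent with the theoretical results, our simulation experiments also verify the effectiveness of this distance measure.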
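
One way to read this distance is sketched below, under assumptions the abstract does not confirm: the two-dimensional density is taken over lagged pairs (x_t, x_{t+lag}), it is estimated with a product Gaussian kernel of fixed bandwidth `h`, and two series are compared by the integrated squared difference of their densities, approximated on a regular grid. In practice the series would probably be standardized first so that one grid covers both.

```python
import math


def lag_pairs(series, lag=1):
    """Turn a series into the points (x_t, x_{t+lag})."""
    return [(series[t], series[t + lag]) for t in range(len(series) - lag)]


def kde2d(pairs, grid, h):
    """Evaluate a product-Gaussian kernel density estimate on grid x grid."""
    norm = 1.0 / (len(pairs) * 2.0 * math.pi * h * h)
    density = []
    for u in grid:
        for v in grid:
            total = sum(
                math.exp(-((u - a) ** 2 + (v - b) ** 2) / (2.0 * h * h))
                for a, b in pairs
            )
            density.append(norm * total)
    return density


def kde_distance(series_a, series_b, grid, h=0.5, lag=1):
    """Approximate integrated squared difference between the two lag-pair densities."""
    fa = kde2d(lag_pairs(series_a, lag), grid, h)
    fb = kde2d(lag_pairs(series_b, lag), grid, h)
    cell_area = (grid[1] - grid[0]) ** 2  # assumes an evenly spaced grid
    return sum((p - q) ** 2 for p, q in zip(fa, fb)) * cell_area
```

Because the density of lagged pairs reflects how each value depends on the previous one, two series with similar marginal spread but different dynamics (a slow sine wave versus a rapidly alternating series, say) end up far apart under this measure.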

15.
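For panel data whose variables are nonlinearly related, existing panel data clustering methods based on linear algorithms cannot accurately measure the similarity between samples, and their clustering results have low interpretability. Considering both the nonlinear relationship among variables and the interpretability of clustering results, a clustering method for nonlinear panel data is proposed: sample similarity is measured with a nonlinear kernel principal component algorithm, and samples are clustered probabilistically with a Gaussian mixture model. Empirical analysis shows that the method is effective and improves the interpretability of the clustering results.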

16.
In this paper, we present a new algorithm for clustering a proximity-relation matrix that does not require the transitivity property. The proposed algorithm is first inspired by the idea of Yang and Wu [16] and then turned into a self-organizing process that is built upon the intuition behind clustering. At the end of the process, subjects belonging to the same cluster should converge to the same point, which represents the cluster center. However, the performance of Yang and Wu's algorithm depends on parameter selection. In this paper, we use the partition entropy (PE) index to choose the parameter. Numerical results illustrate that the proposed method not only solves the parameter selection problem but also obtains an optimal clustering result. Finally, we apply the proposed algorithm to three applications: evaluating the performance of higher education in Taiwan, machine–parts grouping in cellular manufacturing systems, and clustering probability density functions.
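
The partition entropy index mentioned here is commonly defined, following Bezdek, as PE = -(1/n) Σ_i Σ_k u_ik ln u_ik for a membership matrix u with n rows, where lower values indicate a crisper partition. The sketch below assumes that definition and assumes the parameter is chosen by minimizing PE. The `candidates` mapping from parameter values to membership matrices is a hypothetical interface, not something taken from the paper.

```python
import math


def partition_entropy(memberships):
    """Partition entropy of a membership matrix (rows = subjects, columns = clusters)."""
    total = 0.0
    for row in memberships:
        for u in row:
            if u > 0.0:  # treat 0 * log(0) as 0
                total += u * math.log(u)
    return -total / len(memberships)


def pick_parameter(candidates):
    """Return the parameter whose membership matrix has the smallest partition entropy.

    candidates maps each parameter value to the membership matrix it produced.
    """
    return min(candidates, key=lambda p: partition_entropy(candidates[p]))
```

A fully crisp partition scores 0, while a two-cluster partition with all memberships equal to 0.5 scores ln 2, the maximum for two clusters.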

17.
Two symmetric fractional factorial designs with qualitative and quantitative factors are equivalent if the design matrix of one can be obtained from the design matrix of the other by row and column permutations, relabeling of the levels of the qualitative factors and reversal of the levels of the quantitative factors. In this paper, necessary and sufficient methods of determining equivalence of any two symmetric designs with both types of factors are given. An algorithm used to check equivalence or non-equivalence is evaluated. If two designs are equivalent the algorithm gives a set of permutations which map one design to the other. Fast screening methods for non-equivalence are considered. Extensions of results to asymmetric fractional factorial designs with qualitative and quantitative factors are discussed.
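
As a rough illustration of the definition only, the sketch below handles a restricted case: it checks equivalence under row and column permutations and deliberately ignores level relabeling and level reversal, so it is not the paper's algorithm. `may_be_equivalent` is an assumed example of a fast screen, comparing per-column level frequencies, which must agree for equivalent designs. The brute-force search over column permutations is feasible only for a small number of factors.

```python
from collections import Counter
from itertools import permutations


def level_profile(design):
    """Sorted level-frequency pattern of every column, itself sorted across columns."""
    return sorted(sorted(Counter(column).values()) for column in zip(*design))


def may_be_equivalent(design_a, design_b):
    """Necessary condition only: differing profiles rule out equivalence."""
    return level_profile(design_a) == level_profile(design_b)


def find_column_permutation(design_a, design_b):
    """Return a column permutation mapping design_a onto design_b up to row order.

    Returns None if no such permutation exists. Level relabeling and reversal
    are not considered in this simplified check.
    """
    if len(design_a) != len(design_b):
        return None
    target = sorted(tuple(row) for row in design_b)
    for perm in permutations(range(len(design_a[0]))):
        candidate = sorted(tuple(row[c] for c in perm) for row in design_a)
        if candidate == target:
            return perm
    return None
```

Sorting the rows after permuting the columns removes the need to search over row permutations, since two row sets match up to order exactly when their sorted forms are equal.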

18.
Social networking sites (SNSs) make it possible for people to connect and communicate with one another. Due to the lack of privacy mechanisms, the users of SNSs are vulnerable to some kinds of attacks. Security and privacy issues have become critically important with the fast expansion of SNSs. Most network applications such as pervasive computing, grid computing and P2P networks can be viewed as multi-agent systems which are open, anonymous and dynamic in nature. Moreover, most of the existing reputation trust models (RTMs) do not depend on any clustering structures. The clustering structures are used to effectively calculate the trustworthiness of the network nodes. In this paper, a novel cosine similarity-based clustering and dynamic reputation trust aware key generation (CSBC-DRT) scheme is proposed. For better clustering, a cosine similarity measure is estimated for all the nodes on the network. Based on the similarity measure among the nodes, the network nodes are clustered into disjoint groups. The RTM is built in this proposed scheme. Here, an improved MD5 algorithm is explored for key generation and key verification. After the key verification, trust measures such as the reputation value and the positive edge and negative edge values are computed to formulate the trusted network. The proposed scheme performs better than the existing RTM and provides trusted communication in social networks.
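
The cosine-similarity grouping step might look like the sketch below. The abstract does not say how nodes are represented or how the disjoint groups are formed, so the feature vectors, the `threshold` value and the greedy "join the first group whose representative is similar enough" rule are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_nodes(features, threshold=0.9):
    """Greedily split nodes into disjoint groups by similarity to a representative.

    features maps a node id to its feature vector. The first member of each
    group acts as that group's representative.
    """
    groups = []
    for node, vector in features.items():
        for group in groups:
            if cosine_similarity(vector, features[group[0]]) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])  # no group was similar enough
    return groups
```

Each node joins exactly one group, so the groups are disjoint by construction. The result depends on the order in which nodes are visited, which a real scheme would need to address.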

19.
Often, categorical ordinal data are clustered using a similarity measure that is well defined for this kind of data, followed by a clustering algorithm not specifically developed for them. The aim of this article is to introduce a new clustering method designed specifically for ordinal data. Objects are grouped using a multinomial model, a cluster tree and a pruning strategy. Two types of pruning are analyzed through simulations. The proposed method makes it possible to overcome two typical problems of cluster analysis: the choice of the number of groups and scale invariance.
