Similar Documents
19 similar documents found (search time: 47 ms)
1.
With the arrival of the big-data era, functional data analysis has become a research hotspot in recent years, and clustering methods for curves have attracted attention in the literature. This paper presents a curve clustering method: with the L2 distance as the measure of similarity and the curves expressed in a B-spline basis expansion, information on both the curves themselves and on their rates of change is incorporated into the clustering algorithm, connecting curve clustering with traditional multivariate statistical clustering. As an application, the method is validated on the clustering of urban and rural income functions; the results show that incorporating curve-change information yields better clustering than using curve-level information alone.
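The recipe in this abstract, representing each curve by its values and its rate of change and then applying a conventional multivariate clustering under a Euclidean (discretised L2) distance, can be sketched as follows. This is an illustrative simplification: sampled grid values and first differences stand in for the B-spline expansion and the derivative, and the curves are invented for the demo.

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Deterministic farthest-point initialisation keeps the sketch reproducible.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Two groups of curves on a common grid: rising vs. falling shapes.
t = np.linspace(0, 1, 20)
curves = np.vstack([[np.sin(2 * np.pi * t) + 0.01 * i for i in range(5)],
                    [-np.sin(2 * np.pi * t) + 0.01 * i for i in range(5)]])

# Feature vector = sampled curve values ("level" information) plus first
# differences (a discrete stand-in for the derivative, i.e. the abstract's
# curve-change information); Euclidean distance on this vector then
# approximates an L2 distance on curves and their derivatives.
features = np.hstack([curves, np.diff(curves, axis=1)])
labels = kmeans(features, k=2)
```

Dropping the `np.diff` columns gives the "curve information only" variant that the abstract reports as weaker.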

2.
Clustering based on the density of the data distribution is one of the main approaches among data-mining clustering algorithms. To address the computational complexity and low efficiency of traditional density-partition clustering algorithms, a multiple-partition clustering algorithm based on stepwise high-dimensional projection is designed. Guided by the projected density of the high-dimensional distribution, the data set is partitioned repeatedly to produce a space of sub-clusters, which are then merged to form the final clustering. Experiments show that the algorithm is computationally simple and efficient.

4.
To address the sensitivity of the traditional fuzzy C-means (FCM) clustering method to initial values, which leads to local optima and sensitivity to noise, this paper proposes a mutation-weighted fuzzy C-means algorithm based on breadth-first search. By combining an improved breadth-first search (BFS), which has global search capability, with an effective cluster validity function, the algorithm determines initial cluster centres close to the true ones while removing noisy data. To further account for the effect of attribute noise on clustering results, coefficient-of-variation weighting is introduced into the FCM objective function, further improving the algorithm's noise resistance. Experimental results show that the algorithm effectively overcomes the shortcomings of traditional FCM and, compared with other clustering algorithms, achieves faster convergence, better clustering accuracy, and stronger noise resistance.
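A minimal sketch of the weighting idea: standard FCM updates with each attribute weighted by its coefficient of variation inside the distance. The BFS-based centre initialisation and noise removal described in the abstract are not reproduced here; a seeded random start, the fuzzifier `m`, and the toy data are our assumptions.

```python
import numpy as np

def weighted_fcm(X, k, m=2.0, iters=100, seed=0):
    """Fuzzy C-means with attributes weighted by their coefficient of
    variation (a simplification of the paper's weighting scheme)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    cv = X.std(0) / np.abs(X.mean(0))      # coefficient of variation per attribute
    w = cv / cv.sum()                      # attribute weights
    U = rng.random((n, k))
    U /= U.sum(1, keepdims=True)           # random fuzzy memberships
    for _ in range(iters):
        Um = U ** m
        C = (Um.T @ X) / Um.sum(0)[:, None]                 # weighted centres
        d2 = (w * (X[:, None, :] - C[None]) ** 2).sum(-1) + 1e-12
        # Standard FCM membership update on the weighted squared distances.
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1))).sum(-1)
    return U, C

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
U, C = weighted_fcm(X, k=2)
hard = U.argmax(1)                         # defuzzified assignment
```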

5.
For the problem of classifying samples in multi-indicator panel data, this paper proposes a clustering method from the perspective of multivariate statistical theory. The method jointly considers the time-series features of the level, increment, and increment-growth-rate indicators of the panel data, together with the problem of non-synchronous time series, and builds the clustering on a reconstructed sum-of-squared-deviations function. An empirical analysis shows that the new method solves the clustering problem for multi-indicator panel data with good classification results.

6.
The traditional K-Prototypes algorithm clusters mixed data by partitioning, but as the dimensionality of the mixed data grows, the dissimilarities between objects become nearly equal, making the algorithm hard to apply. To address this defect, the paper proposes an improved K-Prototypes algorithm: before clustering, irrelevant dimensions are removed from each class and the high-dimensional mixed data are projected to a lower dimension. An example on the Heart Disease Databases verifies the effectiveness of the algorithm.
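For reference, the K-Prototypes dissimilarity that the partitioning relies on combines a squared Euclidean term on the numeric attributes with a mismatch count on the categorical ones. The dimension-screening step described in the abstract is not shown; the values and the weight `gamma` below are invented for illustration.

```python
import numpy as np

def kproto_dissim(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """K-Prototypes mixed dissimilarity: squared Euclidean distance on the
    numeric part plus gamma times the number of categorical mismatches."""
    num = ((np.asarray(x_num) - np.asarray(c_num)) ** 2).sum()
    cat = sum(a != b for a, b in zip(x_cat, c_cat))
    return num + gamma * cat

# Squared numeric part (0 + 4) plus 0.5 * one categorical mismatch = 4.5.
d = kproto_dissim([1.0, 2.0], ["red", "m"], [1.0, 4.0], ["red", "l"], gamma=0.5)
```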

7.
The gold-standard data method for screening out cheaters in crowdsourcing contests and the clustering-based outlier detection algorithms K-means- and DBSCAN depend on parameters given in advance and are unsuited to detection on large-scale data sets. To address this, an outlier detection algorithm based on sample connectivity graphs is proposed. First, an outlier detection algorithm is called repeatedly under given parameters to identify outliers and clusters in the data. Next, the number of connections and the connection strength between every pair of samples are computed, and, given a lower bound δ on connection strength, a connectivity graph over the samples is constructed. Finally, based on their connectivity, samples are labelled as cluster nodes or as outliers. Experimental results show that, while relaxing the range of parameter settings, the algorithm narrows the fluctuation in the number of detected outliers and improves detection accuracy, outperforming the comparison algorithms and the classic gold-standard data method.
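The three steps above can be sketched with a co-association matrix: run a base detector over a parameter grid, count how often each pair of samples lands in the same cluster, and flag samples whose strongest connection falls below δ. As an assumption of this sketch, the repeated base detector is a minimal DBSCAN over a grid of eps values (rather than the paper's exact setup), and δ, the grid, and the data are invented.

```python
import numpy as np

def dbscan(X, eps, min_pts=2):
    """Minimal DBSCAN: returns cluster labels, -1 for noise."""
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    labels = -np.ones(n, dtype=int)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or (d[i] <= eps).sum() < min_pts:
            continue
        labels[i] = cid
        stack = list(np.where(d[i] <= eps)[0])
        while stack:                       # expand the cluster from core points
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cid
                if (d[j] <= eps).sum() >= min_pts:
                    stack.extend(np.where(d[j] <= eps)[0])
        cid += 1
    return labels

def connectivity_outliers(X, eps_grid, delta=0.6):
    """Repeated detector runs build a connectivity (co-association) graph;
    a sample whose strongest connection stays below delta is an outlier."""
    n = len(X)
    co = np.zeros((n, n))
    for eps in eps_grid:
        lab = dbscan(X, eps)
        co += (lab[:, None] == lab[None]) & (lab[:, None] >= 0)
    strength = co / len(eps_grid)
    np.fill_diagonal(strength, 0.0)
    return strength.max(1) < delta

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
              [5.0, 5.0]])                       # last point is isolated
flags = connectivity_outliers(X, eps_grid=[0.2, 0.3, 0.5, 3.0, 6.0])
```

The isolated point co-clusters with the rest only at the largest eps, so its connection strength stays below δ and it alone is flagged.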

8.
To improve the forecasting accuracy of fuzzy time series, this paper applies multiscale wavelet decomposition: with a suitable wavelet function, a one-dimensional series is decomposed into a low-frequency approximation part and a high-frequency detail part. A fuzzy time-series model based on fuzzy C-means clustering is then built and used for forecasting separately on each part, according to its own characteristics, and the parts' forecasts are combined by wavelet reconstruction into the final forecast. Validation and comparison on national fiscal revenue data show that the model considerably improves forecasting accuracy.
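The decomposition/reconstruction step can be shown with a one-level Haar transform, the simplest choice of wavelet (the paper leaves the wavelet function open; the fuzzy time-series modelling of each band is omitted here, and the sample series is invented).

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar transform: approximation and detail coefficients."""
    x = np.asarray(x, float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-frequency trend
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-frequency detail
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt: recombine the two bands."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0])
a, d = haar_dwt(x)
# In the paper each band would be modelled and forecast separately;
# here we only verify that reconstruction recovers the series exactly.
x_rec = haar_idwt(a, d)
```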

9.
Drawing on the basic idea of correspondence analysis, this paper improves the algorithm of Q-mode factor analysis and obtains a new clustering method capable of handling massive data sets. Algorithm analysis shows that the method's time complexity is linear in the sample size, demonstrating its superior algorithmic efficiency. Finally, the method is applied to the sector analysis of listed companies with good results.

10.
To address the high-dimensional complexity of clustering panel data, this paper uses linear projection to convert the problem into a linear clustering problem on projected feature vectors, so that high-dimensional data samples can be clustered in a low-dimensional space. An empirical analysis verifies the feasibility and effectiveness of the projection pursuit model for panel data clustering.

11.
In this paper, we present an algorithm for clustering based on univariate kernel density estimation, named ClusterKDE. It is an iterative procedure in which each step obtains a new cluster by minimizing a smooth kernel function. Although our applications use the univariate Gaussian kernel, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, the ClusterKDE algorithm is very simple, easy to implement, well defined, and stops in a finite number of steps; that is, it always converges independently of the initial point. We illustrate our findings with numerical experiments obtained by implementing the algorithm in Matlab and applying it to practical problems. The results indicate that ClusterKDE is competitive and fast when compared with the well-known Clusterdata and K-means algorithms used by Matlab to cluster data.
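The iteration described here, peel off one cluster per step around a mode of the kernel density estimate until no points remain, can be sketched for 1-D data as follows. This is not the authors' implementation: the grid-based mode search and the rule assigning points within three bandwidths of the mode are our simplifications, and the bandwidth and data are invented.

```python
import numpy as np

def cluster_kde(x, bandwidth=0.5):
    """Iterative KDE clustering sketch: repeatedly locate the mode of a
    Gaussian KDE over the remaining points, assign the points near the
    mode to a new cluster, and continue until every point is labelled.
    The number of clusters emerges automatically."""
    x = np.asarray(x, float)
    labels = -np.ones(len(x), dtype=int)
    grid = np.linspace(x.min(), x.max(), 512)
    remaining = np.ones(len(x), dtype=bool)
    cid = 0
    while remaining.any():
        pts = x[remaining]
        # Unnormalised Gaussian KDE of the remaining points on the grid.
        dens = np.exp(-0.5 * ((grid[:, None] - pts[None]) / bandwidth) ** 2).sum(1)
        mode = grid[dens.argmax()]
        near = remaining & (np.abs(x - mode) <= 3 * bandwidth)
        if not near.any():                 # safeguard: never loop forever
            near = remaining
        labels[near] = cid
        remaining &= ~near
        cid += 1
    return labels

lab = cluster_kde(np.array([0.0, 0.2, 0.4, 10.0, 10.3, 9.8]))
```

No cluster count is supplied; the two well-separated groups each get their own label.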

12.
The sparsity and infinite-dimensional nature of functional data render traditional cluster analysis ineffective. To address this, after delimiting the concept and content of functional data, this paper proposes an adaptively and iteratively updated cluster analysis. First, parameter information in the data is used to pass from the infinite-dimensional function space to a finite-dimensional multivariate space. On this basis, an adaptively weighted clustering statistic is constructed from differences in the information content of the variables and used as the similarity measure for an initial partition of the functional data. Then, under a given threshold, the initial class memberships of all functions are adaptively and iteratively updated, and the converged optimum is taken as the final partition. Random simulations and an empirical test show that, compared with existing functional clustering methods of the same type, the classification accuracy of the proposed method is markedly higher, demonstrating its relative merit and its effectiveness in practical applications.

13.
This article deals with clustering the elements of a structure of juxtaposed data-measurement tables. One of the main issues in such problems is selecting a one-dimensional quantity to represent the information included in the repeated observations of each variable. We propose three different indices to measure the distance between elements of a structure and use the last one, based on the Hilbert–Schmidt inner product, for clustering through an algorithmic procedure. The proposed algorithm is applied to clustering the customers of an electric company, where each customer is described by a load curve.

14.
This paper introduces a method for clustering spatially dependent functional data. The idea is to consider the contribution of each curve to the spatial variability. Thus, we define a spatial dispersion function associated with each curve and perform a k-means-like clustering algorithm, based on optimizing a fitting criterion between the spatial dispersion functions associated with the curves and the representatives of the clusters. The performance of the proposed method is illustrated by an application to real data and a simulation study.

15.
When attributes are uncertain, common methods are not applicable to data clustering. In recent years, much research has used fuzzy concepts to model this uncertainty, but when the attributes of data elements follow probability distributions, the uncertainty cannot be interpreted by fuzzy theory. This article proposes a new concept for clustering elements whose attributes have predefined probability distributions, so that each observation belongs to a cluster with a certain probability. Two metaheuristic algorithms are applied to the problem, with squared Euclidean distance used to measure the similarity of data elements to cluster centers. A sensitivity analysis shows that the proposed approach converges to the classic results as the variance of each point tends to zero. Moreover, numerical analysis confirms that the proposed approach is efficient in clustering probabilistic data.
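One concrete way to see the zero-variance limit mentioned in the sensitivity analysis: for a point whose attributes are independent random variables, the expected squared Euclidean distance to a fixed centre decomposes into the classic distance plus the total variance. The function name and the numbers are ours, for illustration only.

```python
import numpy as np

def expected_sq_dist(mu, var, center):
    """Expected squared Euclidean distance between a random point with
    independent attributes (means mu, variances var) and a fixed centre:
    E||X - c||^2 = ||mu - c||^2 + sum(var).
    As var -> 0 this reduces to the classic squared distance."""
    mu, var, center = map(np.asarray, (mu, var, center))
    return float(((mu - center) ** 2).sum() + var.sum())

d = expected_sq_dist([1.0, 2.0], [0.25, 0.25], [0.0, 0.0])   # (1 + 4) + 0.5
```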

16.

Kaufman and Rousseeuw (1990) proposed a clustering algorithm, Partitioning Around Medoids (PAM), which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common context that many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM does have problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this paper, we propose to partition around medoids by maximizing a criterion, "Average Silhouette", defined by Kaufman and Rousseeuw (1990), and we also propose a fast-to-compute approximation of it. We implement these two new partitioning-around-medoids algorithms and illustrate their performance relative to existing partitioning methods in simulations.
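The criterion-driven variant can be sketched as a greedy medoid search that accepts any single-medoid swap which raises the average silhouette width. This is a sketch in the spirit of the proposal, not the authors' implementation; the naive initialisation, the greedy swap order, and the toy distance matrix are our assumptions.

```python
import numpy as np

def avg_silhouette(D, labels):
    """Mean silhouette width computed from a precomputed distance matrix."""
    n = len(labels)
    s = np.zeros(n)
    for i in range(n):
        own = labels == labels[i]
        own[i] = False
        a = D[i, own].mean() if own.any() else 0.0   # mean within-cluster distance
        b = min(D[i, labels == c].mean()             # nearest other cluster
                for c in set(labels.tolist()) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def pam_max_sil(D, k):
    """Greedy medoid swaps that maximise average silhouette."""
    n = len(D)
    med = list(range(k))                             # naive initial medoids
    best = avg_silhouette(D, D[:, med].argmin(1))
    improved = True
    while improved:
        improved = False
        for m in range(k):
            for cand in range(n):
                if cand in med:
                    continue
                trial = med[:m] + [cand] + med[m + 1:]
                score = avg_silhouette(D, D[:, trial].argmin(1))
                if score > best:
                    best, med, improved = score, trial, True
    return med, best

x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(x[:, None] - x[None, :])
med, score = pam_max_sil(D, k=2)
lab = D[:, med].argmin(1)
```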

17.
We propose an intuitive and computationally simple algorithm for clustering probability density functions (pdfs). A data-driven learning mechanism is incorporated into the algorithm to determine suitable cluster widths. The clustering results show that the proposed algorithm is able to group the pdfs automatically and to provide the optimal cluster number without any a priori information, and a performance study shows that it is more efficient than existing algorithms. In addition, the clustering can serve as an intermediate compression tool in content-based multimedia retrieval: we apply the proposed algorithm to categorize a subset of the COREL image database, and the results indicate that it performs well in colour image categorization.

18.
Clustering algorithms are important methods widely used in mining data streams because of their ability to deal with infinite data flows. Although these algorithms perform well in mining latent relationships in data streams, most of them suffer a loss of cluster purity and become unstable when the input streams contain too many noisy variables. In this article, we propose an algorithm for clustering data streams with noisy variables that adds variable selection as a component of the clustering algorithm. Simulation results show that our proposed method outperforms previous approaches: two experiments indicate that clustering data streams with variable selection is more stable and achieves better purity than clustering without it, and a further experiment on the KDD-CUP99 dataset shows that our algorithm generates more stable results.

19.
Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L2 metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
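The "suitable linear transformation" is the Cholesky factor of the basis Gram matrix: if curve i has coefficients c_i in basis B with Gram matrix G = LLᵀ, then ||f_i − f_j||_{L2}² = (c_i − c_j)ᵀG(c_i − c_j) = ||Lᵀ(c_i − c_j)||², so ordinary k-means on Lᵀc clusters in the function-space metric. A minimal numerical check, assuming an illustrative polynomial basis and made-up coefficients:

```python
import numpy as np

# An illustrative basis (1, t, t^2) sampled on a fine grid of [0, 1];
# the argument holds for any basis expansion.
t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]
B = np.vstack([np.ones_like(t), t, t ** 2]).T     # (grid points, basis)

# Gram matrix of the basis under the discretised L2 inner product.
G = B.T @ B * dt
L = np.linalg.cholesky(G)

c1 = np.array([1.0, -2.0, 0.5])    # regression coefficients of curve 1
c2 = np.array([0.0, 1.0, 1.0])     # regression coefficients of curve 2

# Euclidean distance between transformed coefficients L.T @ c equals the
# (discretised) L2 distance between the fitted curves.
d_coef = float(np.linalg.norm(L.T @ (c1 - c2)))
d_func = float(np.sqrt(((B @ (c1 - c2)) ** 2).sum() * dt))
```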


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号