
A comparative study of the K-means algorithm and the normal mixture model for clustering: Bivariate homoscedastic case

Authors:	Dingxi Qiu

Institution:	Department of Industrial Engineering, University of Miami, Coral Gables, FL 33146, USA

Abstract:	The K-means algorithm and the normal mixture model method are two common clustering methods. The K-means algorithm is a popular heuristic that gives reasonable clustering results when the component clusters are ball-shaped. Currently, there are no analytical results for this algorithm when the component distributions deviate from the ball shape. This paper studies analytically how the classification rule of the K-means algorithm changes as the normal component distributions become more elongated under the homoscedastic assumption, and compares this rule with the Bayes rule from the mixture model method. We show that the classification rules of both methods are linear, but that the slopes of the two classification lines change in opposite directions as the component distributions become more elongated. The classification performance of the K-means algorithm is then compared with that of the mixture model method via simulation. The comparison, which is limited to two clusters, shows that the K-means algorithm consistently performs poorly as the component distributions become more elongated, whereas the mixture model method can potentially, but not necessarily, take advantage of this change and provide much better classification performance.
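
To make the two linear rules mentioned in the abstract concrete, here is a minimal Python sketch (standard library only). It is not the paper's derivation: the function names, the example means, and the covariance matrix are all illustrative assumptions. In particular, the example feeds the true component means to the nearest-centroid rule, whereas the centroids that K-means actually converges to generally differ from the component means once the components are elongated. The sketch only shows the textbook form of each rule: the Bayes rule for two homoscedastic normal components, and the perpendicular-bisector rule induced by two centroids.

```python
import math


def inv2(s):
    """Invert a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def bayes_rule(mu1, mu2, sigma, pi1=0.5):
    """Bayes rule for two bivariate normals sharing covariance `sigma`.

    Returns (w, b); a point x goes to component 1 when
    w[0]*x[0] + w[1]*x[1] + b > 0.
    """
    si = inv2(sigma)
    d = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    # w = sigma^{-1} (mu1 - mu2)
    w = (si[0][0] * d[0] + si[0][1] * d[1],
         si[1][0] * d[0] + si[1][1] * d[1])
    # The boundary passes through the midpoint of the means,
    # shifted by the log ratio of the mixing proportions.
    mid = ((mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(pi1 / (1 - pi1))
    return w, b


def nearest_centroid_rule(c1, c2):
    """Rule induced by two centroids: assign x to the closer one.

    Returns (w, b) with the same sign convention as bayes_rule.
    The boundary is the perpendicular bisector of c1 and c2.
    """
    w = (c1[0] - c2[0], c1[1] - c2[1])
    b = -((c1[0] ** 2 + c1[1] ** 2) - (c2[0] ** 2 + c2[1] ** 2)) / 2
    return w, b


def slope(w):
    """Slope of the line w[0]*x + w[1]*y + b = 0 (requires w[1] != 0)."""
    return -w[0] / w[1]


if __name__ == "__main__":
    # Hypothetical setup: components elongated along the x-axis.
    mu1, mu2 = (1, 1), (-1, -1)
    elongated = ((4, 0), (0, 1))
    w_bayes, _ = bayes_rule(mu1, mu2, elongated)
    # Assumption: centroids placed at the true means, for illustration only.
    w_cent, _ = nearest_centroid_rule(mu1, mu2)
    print("Bayes slope:", slope(w_bayes))
    print("Nearest-centroid slope:", slope(w_cent))
```

Under these assumed inputs the Bayes line has slope -0.25 while the nearest-centroid line has slope -1.0. With an identity covariance and equal mixing proportions the two rules coincide, which is the ball-shaped case where K-means is expected to do well.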

Keywords:	Clustering; Data mining; Mixture model; K-means algorithm; EM algorithm; Elongation; Mixing proportion; Misclassification rate
This article is indexed in ScienceDirect and other databases.
