A comparative study of the K-means algorithm and the normal mixture model for clustering: Bivariate homoscedastic case |
Authors: | Dingxi Qiu |
Institution: | Department of Industrial Engineering, University of Miami, Coral Gables, FL 33146, USA |
Abstract: | The K-means algorithm and the normal mixture model method are two common clustering methods. The K-means algorithm is a popular heuristic approach that gives reasonable clustering results when the component clusters are ball-shaped. Currently, there are no analytical results for this algorithm when the component distributions deviate from the ball shape. This paper analytically studies how the K-means algorithm changes its classification rule as the normal component distributions become more elongated under the homoscedastic assumption, and compares this rule with the Bayes rule of the mixture model method. We show that the classification rules of both methods are linear, but the slopes of the two classification lines change in opposite directions as the component distributions become more elongated. The classification performance of the K-means algorithm is then compared to that of the mixture model method via simulation. The comparison, which is limited to two clusters, shows that the K-means algorithm consistently performs poorly as the component distributions become more elongated, while the mixture model method can potentially, but not necessarily, take advantage of this change and provide much better classification performance. |
Keywords: | Clustering; Data mining; Mixture model; K-means algorithm; EM algorithm; Elongation; Mixing proportion; Misclassification rate |
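The two-cluster comparison described in the abstract can be illustrated with a minimal simulation sketch. It is not the paper's simulation design: it assumes equal mixing proportions, means (0, 0) and (0, 4), and a shared diagonal covariance stretched along the x-axis by a hypothetical `elongation` factor. The mixture side uses the Bayes rule with the true parameters (the best case for the mixture model method) rather than parameters fitted by the EM algorithm. Both rules are linear: K-means assigns a point to the nearer centroid, so its boundary is the perpendicular bisector of the two centroids, while the homoscedastic Bayes rule assigns x to component 1 when w·x + w0 > 0, with w = Σ^{-1}(μ1 - μ2) and w0 = -½(μ1 + μ2)·w + log(π1/π2). All function names are illustrative.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sample_mixture(n, pi1, mu1, mu2, cov, rng) -> Tuple[List[Point], List[int]]:
    """Draw n points from pi1*N(mu1, cov) + (1 - pi1)*N(mu2, cov); return points and true labels."""
    (s11, s12), (_, s22) = cov
    # Cholesky factor L = [[a, 0], [b, c]] of the shared covariance, so L @ z ~ N(0, cov).
    a = math.sqrt(s11)
    b = s12 / a
    c = math.sqrt(s22 - b * b)
    points, labels = [], []
    for _ in range(n):
        label = 0 if rng.random() < pi1 else 1
        mx, my = mu1 if label == 0 else mu2
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((mx + a * z1, my + b * z1 + c * z2))
        labels.append(label)
    return points, labels


def bayes_linear_rule(pi1, mu1, mu2, cov):
    """Return (w, w0): the Bayes rule assigns x to component 0 iff w.x + w0 > 0 (shared covariance)."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    inv = ((s22 / det, -s12 / det), (-s12 / det, s11 / det))
    d = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    w = (inv[0][0] * d[0] + inv[0][1] * d[1], inv[1][0] * d[0] + inv[1][1] * d[1])
    mid = ((mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2)
    w0 = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(pi1 / (1 - pi1))
    return w, w0


def kmeans2(points: List[Point], max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm with k = 2; returns a 0/1 cluster label per point."""
    def sqdist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    # Deterministic init: the first point, then the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: sqdist(p, c0))
    centers = [c0, c1]
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if sqdist(p, centers[0]) <= sqdist(p, centers[1]) else 1 for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels


def misclassification_rate(pred: List[int], truth: List[int]) -> float:
    """Error rate minimized over both label matchings, since cluster labels are arbitrary."""
    err = sum(p != t for p, t in zip(pred, truth)) / len(truth)
    return min(err, 1.0 - err)


def compare(elongation: float, n: int = 2000, seed: int = 0):
    """Misclassification rates of K-means and of the true-parameter Bayes rule."""
    rng = random.Random(seed)
    pi1, mu1, mu2 = 0.5, (0.0, 0.0), (0.0, 4.0)
    cov = ((elongation ** 2, 0.0), (0.0, 1.0))  # components stretched along the x-axis
    points, truth = sample_mixture(n, pi1, mu1, mu2, cov, rng)
    km_err = misclassification_rate(kmeans2(points), truth)
    w, w0 = bayes_linear_rule(pi1, mu1, mu2, cov)
    bayes_pred = [0 if w[0] * x + w[1] * y + w0 > 0 else 1 for x, y in points]
    bayes_err = sum(p != t for p, t in zip(bayes_pred, truth)) / n
    return km_err, bayes_err


if __name__ == "__main__":
    for e in (1.0, 2.0, 3.0, 5.0):
        km, by = compare(e)
        print(f"elongation={e}: K-means error={km:.3f}, Bayes error={by:.3f}")
```

In this particular configuration the components are separated only along y, so the Bayes error should stay near Φ(-2) ≈ 0.023 for every elongation. K-means, which minimizes within-cluster sums of squares, tends to split along the long x-axis once the elongation is large enough, pushing its error toward 0.5. Other orientations of the elongation, where the Bayes rule can benefit, are not covered by this sketch.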