Studying Complexity of Model-based Clustering |
| |
Authors: | Semhar Michael |
| |
Affiliation: | Department of Information Systems, Statistics, and Management Science, The University of Alabama, Tuscaloosa, Alabama, USA |
| |
Abstract: | Cluster analysis is a popular technique in statistics and computer science that is used in many areas of research. In this article, we investigate factors that can influence clustering performance in the model-based clustering framework. The four factors considered are the level of overlap, the number of clusters, the number of dimensions, and the sample size. Through a comprehensive simulation study, we examine how model-based clustering behaves across different settings of these factors. As measures of clustering performance, we employ three popular classification indices that reflect the degree of agreement between two partitioning vectors, which makes it possible to compare the true and estimated classification vectors. In addition to studying clustering complexity, we evaluate the performance of the three classification measures themselves. |
| |
Keywords: | Adjusted Rand index; CARP; Clustering complexity; Fowlkes and Mallows index; Model-based clustering; Overlap |