The Clustering of Categorical Data: A Comparison of a Model-based and a Distance-based Approach |
| |
Authors: | Laura Anderlucci, Christian Hennig |
| |
Institution: | 1. Department of Statistical Sciences, University of Bologna, Bologna, Italy (laura.anderlucci@unibo.it); 3. Department of Statistical Science, University College London, London, UK |
| |
Abstract: | For clustering multivariate categorical data, a latent class model-based approach (LCC) with local independence is compared with a distance-based approach, namely partitioning around medoids (PAM). The results of a comprehensive simulation study were evaluated with both a model-based and a distance-based criterion. LCC was better according to the model-based criterion, and PAM was sometimes better according to the distance-based criterion. However, LCC had an overall good, and sometimes better, distance-based performance than PAM, although this was not the case for a real data set on tribal art items. |
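To make the distance-based side of the comparison concrete, the sketch below clusters a toy categorical data set with a k-medoids swap search in the spirit of PAM, then scores the result with the two indices named in the keywords: the adjusted Rand index (agreement with known classes) and the average silhouette width (a purely distance-based quality measure). This is a minimal illustration under stated assumptions, not the study's implementation. The simple matching distance, the deterministic start (the first k objects instead of PAM's greedy BUILD step), the toy data and all function names (`matching_distance`, `pam`, `average_silhouette_width`, `adjusted_rand_index`) are hypothetical choices made here. The latent class model (LCC), usually fitted by EM, is not sketched.

```python
from collections import Counter
from math import comb


def matching_distance(x, y):
    """Simple matching distance (assumed here): share of variables on which x and y differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def pam(data, k, max_iter=100):
    """k-medoids with PAM-style best-improvement swaps (simplified: no BUILD step)."""
    n = len(data)
    d = [[matching_distance(data[i], data[j]) for j in range(n)] for i in range(n)]

    def cost(medoids):
        # Total distance of every object to its closest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    medoids = list(range(k))  # hypothetical deterministic start
    best = cost(medoids)
    for _ in range(max_iter):
        best_swap = None
        # Try every (medoid, non-medoid) swap and keep the one lowering cost most.
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                c = cost(candidate)
                if c < best - 1e-12:
                    best, best_swap = c, candidate
        if best_swap is None:
            break  # local optimum: no swap improves the cost
        medoids = best_swap
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels, d


def average_silhouette_width(d, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); assumes at least two clusters."""
    n = len(labels)
    sizes = Counter(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(d[i][j] for j in own) / len(own)
        b = min(
            sum(d[i][j] for j in range(n) if labels[j] == c) / sizes[c]
            for c in sizes if c != labels[i]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def adjusted_rand_index(u, v):
    """Hubert-Arabie adjusted Rand index between two partitions."""
    n = len(u)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    sum_a = sum(comb(c, 2) for c in Counter(u).values())
    sum_b = sum(comb(c, 2) for c in Counter(v).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy data: two groups of three objects on three categorical variables.
    data = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
            ("b", "y", "r"), ("b", "y", "r"), ("b", "z", "r")]
    truth = [0, 0, 0, 1, 1, 1]
    medoids, labels, d = pam(data, 2)
    print("medoids:", medoids, "labels:", labels)
    print("ARI:", adjusted_rand_index(truth, labels))
    print("ASW:", round(average_silhouette_width(d, labels), 4))
```

On this toy example the two groups are recovered exactly, so the adjusted Rand index is 1 regardless of how the cluster labels are numbered, while the average silhouette width (7/9 here) reflects how compact and well separated the clusters are under the chosen distance. This illustrates why the two kinds of criteria can rank methods differently: one rewards recovering the data-generating classes, the other rewards tight clusters in the chosen dissimilarity.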
| |
Keywords: | Adjusted Rand index; average silhouette width; latent class clustering; partitioning around medoids; tribal art |