Estimating the number of clusters in a data set via the gap statistic |
| |
Authors: | Robert Tibshirani, Guenther Walther & Trevor Hastie |
| |
Institution: | Stanford University, USA |
| |
Abstract: | We propose a method (the 'gap statistic') for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature. |
| |
Keywords: | Clustering; Groups; Hierarchy; Uniform distribution |
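The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: a plain K-means routine stands in for "any clustering algorithm", the reference null distribution is taken to be uniform over the bounding box of the data (consistent with the "Uniform distribution" keyword), and the number of clusters is chosen as the smallest k whose gap is within one simulation standard error of the next gap. All function names (`kmeans`, `within_dispersion`, `log_w`, `gap_statistic`, `estimate_k`) and parameter defaults are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, rng, n_iter=25):
    """Plain Lloyd's algorithm; returns a cluster label for every point."""
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def within_dispersion(points, labels, k):
    """Pooled within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        mean = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(math.dist(p, mean) ** 2 for p in members)
    return total


def log_w(points, k, rng, restarts=5):
    """Log of the best (smallest) dispersion over a few random restarts."""
    best = min(within_dispersion(points, kmeans(points, k, rng), k)
               for _ in range(restarts))
    return math.log(max(best, 1e-12))  # guard against log(0)


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (gaps, errs) for k = 1..k_max under a uniform reference null."""
    rng = random.Random(seed)
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box (an assumption).
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(log_w(ref, k, rng))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w(points, k, rng))
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    return gaps, errs


def estimate_k(gaps, errs):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to k_max."""
    for k in range(1, len(gaps)):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k
    return len(gaps)


# Usage: two synthetic, well-separated 2-D blobs (made-up data).
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(20)]
data += [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(20)]
gaps, errs = gap_statistic(data, k_max=4)
print("estimated number of clusters:", estimate_k(gaps, errs))
```

The sketch works on the log of the dispersion so that the observed and reference curves are compared on a relative scale. The number of reference draws (`n_refs`) and K-means restarts are kept small for speed, so larger values would be needed for stable estimates on real data.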