Abstract: | In a cluster analysis of a multivariate data set, it may happen that one or two observations have a disproportionately large effect on the analysis, in the sense that their removal causes a dramatic change to the results. It is important to be able to identify such influential observations, and the present paper addresses this problem. To do so, we must first quantify the effect of a single observation. Various definitions are discussed, and criteria for identifying influential observations are investigated; the minimum spanning tree and the number of neighbours of each observation are considered. The investigation concentrates on single-link cluster analysis, although complete-link analysis is also briefly discussed. Patterns emerge in both real and simulated data, which suggest ways of predicting observations with no effect and those with the greatest effect. It is not necessary to recalculate the results with each observation omitted—an economy of presentation as well as labour. |