Effects of some design factors on the distribution of similarity indices in cluster analysis
Authors: Ahmed N. Albatineh, Hafiz M. R. Khan, Bashar Zogheib, Golam B. M. Kibria
Affiliation: 1. Department of Community Medicine and Behavioral Sciences, Faculty of Medicine, Kuwait University, Safat, Kuwait; 2. Department of Public Health, Texas Tech University Health Sciences Center, Lubbock, TX, USA; 3. Department of Mathematics and Natural Sciences, American University of Kuwait, Kuwait; 4. Department of Statistics, Florida International University, Miami, FL, USA
Abstract: This article investigates the effects of the number of clusters, cluster size, and correction for chance agreement on the distribution of two similarity indices, namely the Jaccard and Rand indices. Skewness and kurtosis are calculated for the two indices and their corrected forms and then compared with those of the normal distribution. Three clustering algorithms are implemented: complete linkage, Ward's method, and K-means. Data were randomly generated from bivariate normal distributions with specified means and variance-covariance matrices. A three-way ANOVA is performed to assess the significance of the design factors, using the skewness and kurtosis of the indices as responses. Test statistics for skewness and kurtosis and the observed power are calculated. Simulation results showed that, independent of the clustering algorithm or the similarity index used, the interaction effect cluster size × number of clusters and the main effects of cluster size and number of clusters were always significant for skewness and kurtosis. The three-way interaction cluster size × correction × number of clusters was significant for skewness of the Rand and Jaccard indices using all clustering algorithms, but was not significant using Ward's method for both the Rand and Jaccard indices, while it was significant for Jaccard only using the complete linkage and K-means algorithms. The correction for chance agreement was significant for skewness and kurtosis of the Rand and Jaccard indices when the complete linkage method is used. Hence, such design factors must be taken into consideration when studying the distribution of such indices.
Keywords: Correction for Chance Agreement; Distribution; Jaccard; Rand; Similarity indices; Simulations
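
Illustrative sketch (not taken from the article): the quantities named in the abstract can be computed as follows, assuming the usual pair-counting definitions of the Rand and Jaccard indices and moment-based skewness and kurtosis. All function names are hypothetical, the expected value used for the chance correction must be supplied by the caller, and the convention of returning 1.0 for an empty Jaccard denominator is an assumption. The authors' exact estimators and simulation code may differ.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Classify every pair of objects by whether each partition groups it together.

    Returns (a, b, c, d): together in both, together in A only,
    together in B only, separated in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1
        elif same_a:
            b += 1
        elif same_b:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_a, labels_b):
    """Rand index: share of pairs on which the partitions agree (needs >= 2 objects)."""
    a, b, c, d = pair_counts(labels_a, labels_b)
    return (a + d) / (a + b + c + d)


def jaccard_index(labels_a, labels_b):
    """Jaccard index: like Rand, but pairs separated in both partitions are ignored."""
    a, b, c, _ = pair_counts(labels_a, labels_b)
    denom = a + b + c
    # Assumed convention: no pair is grouped by either partition -> perfect agreement.
    return a / denom if denom else 1.0


def correct_for_chance(index, expected):
    """General chance-corrected form (index - E) / (1 - E), for an index with maximum 1."""
    return (index - expected) / (1.0 - expected)


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness m3 / m2^1.5 (0 for a normal distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based kurtosis m4 / m2^2 (3 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]      # hypothetical true cluster labels
    found = [0, 0, 1, 1, 1, 1]      # hypothetical labels from a clustering algorithm
    print(pair_counts(truth, found))
    print(rand_index(truth, found), jaccard_index(truth, found))
```

In a simulation of the kind the abstract describes, each index would be computed once per generated data set, and `skewness` and `kurtosis` would then be applied to the resulting collection of index values for each combination of design factors.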