Using balanced iterative reducing and clustering hierarchies to compute approximate rank statistics on massive datasets
Abstract: The balanced iterative reducing and clustering hierarchies (BIRCH) algorithm handles massive datasets by reading the data file only once, clustering the data as it is read, and retaining only a few clustering features to summarize the data read so far. BIRCH thus makes it possible to analyse datasets that are too large to fit in the computer's main memory. We propose estimates of Spearman's ρ and Kendall's τ that are calculated from the BIRCH output, and we assess their performance through Monte Carlo studies. The numerical results show that the BIRCH-based estimates can achieve the same efficiency as the usual estimates of ρ and τ while using only a fraction of the memory otherwise required.
Keywords: correlation; rank statistics; massive dataset; Kendall's τ; Spearman's ρ; BIRCH
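
The single-pass idea described in the abstract can be illustrated with a minimal sketch. It is not the estimator proposed in the paper, whose exact form the abstract does not give. The assumptions here are: a flat list of clustering features instead of BIRCH's balanced tree, bivariate data, a hypothetical distance `threshold` for absorbing a point, and a Kendall-type statistic computed from cluster centroids with each pair weighted by the product of the cluster counts. The names `ClusteringFeature`, `summarize`, and `weighted_kendall_tau` are illustrative only.

```python
import math


class ClusteringFeature:
    """Summary of a group of 2-D points: count, linear sum, sum of squares."""

    def __init__(self, x, y):
        self.n = 1                    # number of points absorbed
        self.ls = [x, y]              # linear sum per coordinate
        self.ss = x * x + y * y       # sum of squared norms (kept as part of the usual triple)

    def centroid(self):
        return (self.ls[0] / self.n, self.ls[1] / self.n)

    def absorb(self, x, y):
        self.n += 1
        self.ls[0] += x
        self.ls[1] += y
        self.ss += x * x + y * y


def summarize(stream, threshold):
    """Single pass over the data (simplified: no tree, no rebalancing).

    Each point is absorbed into the feature with the nearest centroid if
    that centroid lies within `threshold`; otherwise it starts a new feature.
    """
    features = []
    for x, y in stream:
        best, best_dist = None, math.inf
        for cf in features:
            cx, cy = cf.centroid()
            dist = math.hypot(x - cx, y - cy)
            if dist < best_dist:
                best, best_dist = cf, dist
        if best is not None and best_dist <= threshold:
            best.absorb(x, y)
        else:
            features.append(ClusteringFeature(x, y))
    return features


def weighted_kendall_tau(features):
    """Kendall-type concordance of centroids, pairs weighted by n_i * n_j.

    Assumption for illustration: pairs of points inside the same cluster
    are ignored, so this is only an approximation of the usual tau.
    """
    num = 0.0
    den = 0.0
    for i in range(len(features)):
        xi, yi = features[i].centroid()
        for j in range(i + 1, len(features)):
            xj, yj = features[j].centroid()
            weight = features[i].n * features[j].n
            prod = (xi - xj) * (yi - yj)
            num += weight * ((prod > 0) - (prod < 0))  # sign of the pair
            den += weight
    return num / den if den else float("nan")
```

A call such as `weighted_kendall_tau(summarize(points, threshold=5.0))` then works from the retained summaries alone. A smaller threshold keeps more features and stays closer to the full-data statistic, at the cost of more memory.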