

Bayesian nonparametric clustering for large data sets
Authors: Zuanetti, Daiane Aparecida; Müller, Peter; Zhu, Yitan; Yang, Shengjie; Ji, Yuan
Institution: 1. Departamento de Estatística, Universidade Federal de São Carlos, São Carlos, SP, Brazil;
2. Department of Mathematics, UT Austin, Austin, TX, USA;
3. NorthShore University HealthSystem, Evanston, IL, USA;
4. NorthShore University HealthSystem, Evanston and University of Chicago, Evanston, IL, USA
Abstract:

We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion which requires a single cycle (or few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. Under simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool “Zodiac.”
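The abstract's second scheme combines local clusters using only their sufficient statistics. As a minimal sketch of why that is possible, and not the authors' implementation, the Python below assumes a one-dimensional Gaussian cluster model, which the abstract does not specify. The names `SuffStats`, `suff_stats` and `merge` are hypothetical. It shows only that the statistics of two local clusters can be pooled without revisiting the raw observations; the rule for deciding which local clusters to merge is not covered.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SuffStats:
    """Sufficient statistics of a 1-D Gaussian cluster (an assumed model)."""
    n: int      # number of observations
    s: float    # sum of observations
    ss: float   # sum of squared observations

    @property
    def mean(self) -> float:
        return self.s / self.n

    @property
    def var(self) -> float:
        # Population variance recovered from the three statistics alone.
        return self.ss / self.n - self.mean ** 2


def suff_stats(xs: Iterable[float]) -> SuffStats:
    """Summarize the raw observations of one local cluster."""
    xs = list(xs)
    return SuffStats(len(xs), sum(xs), sum(x * x for x in xs))


def merge(a: SuffStats, b: SuffStats) -> SuffStats:
    """Pool two clusters using their statistics, not the raw data."""
    return SuffStats(a.n + b.n, a.s + b.s, a.ss + b.ss)


# Two hypothetical local clusters found on different subsamples.
local_a = suff_stats([1.0, 2.0, 3.0])
local_b = suff_stats([2.0, 4.0])
pooled = merge(local_a, local_b)
```

Because the three statistics are additive, `pooled` is identical to the summary computed directly on the five pooled observations, so a global step of this kind needs to exchange only three numbers per local cluster.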

Keywords:
This article is indexed in SpringerLink and other databases.