Bayesian nonparametric clustering for large data sets

Authors:	Zuanetti, Daiane Aparecida; Müller, Peter; Zhu, Yitan; Yang, Shengjie; Ji, Yuan

Institution:	1. Departamento de Estatística, Universidade Federal de São Carlos, São Carlos, SP, Brazil; 2. Department of Mathematics, UT Austin, Austin, TX, USA; 3. NorthShore University HealthSystem, Evanston, IL, USA; 4. NorthShore University HealthSystem, Evanston and University of Chicago, Evanston, IL, USA

Abstract:	We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion which requires a single cycle (or few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. Under simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool “Zodiac.”
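
Illustration:	The second scheme relies on the fact that local clusters can be summarized by their sufficient statistics, so the global step never needs to revisit the raw observations. The sketch below is not the authors' model. It assumes a univariate Gaussian kernel, for which the count, sum, and sum of squares are sufficient, and the names `ClusterStats`, `from_data`, and `merge` are hypothetical. It only shows that pooling the summaries of two local clusters gives the same statistics as summarizing the pooled data directly.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ClusterStats:
    """Sufficient statistics of a cluster under an ASSUMED univariate
    Gaussian kernel (illustrative only, not the paper's model)."""
    n: int            # number of observations in the cluster
    total: float      # sum of the observations
    total_sq: float   # sum of the squared observations

    @classmethod
    def from_data(cls, xs: Sequence[float]) -> "ClusterStats":
        # Summarize a local cluster once; the raw data can then be discarded.
        return cls(len(xs), sum(xs), sum(x * x for x in xs))

    def merge(self, other: "ClusterStats") -> "ClusterStats":
        # Pooling two clusters only requires adding their summaries.
        return ClusterStats(
            self.n + other.n,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def variance(self) -> float:
        # Population variance recovered from the summaries alone.
        return self.total_sq / self.n - self.mean ** 2


# Two local clusters found on different subsamples (made-up numbers).
local_a = [1.0, 2.0, 3.0]
local_b = [4.0, 5.0]

merged = ClusterStats.from_data(local_a).merge(ClusterStats.from_data(local_b))
direct = ClusterStats.from_data(local_a + local_b)
```

Here `merged` and `direct` hold identical statistics, so a global step that decides whether to join local clusters can work from the summaries alone. Which local clusters to join is a modeling decision that this sketch does not address.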

Keywords:
This article is indexed in SpringerLink and other databases.
