A data-driven selection of the number of clusters in the Dirichlet allocation model via Bayesian mixture modelling
Authors:E F Saraiva  C A B Pereira  A K Suzuki
Institution:1. Mathematics Institute, Federal University of Mato Grosso do Sul, Campo Grande, Brazil;2. Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil;3. Sciences Institute of Mathematics and Computers, University of São Paulo, São Carlos, Brazil
Abstract:In this paper, we consider a Bayesian mixture model in which the mixture weights are integrated out, yielding a procedure that treats the number of clusters as an unknown quantity. To determine clusters and estimate the parameters of interest, we develop an MCMC algorithm that we call the sequential data-driven allocation sampler. In this algorithm, a single observation has a nonzero probability of creating a new cluster, and a set of observations may create a new cluster through split-merge moves. The split-merge moves are built on a sequential allocation procedure whose allocation probabilities are calculated from the Kullback–Leibler divergence between the posterior distribution given the previously allocated observations and the posterior distribution that also includes a ‘new’ observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.
Keywords:Mixture model  Bayesian approach  Gibbs sampling  Metropolis–Hastings  split-merge update  Kullback–Leibler divergence
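
The abstract describes allocation probabilities driven by the Kullback–Leibler divergence between a cluster's posterior before and after a 'new' observation is included, but it does not state the formula. The Python sketch below is only an illustration under assumptions that are not taken from the paper: a univariate normal model with known variance `sigma2`, a conjugate normal prior `N(mu0, tau2)` on the cluster mean, the direction KL(before || after), and a hypothetical `exp(-KL)` weighting. The function names `normal_posterior`, `kl_normal`, and `allocation_probs` are invented for this example.

```python
import math


def normal_posterior(obs, sigma2=1.0, mu0=0.0, tau2=100.0):
    """Posterior (mean, variance) of a normal mean with known variance sigma2
    and a N(mu0, tau2) prior. All defaults are illustrative assumptions."""
    n = len(obs)
    precision = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + sum(obs) / sigma2) / precision
    return mean, 1.0 / precision


def kl_normal(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate normals (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def allocation_probs(clusters, y):
    """Hypothetical rule (an assumption, not the paper's formula): weight each
    cluster by exp(-KL) between its posterior without and with y, then normalize."""
    weights = []
    for obs in clusters:
        before = normal_posterior(obs)
        after = normal_posterior(obs + [y])
        weights.append(math.exp(-kl_normal(*before, *after)))
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    clusters = [[0.1, -0.2, 0.0], [5.0, 5.2, 4.9]]
    print(allocation_probs(clusters, 4.8))
```

Under this weighting, an observation that barely shifts a cluster's posterior yields a small divergence and therefore a larger allocation probability for that cluster, which is one plausible reading of "data-driven" allocation.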