A data-driven selection of the number of clusters in the Dirichlet allocation model via Bayesian mixture modelling



Authors:	E. F. Saraiva, C. A. B. Pereira, A. K. Suzuki

Institution:	1. Mathematics Institute, Federal University of Mato Grosso do Sul, Campo Grande, Brazil; 2. Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil; 3. Sciences Institute of Mathematics and Computers, University of São Paulo, São Carlos, Brazil

Abstract:	In this paper, we consider a Bayesian mixture model in which the mixture weights can be integrated out, yielding a procedure that treats the number of clusters as an unknown quantity. To determine the clusters and estimate the parameters of interest, we develop an MCMC algorithm called the sequential data-driven allocation sampler. In this algorithm, a single observation has a non-null probability of creating a new cluster, and a set of observations may create a new cluster through split-merge moves. The split-merge moves are developed using a sequential allocation procedure based on allocation probabilities that are calculated according to the Kullback–Leibler divergence between the posterior distribution using the previously allocated observations and the posterior distribution including a ‘new’ observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.
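
The allocation idea described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a univariate Gaussian likelihood with known variance, a conjugate normal prior on each cluster mean, the direction KL(posterior before || posterior after), and the rule that turns divergences into probabilities (weights proportional to exp(-KL)). The function names posterior, kl_normal and allocation_probs are hypothetical.

```python
import math

def posterior(obs, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Normal posterior (mean, variance) of a cluster mean.

    Assumes a normal prior and a known noise variance (illustrative choices).
    """
    precision = 1.0 / prior_var + len(obs) / noise_var
    mean = (prior_mean / prior_var + sum(obs) / noise_var) / precision
    return mean, 1.0 / precision

def kl_normal(p, q):
    """KL(p || q) for two univariate normals given as (mean, variance)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def allocation_probs(clusters, y):
    """Hypothetical rule: a cluster whose posterior changes little when y is
    added (small KL divergence) receives a larger allocation probability."""
    weights = []
    for obs in clusters:
        before = posterior(obs)        # posterior from observations already allocated
        after = posterior(obs + [y])   # posterior including the 'new' observation
        weights.append(math.exp(-kl_normal(before, after)))
    total = sum(weights)
    return [w / total for w in weights]

# Two tentative clusters and one observation to allocate sequentially.
clusters = [[-2.1, -1.9, -2.0], [3.0, 3.2, 2.9]]
probs = allocation_probs(clusters, 2.8)
print(probs)
```

In this toy setting the observation 2.8 barely shifts the posterior of the second cluster, so that cluster receives most of the allocation probability. The paper's actual allocation probabilities and split-merge acceptance step may differ from this simplification.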

Keywords:	Mixture model; Bayesian approach; Gibbs sampling; Metropolis–Hastings; split-merge update; Kullback–Leibler divergence
