Bayesian curve fitting and clustering with Dirichlet process mixture models for microarray data |
| |
Authors: | Ju-Hyun Park, Minjung Kyung |
| |
Institution: | 1. Department of Statistics, Dongguk University, Seoul 04620, South Korea; 2. Department of Statistics, Duksung Women’s University, Seoul 01369, South Korea |
| |
Abstract: | In molecular biology, microarray data are often analyzed to cluster genes with similar expression profiles and thereby identify genes that are differentially expressed under multiple biological conditions. A notable characteristic of a gene expression profile is that it follows a cyclic curve over time. To group sequences with similar molecular functions, we propose a Bayesian Dirichlet process mixture of linear regression models based on a Fourier series, with a spike-and-slab prior assumed for each regression coefficient. A full Gibbs-sampling algorithm is developed for efficient Markov chain Monte Carlo (MCMC) posterior computation. Because of the so-called “label-switching” problem and because the number of clusters varies across MCMC iterations, the post-processing approach of Fritsch and Ickstadt (2009) is additionally applied to the MCMC samples to obtain a single optimal clustering estimate by maximizing the posterior expected adjusted Rand index, using the posterior probabilities that two observations are clustered together. The proposed method is illustrated with two simulated data sets and one real data set, from Iyer et al. (1999), on the physiological response of fibroblasts to serum. |
| |
Keywords: | primary 62G08; secondary 62P10; Temporal cyclic gene expression profiles; Dirichlet process mixture; Fourier series; Variable selection; Label-switching; Adjusted Rand index |
This article is indexed in ScienceDirect and other databases. |
|
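Two ingredients named in the abstract can be illustrated with a small sketch: a Fourier-series design matrix for a cyclic expression profile, and the pairwise co-clustering probabilities that the post-processing step relies on. This is not the authors' Gibbs sampler and it includes no spike-and-slab prior; the function names `fourier_design` and `posterior_similarity`, the period, the number of harmonics, and the toy label draws are all assumptions made for illustration.

```python
import numpy as np


def fourier_design(t, period, n_harmonics):
    """Design matrix with an intercept and cos/sin pairs for harmonics 1..n_harmonics.

    Hypothetical helper: `period` and `n_harmonics` are illustrative choices,
    not values taken from the paper.
    """
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.cos(angle))
        cols.append(np.sin(angle))
    return np.column_stack(cols)


def posterior_similarity(label_draws):
    """Fraction of MCMC draws in which each pair of observations shares a cluster.

    `label_draws` has shape (n_draws, n_observations). The result depends only
    on which observations are grouped together, so relabeling the clusters
    within a draw (label switching) does not change it.
    """
    draws = np.asarray(label_draws)
    same = draws[:, :, None] == draws[:, None, :]
    return same.mean(axis=0)


# Toy usage: four time points over one assumed period of length 4.
X = fourier_design([0, 1, 2, 3], period=4, n_harmonics=1)

# Toy label draws for three observations; draws 1 and 2 encode the same
# partition under different labels.
toy_draws = [[0, 0, 1],
             [1, 1, 0],
             [0, 1, 1]]
psm = posterior_similarity(toy_draws)
```

The similarity matrix is the natural input to a criterion such as the posterior expected adjusted Rand index, because it is well defined even when cluster labels and the number of clusters change from one iteration to the next.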