Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data

Authors:	Antoine Godichon-Baggioni, Cathy Maugis-Rabusseau, Andrea Rau

Institution:	1. Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, UPS, F-31062 Toulouse, Cedex 9, France; 2. Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, INSA, F-31077 Toulouse, France; 3. GABI, INRA, AgroParisTech, Université Paris-Saclay, Paris, France

Abstract:	Although there is no shortage of clustering algorithms proposed in the literature, the question of the most relevant strategy for clustering compositional data (i.e. data whose rows belong to the simplex) remains largely unexplored in cases where the observed value is equal to or close to zero for one or more samples. This work is motivated by the analysis of two applications, both focused on the categorization of compositional profiles: (1) identifying groups of co-expressed genes from high-throughput RNA sequencing data, in which a given gene may be completely silent in one or more experimental conditions; and (2) finding patterns in the usage of stations over the course of one week in the Velib' bicycle sharing system in Paris, France. For both of these applications, we make use of appropriately chosen data transformations, including the Centered Log Ratio and a novel extension called the Log Centered Log Ratio, in conjunction with the K-means algorithm. We use a non-asymptotic penalized criterion, whose penalty is calibrated with the slope heuristics, to select the number of clusters. Finally, we illustrate the performance of this clustering strategy, which is implemented in the Bioconductor package coseq, on both the gene expression and bicycle sharing system data.
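The transform-then-cluster idea in the abstract can be sketched as follows. This is a minimal illustration, not the coseq implementation: it applies the standard Centered Log Ratio, clr(p)_j = ln(p_j) - mean_k ln(p_k), and then runs plain Lloyd-style K-means on the transformed profiles. The pseudo-count used to avoid ln(0), the toy profiles, the fixed initial centers and the function names `clr` and `kmeans` are all assumptions made for this example. The Log Centered Log Ratio and the slope-heuristics selection of the number of clusters are not shown.

```python
import math

def clr(profile, pseudo=1e-6):
    """Centered log ratio of one compositional profile (parts sum to 1)."""
    # Assumption: zeros are handled by a small pseudo-count plus renormalization;
    # the paper's own treatment of zeros may differ.
    shifted = [p + pseudo for p in profile]
    total = sum(shifted)
    logs = [math.log(p / total) for p in shifted]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [v - mean_log for v in logs]

def kmeans(points, init_centers, n_iter=20):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [list(c) for c in init_centers]  # copy so the caller's data is untouched
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center (Euclidean distance).
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            for x in points
        ]
        # Update step: each center becomes the mean of its assigned points.
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical toy profiles over three conditions (each row sums to 1).
profiles = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70],
    [0.05, 0.25, 0.70],
]
transformed = [clr(p) for p in profiles]
labels, centers = kmeans(transformed, init_centers=[transformed[0], transformed[2]])
print(labels)
```

Each CLR-transformed profile sums to zero, so the clustering operates on relative rather than absolute abundances. In practice the initial centers would be chosen at random over several restarts rather than fixed as here.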

Keywords:	Clustering; compositional data; data transformations; K-means
