Statistical correction for functional metagenomic profiling of a microbial community with short NGS reads
Authors: Ruofei Du
Institution: Biostatistics Shared Resource, University of New Mexico Comprehensive Cancer Center, Albuquerque, USA
Abstract: In a sequence homology search, the list of functions found, together with the counts of reads aligned to each of them, constitutes the functional profile of a metagenomic sample. However, the short read lengths associated with many next-generation sequencing technologies pose significant obstacles to this approach: artificial families, cross-annotations, length bias and conservation bias. Widely applied cut-off methods, such as thresholds on the BLAST E-value, cannot solve these problems. Following published procedures that successfully address the artificial families and the cross-annotation issue, we propose in this paper to use zero-truncated Poisson and Binomial (ZTP-Bin) hierarchical modelling to correct the length bias and the conservation bias. Goodness of fit of the model and cross-validation of its predictions on a bioinformatically simulated sample support the validity of this approach. Evaluated on an in vitro-simulated data set, the proposed modelling method outperforms traditional methods. All three steps were then applied sequentially to real metagenomic samples to show that the proposed framework leads to a more accurate functional profile of a short-read metagenomic sample.
Keywords: Metagenomics; functional profiling; short reads; length bias; conservation bias
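
The abstract names a zero-truncated Poisson (ZTP) distribution as one building block of the ZTP-Bin model but does not spell it out. As a rough illustration only, the sketch below computes the standard ZTP probability mass function and mean, that is, a Poisson count conditioned on being at least 1. The function names `ztp_pmf` and `ztp_mean` and the rate value used are assumptions made for this example; the sketch does not reproduce the paper's hierarchical model or its Binomial layer.

```python
import math


def ztp_pmf(k: int, lam: float) -> float:
    """P(K = k | K >= 1) for K ~ Poisson(lam); illustrative helper, not from the paper."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 1:
        return 0.0  # zero counts are excluded by the truncation
    # Ordinary Poisson log-probability, computed in log space for stability.
    log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
    # Renormalise by P(K >= 1) = 1 - exp(-lam); expm1 is accurate for small lam.
    return math.exp(log_poisson) / (-math.expm1(-lam))


def ztp_mean(lam: float) -> float:
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam / (-math.expm1(-lam))


if __name__ == "__main__":
    lam = 2.5  # hypothetical rate, chosen only for the demonstration
    total = sum(ztp_pmf(k, lam) for k in range(1, 200))
    print(f"pmf sums to {total:.6f}; mean is {ztp_mean(lam):.4f}")
```

Because observed functions in a profile necessarily have at least one aligned read, a count distribution with zero removed is a natural choice for such data; the truncated mean is always larger than the untruncated rate `lam`.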