首页 | 本学科首页   官方微博 | 高级检索  
     


Exemplar or matching: modeling DCJ problems with unequal content genome data
Authors:Zhaoming Yin  author-information"  >,Jijun Tang,Stephen W. Schaeffer,David A. Bader
Affiliation:1.School of Computer Science and Technology,Tianjin University,Tianjin,China;2.School of Computational Science and Engineering,Georgia Institute of Technology,Atlanta,USA;3.Department of Computer Science and Engineering,University of South Carolina,Columbia,USA;4.Department of Biology,The Pennsylvania State University,State College,USA;5.Oracle Corporation,Redwood City,USA
Abstract:The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels (i.e., insertions and deletions) are considered. In this paper, we compare two mainstream methods to deal with duplications and associate them with Indels: one by deletion, namely DCJ-Indel-Exemplar distance; versus the other by gene matching, namely DCJ-Indel-Matching distance. We design branch-and-bound algorithms with set of optimization methods to compute exact distances for both. Furthermore, median problems are discussed in alignment with both of these distance methods, which are to find a median genome that minimizes distances between itself and three given genomes. Lin–Kernighan heuristic is leveraged and powered up by sub-graph decomposition and search space reduction technologies to handle median computation. A wide range of experiments are conducted on synthetic data sets and real data sets to exhibit pros and cons of these two distance metrics per se, as well as putting them in the median computation scenario.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号