A new recombination lower bound and the minimum perfect phylogenetic forest problem |
| |
Authors: | Yufeng Wu Dan Gusfield |
| |
Institution: | (1) Computer Science and Engineering Department, University of Connecticut, Storrs, CT 06269, USA;(2) Department of Computer Science, University of California, Davis, CA 95616, USA |
| |
Abstract: | Understanding recombination is a central problem in population genetics. In this paper, we address an established computational
problem in this area: compute lower bounds on the minimum number of historical recombinations for generating a set of sequences
(Hudson and Kaplan in Genetics 111, 147–164, 1985; Myers and Griffiths in Genetics 163, 375–394, 2003; Gusfield et al. in Discrete Appl. Math. 155, 806–830, 2007; Bafna and Bansal in IEEE/ACM Trans. Comput. Biol. Bioinf. 1, 78–90, 2004 and in J. Comput. Biol. 13, 501–521, 2006; Song et al. in Bioinformatics 421, i413–i244, 2005). In particular, we propose a new recombination lower bound: the forest bound. We show that the forest bound can be formulated
as the minimum perfect phylogenetic forest problem, a natural extension to the classic binary perfect phylogeny problem, which
may be of interests on its own. We then show that the forest bound is provably higher than the optimal haplotype bound (Myers
and Griffiths in Genetics 163, 375–394, 2003), a very good lower bound in practice (Song et al. in Bioinformatics 421, i413–i422, 2005). We prove that, like several other lower bounds (Bafna and Bansal in J. Comput. Biol. 13, 501–521, 2006), computing the forest bound is NP-hard. Finally, we describe an integer linear programming (ILP) formulation that computes
the forest bound precisely for certain range of data. Simulation results show that the forest bound may be useful in computing
lower bounds for low quality data.
A preliminary version of this paper appeared in the Proceedings of COCOON 2007, LNCS, vol. 4598, pp. 16–26.
The work was performed while Y. Wu was with UC Davis and supported by grants CCF-0515278 and IIS-0513910 from National Science
Foundation.
D. Gusfield supported by grants CCF-0515278 and IIS-0513910 from National Science Foundation. |
| |
Keywords: | Recombination Lower bound on the minimum number of recombination Ancestral recombination graph Population genetics Computational complexity |
本文献已被 SpringerLink 等数据库收录! |
|