Distribution of the length of the longest common subsequence of two multi-state biological sequences |
Authors: | James C. Fu, W.Y. Wendy Lou |
Affiliation: | 1. Department of Statistics, University of Manitoba, Winnipeg, MB, Canada R3T 2N2; 2. Department of Public Health Sciences, University of Toronto, Toronto, ON, Canada M5T 3M7 |
Abstract: | The length of the longest common subsequence (LCS) between two biological sequences has been used as a measure of similarity, and this statistic plays an important role in genomic studies. Even for the simple case of two sequences of equal length composed of binary elements with equal state probabilities, the exact distribution of the length of the LCS remains an open question. The problem is also known to be NP-hard in computer science. Instead of relying on combinatorial analysis, we use the finite Markov chain imbedding technique to derive the exact distribution of the length of the LCS between two multi-state sequences of different lengths. Numerical results are provided to illustrate the theoretical results. |
Keywords: | Primary 60E05; secondary 60J10 |
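The Markov chain idea in the abstract can be illustrated on small cases. The sketch below is not the authors' construction; it is a minimal illustration under stated assumptions: both sequences consist of i.i.d. letters drawn from the same state probabilities `probs`, and all function names are hypothetical. For a fixed second sequence `y`, the dynamic-programming row after reading i letters of `x` depends only on the previous row and the next letter, so the rows form a Markov chain on a finite state space. Propagating a distribution over these rows gives the exact conditional distribution of the LCS length; averaging over every possible `y` gives the unconditional one. Enumerating `y` is exponential in its length, so this is only practical for short sequences. A brute-force enumerator is included as a cross-check.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def next_row(row, a, y):
    """One LCS dynamic-programming step: new row from old row and next letter a of x."""
    cur = [0]
    for j, b in enumerate(y, start=1):
        # Standard recurrence: diagonal + 1 on a match, else max of up and left.
        cur.append(row[j - 1] + 1 if a == b else max(row[j], cur[j - 1]))
    return tuple(cur)


def lcs_length(x, y):
    """LCS length of x and y by folding next_row over x (O(len(x) * len(y)))."""
    row = tuple([0] * (len(y) + 1))
    for a in x:
        row = next_row(row, a, y)
    return row[-1]


def lcs_distribution_given_y(n, y, probs):
    """Exact distribution of LCS(X, y) for fixed y, X being n i.i.d. letters.

    The state is the DP row; it evolves as a finite-state Markov chain
    (assumption: letters of X are independent with probabilities `probs`).
    """
    dist = {tuple([0] * (len(y) + 1)): Fraction(1)}
    for _ in range(n):
        new = defaultdict(Fraction)
        for row, p in dist.items():
            for a, pa in probs.items():
                new[next_row(row, a, y)] += p * pa
        dist = new
    out = defaultdict(Fraction)
    for row, p in dist.items():
        out[row[-1]] += p  # last entry of the row is the LCS length
    return dict(out)


def lcs_distribution(n, m, probs):
    """Average the conditional distribution over all y of length m (Y i.i.d. too)."""
    total = defaultdict(Fraction)
    for y in product(list(probs), repeat=m):
        py = Fraction(1)
        for b in y:
            py *= probs[b]
        for k, p in lcs_distribution_given_y(n, y, probs).items():
            total[k] += py * p
    return dict(total)


def lcs_distribution_brute(n, m, probs):
    """Cross-check: enumerate every pair (x, y) and run the DP on each."""
    total = defaultdict(Fraction)
    letters = list(probs)
    for x in product(letters, repeat=n):
        for y in product(letters, repeat=m):
            p = Fraction(1)
            for c in x + y:
                p *= probs[c]
            total[lcs_length(x, y)] += p
    return dict(total)
```

For two binary sequences of length 2 with equal state probabilities, `lcs_distribution(2, 2, {"0": Fraction(1, 2), "1": Fraction(1, 2)})` gives P(L = 0) = 1/8, P(L = 1) = 5/8 and P(L = 2) = 1/4. Exact `Fraction` arithmetic is used so that results can be compared for equality with the brute-force enumeration rather than up to floating-point error.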
This article is indexed in ScienceDirect and other databases.