An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence |
| |
Authors: | Hiroki Arimura Takeaki Uno |
| |
Affiliation: | (1) Hokkaido University, Kita 14-jo, Nishi 9-chome, Sapporo 060-0814, Japan;(2) National Institute of Informatics, Tokyo, 101-8430, Japan |
| |
Abstract: | In this paper, we consider the problem of enumerating all maximal motifs in an input string for the class of repeated motifs with wild cards. A maximal motif is such a representative motif that is not properly contained in any larger motifs with the same location lists. Although the enumeration problem for maximal motifs with wild cards has been studied in Parida et al. (2001), Pisanti et al. (2003) and Pelfrêne et al. (2003), its output-polynomial time computability has been still open. The main result of this paper is a polynomial space polynomial delay algorithm for the maximal motif enumeration problem for the repeated motifs with wild cards. This algorithm enumerates all maximal motifs in an input string of length n in O(n 3) time per motif with O(n) space, in particular O(n 3) delay. The key of the algorithm is depth-first search on a tree-shaped search route over all maximal motifs based on a technique called prefix-preserving closure extension. We also show an exponential lower bound and a succinctness result on the number of maximal motifs, which indicate the limit of a straightforward approach. The results of the computational experiments show that our algorithm can be applicable to huge string data such as genome data in practice, and does not take large additional computational cost compared to usual frequent motif mining algorithms. This work is done during the Hiroki Arimura’s visit in LIRIS, University Claude-Bernard Lyon 1, France. |
| |
Keywords: | Motif Maximal motif Data mining Sequence mining Algorithm Delay Enumeration Polynomial time Closed itemset Closed pattern Pattern discovery |
本文献已被 SpringerLink 等数据库收录! |
|