首页 | 本学科首页   官方微博 | 高级检索  
     


An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence
Authors:Hiroki Arimura  Takeaki Uno
Affiliation:(1) Hokkaido University, Kita 14-jo, Nishi 9-chome, Sapporo 060-0814, Japan;(2) National Institute of Informatics, Tokyo, 101-8430, Japan
Abstract:In this paper, we consider the problem of enumerating all maximal motifs in an input string for the class of repeated motifs with wild cards. A maximal motif is such a representative motif that is not properly contained in any larger motifs with the same location lists. Although the enumeration problem for maximal motifs with wild cards has been studied in Parida et al. (2001), Pisanti et al. (2003) and Pelfrêne et al. (2003), its output-polynomial time computability has been still open. The main result of this paper is a polynomial space polynomial delay algorithm for the maximal motif enumeration problem for the repeated motifs with wild cards. This algorithm enumerates all maximal motifs in an input string of length n in O(n 3) time per motif with O(n) space, in particular O(n 3) delay. The key of the algorithm is depth-first search on a tree-shaped search route over all maximal motifs based on a technique called prefix-preserving closure extension. We also show an exponential lower bound and a succinctness result on the number of maximal motifs, which indicate the limit of a straightforward approach. The results of the computational experiments show that our algorithm can be applicable to huge string data such as genome data in practice, and does not take large additional computational cost compared to usual frequent motif mining algorithms. This work is done during the Hiroki Arimura’s visit in LIRIS, University Claude-Bernard Lyon 1, France.
Keywords:Motif  Maximal motif  Data mining  Sequence mining  Algorithm  Delay  Enumeration  Polynomial time  Closed itemset  Closed pattern  Pattern discovery
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号