Sequential Pattern Analysis: A Statistical Investigation of Sequence Length and Support
Authors: Christian H. Weiß; Miia Peltola
Institution: 1. Department of Mathematics, Darmstadt University of Technology, Darmstadt, Germany; 2. Faculty of Arts I, Institute of German Philology, University of Würzburg, Würzburg, Germany
Abstract: In sequential pattern analysis, the frequency of patterns is evaluated by the support. Although the support can be computed efficiently from large databases, we show that it cannot be compared between different databases, since it is influenced by the actual sequence length distribution. Models for this sequence length distribution are surveyed. One of these models, the Good distribution, appears to be sufficiently flexible for practice. It is used to exemplify an approach for adjusting the relative support such that the resulting adjusted support values are more comparable between different databases. We illustrate our findings with texts from the bilingual FinDe corpus.
Keywords: Good distribution; Sequence length distribution; Sequential pattern analysis; Support; Text corpus
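
The dependence of the support on sequence length can be illustrated with a minimal sketch. It assumes the common definition of relative support as the fraction of sequences in a database that contain a pattern as an ordered subsequence (gaps allowed); the paper's exact definition may differ. The function names and the toy databases are hypothetical and are not taken from the paper or from the FinDe corpus.

```python
def contains_subsequence(sequence, pattern):
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so each match must come after the previous one.
    return all(item in it for item in pattern)


def relative_support(database, pattern):
    """Fraction of sequences in `database` that contain `pattern` (assumed definition)."""
    if not database:
        return 0.0
    hits = sum(contains_subsequence(seq, pattern) for seq in database)
    return hits / len(database)


# Hypothetical toy data: the long database is the short one with extra items appended.
short_db = [["a", "c"], ["b", "a"], ["a", "b"], ["c", "b"]]
long_db = [seq + ["a", "b"] for seq in short_db]

pattern = ["a", "b"]
support_short = relative_support(short_db, pattern)
support_long = relative_support(long_db, pattern)
print(support_short, support_long)
```

In this toy setting the same pattern receives a higher relative support in the database with longer sequences, simply because longer sequences offer more opportunities for the pattern to occur. This is the kind of effect that motivates adjusting the support for the sequence length distribution.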