Sequential Pattern Analysis: A Statistical Investigation of Sequence Length and Support |
| |
Authors: | Christian H Weiß, Miia Peltola |
| |
Institution: | 1. Department of Mathematics, Darmstadt University of Technology, Darmstadt, Germany; 2. Faculty of Arts I, Institute of German Philology, University of Würzburg, Würzburg, Germany |
| |
Abstract: | In sequential pattern analysis, the frequency of patterns is evaluated by the support. Although the support can be computed efficiently from large databases, we show that it cannot be compared between different databases, since it is influenced by the actual sequence length distribution. Models for this sequence length distribution are surveyed. One of these models, the Good distribution, appears to be sufficiently flexible for practice. It is used to exemplify an approach for adjusting the relative support such that the resulting adjusted support values are more comparable between different databases. We illustrate our findings with texts from the bilingual FinDe corpus. |
| |
Keywords: | Good distribution; Sequence length distribution; Sequential pattern analysis; Support; Text corpus |