The Interestingness Paradox in Pattern Discovery |
| |
Authors: | Balaji Padmanabhan |
| |
Institution: |
a The Wharton School, University of Pennsylvania, USA. |
| |
Abstract: | Noting that several rule discovery algorithms in data mining can produce a large number of irrelevant or obvious rules from data, there has been substantial research in data mining that addressed the issue of what makes rules truly 'interesting'. This resulted in the development of a number of interestingness measures and algorithms that find all interesting rules from data. However, these approaches have the drawback that many of the discovered rules, while supposed to be interesting by definition, may actually (1) be obvious in that they logically follow from other discovered rules or (2) be expected given some of the other discovered rules and some simple distributional assumptions. In this paper we argue that this is a paradox since rules that are supposed to be interesting, in reality are uninteresting for the above reason. We show that this paradox exists for various popular interestingness measures and present an abstract characterization of an approach to alleviate the paradox. We finally discuss existing work in data mining that addresses this issue and show how these approaches can be viewed with respect to the characterization presented here. |
| |
Keywords: | Interestingness measures rule discovery minimality |
本文献已被 InformaWorld 等数据库收录! |
|