Performance of localized regression tree splitting criteria on data with discontinuities |
| |
Authors: | Alexandra P. Bremner, Ross H. Taplin |
| |
Affiliation: | Mathematics and Statistics, Murdoch University |
| |
Abstract: | Properties of the localized regression tree splitting criterion, described in Bremner & Taplin (2002) and referred to as the BT method, are explored in this paper and compared to those of Clark & Pregibon's (1992) criterion (the CP method). These properties indicate why the BT method can result in superior trees. This paper shows that the BT method exhibits a weak bias towards edge splits, and the CP method exhibits a strong bias towards central splits in the presence of main effects. A third criterion, called the SM method, that exhibits no bias towards a particular split position is introduced. The SM method is a modification of the BT method that uses more symmetric local means. The BT and SM methods are more likely to split at a discontinuity than the CP method because of their relatively low bias towards particular split positions. The paper shows that the BT and SM methods can be used to discover discontinuities in the data, and that they offer a way of producing a variety of different trees for examination or for tree averaging methods. |
| |
Keywords: | exploratory data analysis; tree averaging |