Empirical Comparison of Nonparametric Regression Estimates on Real Data |
| |
Authors: | Daniel Jones, Michael Kohler, Alexander Richter |
| |
Affiliation: | 1. Fachbereich Mathematik, Technische Universität Darmstadt, Darmstadt, Germany; 2. Hessisches Statistisches Landesamt, Wiesbaden, Germany |
| |
Abstract: | The performance of nine different nonparametric regression estimates is empirically compared on ten different real datasets. The number of data points in the real datasets varies between 7,900 and 18,000, where each real dataset contains between 5 and 20 variables. The nonparametric regression estimates include kernel, partitioning, nearest neighbor, additive spline, neural network, penalized smoothing splines, local linear kernel, regression trees, and random forests estimates. The main result is a table containing the empirical L2 risks of all nine nonparametric regression estimates on the evaluation part of the different datasets. The neural networks and random forests are the two estimates performing best. The datasets are publicly available, so that any new regression estimate can be easily compared with all nine estimates considered in this article by just applying it to the publicly available data and by computing its empirical L2 risks on the evaluation part of the datasets. |
| |
Keywords: | L2 error; Nonparametric regression; Real data performance |
|
|
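
The comparison procedure described in the abstract (fit an estimate, then compute its empirical L2 risk on the evaluation part of a dataset) can be sketched as follows. This is a minimal illustration, assuming the empirical L2 risk is the average squared prediction error over the evaluation pairs. The function names, the 1-nearest-neighbor estimate, and the toy numbers are hypothetical and are not taken from the article or its datasets.

```python
from typing import Callable, Sequence

Point = Sequence[float]


def empirical_l2_risk(
    estimate: Callable[[Point], float],
    xs: Sequence[Point],
    ys: Sequence[float],
) -> float:
    """Average squared prediction error of `estimate` over the evaluation pairs."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum((estimate(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def nearest_neighbor_estimate(
    train_x: Sequence[Point], train_y: Sequence[float]
) -> Callable[[Point], float]:
    """Hypothetical 1-nearest-neighbor estimate built from the training part."""

    def estimate(x: Point) -> float:
        # Index of the training point closest to x in squared Euclidean distance.
        best = min(
            range(len(train_x)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
        )
        return train_y[best]

    return estimate


# Illustrative toy numbers only, not taken from the article's datasets.
train_x = [[0.0], [1.0]]
train_y = [0.0, 1.0]
eval_x = [[0.2], [0.9]]
eval_y = [0.0, 2.0]

risk = empirical_l2_risk(nearest_neighbor_estimate(train_x, train_y), eval_x, eval_y)
print(risk)
```

Any other estimate can be compared in the same way by passing a different prediction function to `empirical_l2_risk` while keeping the evaluation pairs fixed.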