Prediction-based regularization using data augmented regression |
| |
Authors: | Giles Hooker, Saharon Rosset |
| |
Institution: | 1. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; 2. School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel
|
| |
Abstract: | The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be
in an area of model space that is deemed reasonable, thus facilitating good predictive performance. This is typically achieved
by penalizing a parametric or non-parametric representation of the model. In this paper we advocate instead the use of prior
knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a
more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions
of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of
dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization
can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo pseudo-data. We investigate
the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression
(DAR) in parametric and non-parametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior
of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while
maintaining, and often improving, predictive accuracy. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|
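
The abstract states that prediction-based regularization can often be implemented by augmenting the dataset with Monte Carlo pseudo-data. The following is a minimal sketch of that idea for ordinary linear regression, not the authors' implementation. Several choices here are assumptions made for illustration only: the function name `fit_dar_linear`, sampling pseudo-points uniformly over the bounding box of the observed features, setting every pseudo-response to a constant prior prediction (the mean of `y` by default), and spreading a total weight `total_pseudo_weight` evenly across the pseudo-points.

```python
import numpy as np


def fit_dar_linear(X, y, total_pseudo_weight=1.0, n_pseudo=200,
                   prior_value=None, seed=0):
    """Weighted least squares on real data plus Monte Carlo pseudo-data.

    Hypothetical helper for illustration; all parameter names are assumptions.
    Returns (intercept, coefficients).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape

    # Assumed prior expectation about predictions: a constant value.
    if prior_value is None:
        prior_value = y.mean()

    # Assumed sampling region: the bounding box of the observed features.
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(n_pseudo, d))

    # Stack real observations and pseudo-observations.
    X_aug = np.vstack([X, U])
    y_aug = np.concatenate([y, np.full(n_pseudo, prior_value)])

    # Real points get weight 1; pseudo-points share total_pseudo_weight.
    w = np.concatenate([np.ones(n),
                        np.full(n_pseudo, total_pseudo_weight / n_pseudo)])

    # Weighted least squares via row scaling by sqrt(weight).
    A = np.column_stack([np.ones(n + n_pseudo), X_aug])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y_aug * sw, rcond=None)
    return beta[0], beta[1:]
```

In this sketch, setting `total_pseudo_weight=0` recovers the ordinary least squares fit, while larger values pull predictions over the sampled region toward `prior_value`. The sampling distribution of the pseudo-points determines where in feature space that pull is applied.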