

Computing AIC for black-box models using generalized degrees of freedom: A comparison with cross-validation
Authors:Severin Hauenstein  Simon N Wood  Carsten F Dormann
Institution:1. Department of Biometry and Environmental System Analysis, University of Freiburg, Freiburg, Germany;2. School of Mathematics, University of Bristol, Bristol, United Kingdom
Abstract:Generalized degrees of freedom (GDF), as defined by Ye (1998, JASA 93:120–131), represent the sensitivity of model fits to perturbations of the data. Such GDF can be computed for any statistical model, making it possible, in principle, to derive the effective number of parameters in machine-learning approaches and thus compute information-theoretical measures of fit. We compare GDF with cross-validation and find that the latter provides a less computer-intensive and more robust alternative. For Bernoulli-distributed data, GDF estimates were unstable and inconsistently sensitive to the number of data points perturbed simultaneously. Cross-validation, in contrast, also performs well for binary data and across very different machine-learning approaches.
Keywords:Boosted regression trees  Data perturbation  Model complexity  Random forest
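
The perturbation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (fit_ols, estimate_gdf, gaussian_aic), the Gaussian perturbation scale, and the number of perturbations are assumptions chosen for illustration. The response is perturbed with small Gaussian noise, the model is refitted each time, and GDF is estimated as the sum over observations of the slope of each fitted value on its own perturbation. Ordinary least squares stands in for the black-box model because its GDF is known to equal the number of coefficients, which gives a check on the estimate. The AIC helper assumes a Gaussian likelihood with the variance estimated by maximum likelihood.

```python
import numpy as np


def fit_ols(X, y):
    """Stand-in 'black box': return fitted values of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def estimate_gdf(fit_predict, X, y, n_perturb=500, scale=0.1, seed=0):
    """Estimate generalized degrees of freedom by data perturbation.

    fit_predict(X, y) must refit the model and return fitted values.
    n_perturb and scale are illustrative choices, not recommended values.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # One row of perturbations per refit; all observations perturbed at once.
    deltas = rng.normal(0.0, scale, size=(n_perturb, n))
    fitted = np.array([fit_predict(X, y + d) for d in deltas])
    # Per-observation regression slope of fitted value on its own perturbation.
    d_centered = deltas - deltas.mean(axis=0)
    f_centered = fitted - fitted.mean(axis=0)
    slopes = (d_centered * f_centered).sum(axis=0) / (d_centered ** 2).sum(axis=0)
    return float(slopes.sum())


def gaussian_aic(y, fitted, n_params):
    """AIC under a Gaussian likelihood with ML variance (an assumption)."""
    n = len(y)
    rss = float(np.sum((y - fitted) ** 2))
    neg2_loglik = n * np.log(2.0 * np.pi * rss / n) + n
    return neg2_loglik + 2.0 * n_params


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=50)
    gdf = estimate_gdf(fit_ols, X, y)
    print("estimated GDF:", gdf)  # should be close to 3 for this linear model
    print("AIC using GDF:", gaussian_aic(y, fit_ols(X, y), gdf))
```

Any model that exposes a refit-and-predict function could be passed in place of fit_ols. The cost is one refit per perturbation, which is the computational burden the abstract contrasts with cross-validation.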