Subsemble: an ensemble method for combining subset-specific algorithm fits

Authors:	Stephanie Sapp, Mark J. van der Laan, John Canny

Institution:	1. Department of Statistics, University of California at Berkeley, Berkeley, CA, USA; 2. Division of Biostatistics, University of California at Berkeley, Berkeley, CA, USA; 3. Division of Computer Science, University of California at Berkeley, Berkeley, CA, USA

Abstract:	Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive data sets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large data sets. Subsemble partitions the full data set into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be a beneficial tool for small- to moderate-sized data sets, and often has better prediction performance than the underlying algorithm fit just once on the full data set. We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full data set.
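
The procedure summarized in the abstract (partition, fit per subset, combine via V-fold cross-validation) can be illustrated with a minimal sketch. The abstract does not spell out the cross-validation scheme or the combination step, so everything specific here is an assumption: each subset is split into V folds, the subset-specific fit for fold v is trained on that subset with fold v held out, and the resulting cross-validated predictions are combined by a lightly ridge-stabilized least-squares fit. The names `fit_line`, `solve`, `subsemble`, and the `ridge` parameter are hypothetical and chosen only for illustration; the underlying algorithm is a one-dimensional least-squares line.

```python
import random

def fit_line(xs, ys):
    """Underlying algorithm (an assumption): 1-D least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w

def subsemble(xs, ys, fit, J=3, V=5, ridge=1e-6, seed=0):
    """Sketch of a subset ensemble: returns (prediction function, weights)."""
    n = len(xs)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    subset, fold = [0] * n, [0] * n
    for pos, i in enumerate(order):
        subset[i] = pos % J          # which of the J subsets
        fold[i] = (pos // J) % V     # which of the V folds within that subset

    # Z[i][j]: prediction for observation i from the fit on subset j
    # trained with observation i's fold held out.
    Z = [[0.0] * J for _ in range(n)]
    for v in range(V):
        for j in range(J):
            train = [i for i in range(n) if subset[i] == j and fold[i] != v]
            f = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in range(n):
                if fold[i] == v:
                    Z[i][j] = f(xs[i])

    # Combination step (an assumption): least squares of y on the J columns
    # of Z, with a small ridge term because the columns are nearly collinear.
    A = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (ridge if a == b else 0.0)
          for b in range(J)] for a in range(J)]
    c = [sum(Z[i][a] * ys[i] for i in range(n)) for a in range(J)]
    w = solve(A, c)

    # Final subset-specific fits use every observation in each subset.
    fits = []
    for j in range(J):
        members = [i for i in range(n) if subset[i] == j]
        fits.append(fit([xs[i] for i in members], [ys[i] for i in members]))
    return (lambda x: sum(wj * f(x) for wj, f in zip(w, fits))), w

# Usage on synthetic data (true relationship y = 2x + 1 plus noise).
rng = random.Random(1)
demo_x = [i / 60 for i in range(120)]
demo_y = [2 * x + 1 + rng.gauss(0, 0.1) for x in demo_x]
predict, weights = subsemble(demo_x, demo_y, fit_line)
print(predict(0.5), weights)
```

Fitting the combination weights on held-out predictions, rather than on in-sample fits, is what keeps the combination step from simply rewarding whichever subset fit overfits most.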

Keywords:	ensemble methods; prediction; cross-validation; machine learning; big data
