Random sampling from databases: a survey期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

按检索

Random sampling from databases: a survey

Authors:	Frank Olken Doron Rotem

Institution:	(1) Information and Computing Sciences Division, Lawrence Berkeley Laboratory, 94720 Berkeley, CA, USA;(2) Management Information Systems Department, School of Business, San Jose State University, San Jose, CA, USA

Abstract:	This paper reviews recent literature on techniques for obtaining random samples from databases. We begin with a discussion of why one would want to include sampling facilities in database management systems. We then review basic sampling techniques used in constructing DBMS sampling algorithms, e.g. acceptance/rejection and reservoir sampling. A discussion of sampling from various data structures follows: B ⁺ trees, hash files, spatial data structures (including R-trees and quadtrees). Algorithms for sampling from simple relational queries, e.g. single relational operators such as selection, intersection, union, set difference, projection, and join are then described. We then describe sampling for estimation of aggregates (e.g. the size of query results). Here we discuss both clustered sampling, and sequential sampling approaches. Decision-theoretic approaches to sampling for query optimization are reviewed.

Keywords:	clustered sampling estimation materialized views query optimization query result size estimation query processing random sampling relational databases reservoir sampling sampling algorithms simple random sampling sequential sampling size of projections size of transitive closure
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏