首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Random sampling from databases: a survey
Authors:Frank Olken  Doron Rotem
Institution:(1) Information and Computing Sciences Division, Lawrence Berkeley Laboratory, 94720 Berkeley, CA, USA;(2) Management Information Systems Department, School of Business, San Jose State University, San Jose, CA, USA
Abstract:This paper reviews recent literature on techniques for obtaining random samples from databases. We begin with a discussion of why one would want to include sampling facilities in database management systems. We then review basic sampling techniques used in constructing DBMS sampling algorithms, e.g. acceptance/rejection and reservoir sampling. A discussion of sampling from various data structures follows: B + trees, hash files, spatial data structures (including R-trees and quadtrees). Algorithms for sampling from simple relational queries, e.g. single relational operators such as selection, intersection, union, set difference, projection, and join are then described. We then describe sampling for estimation of aggregates (e.g. the size of query results). Here we discuss both clustered sampling, and sequential sampling approaches. Decision-theoretic approaches to sampling for query optimization are reviewed.
Keywords:clustered sampling  estimation  materialized views  query optimization  query result size estimation  query processing  random sampling  relational databases  reservoir sampling  sampling algorithms  simple random sampling  sequential sampling  size of projections  size of transitive closure
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号