首页 | 本学科首页   官方微博 | 高级检索  
     


A Modular CDF Approach for the Approximation of Percentiles
Authors:Kingshuk Roy Choudhury  Sabin Tabirca
Affiliation:1. Statistics Department , University College , Cork, Ireland kingshuk@stat.ucc.ie;3. Computer Science Department , University College , Cork, Ireland
Abstract:This article describes a method for computing approximate statistics for large data sets, when exact computations may not be feasible. Such situations arise in applications such as climatology, data mining, and information retrieval (search engines). The key to our approach is a modular approximation to the cumulative distribution function (cdf) of the data. Approximate percentiles (as well as many other statistics) can be computed from this approximate cdf. This enables the reduction of a potentially overwhelming computational exercise into smaller, manageable modules. We illustrate the properties of this algorithm using a simulated data set. We also examine the approximation characteristics of the approximate percentiles, using a von Mises functional type approach. In particular, it is shown that the maximum error between the approximate cdf and the actual cdf of the data is never more than 1% (or any other preset level). We also show that under assumptions of underlying smoothness of the cdf, the approximation error is much lower in an expected sense. Finally, we derive bounds for the approximation error of the percentiles themselves. Simulation experiments show that these bounds can be quite tight in certain circumstances.
Keywords:Approximate cdf  Approximation error  Modular approximation  von Mises functional
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号