Biased Bootstrap Methods for Reducing the Effects of Contamination |
| |
Authors: | Peter Hall & Brett Presnell |
| |
Affiliation: | Australian National University, Canberra, and Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia; Australian National University, Canberra, Australia, and University of Florida, Gainesville, USA |
| |
Abstract: | Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback–Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε ∈ (0, ½) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression. |
| |
Keywords: | Biased bootstrap; Empirical likelihood; Influence; Inlier; Local linear smoothing; Multivariate analysis; Nonparametric curve estimation; Outlier; Regression; Robust statistical methods; Trimming; Weighted bootstrap |
|
|
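The weighting idea described in the abstract can be illustrated with a small univariate sketch. It is not the authors' implementation and rests on several assumptions: dispersion is taken to be the weighted variance, and distance from the empirical distribution is taken to be one Kullback–Leibler form, D(p) = Σ p_i log(n p_i). Under those choices, Lagrangian stationarity gives weights proportional to exp(−t (x_i − μ)²), where μ is the weighted mean, and t is tuned so that D(p) equals a chosen distance ρ. The function names, the fixed-point iteration started at the median, and the bisection on t are all hypothetical choices made for illustration. Because weighted variance is concave in the weights, the stationary point found this way is not guaranteed to be the global minimizer.

```python
import math


def kl_from_uniform(p):
    """Assumed distance measure: sum_i p_i * log(n * p_i), zero for uniform weights."""
    n = len(p)
    return sum(pi * math.log(n * pi) for pi in p if pi > 0.0)


def tilted_weights(x, t, iters=1000, tol=1e-12):
    """Weights p_i proportional to exp(-t * (x_i - mu)^2), with mu the weighted mean.

    mu is found by fixed-point iteration started at the sample median.
    """
    def weights_at(mu):
        logw = [-t * (xi - mu) ** 2 for xi in x]
        top = max(logw)  # subtract the maximum for numerical stability
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        return [wi / total for wi in w]

    mu = sorted(x)[len(x) // 2]
    for _ in range(iters):
        p = weights_at(mu)
        new_mu = sum(pi * xi for pi, xi in zip(p, x))
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return weights_at(mu), mu


def biased_bootstrap_location(x, rho, steps=200):
    """Tune t by bisection so the weights sit at distance rho from uniform."""
    n = len(x)
    if not 0.0 < rho < math.log(n):
        raise ValueError("rho must lie strictly between 0 and log(n)")
    lo, hi = 0.0, 1.0
    for _ in range(200):  # expand the bracket until the distance reaches rho
        if kl_from_uniform(tilted_weights(x, hi)[0]) >= rho:
            break
        hi *= 2.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if kl_from_uniform(tilted_weights(x, mid)[0]) < rho:
            lo = mid
        else:
            hi = mid
    return tilted_weights(x, hi)


# Illustrative data: eight values near zero plus one gross outlier.
sample = [-1.2, -0.5, -0.1, 0.0, 0.2, 0.4, 0.9, 1.1, 25.0]
weights, estimate = biased_bootstrap_location(sample, rho=0.3)
print("weights:", [round(w, 4) for w in weights])
print("location estimate:", estimate)
```

In this sketch the amount of downweighting is controlled only by ρ, a distance between weight vectors, so no separate scale estimate is needed to decide how far a point must lie from the bulk before it loses weight.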