Estimating prediction error for complex samples
Authors:Andrew Holbrook  Thomas Lumley  Daniel Gillen
Affiliation:1. Department of Biostatistics, University of California, Los Angeles, CA, U.S.A.;2. Department of Statistics, University of Auckland, Auckland, New Zealand;3. Department of Statistics, University of California, Irvine, CA, U.S.A.
Abstract:With growing interest in using non-representative samples to train prediction models for numerous outcomes, it is necessary to account for the sampling design that gives rise to the data in order to assess the generalized predictive utility of a proposed prediction rule. After learning a prediction rule based on a non-uniform sample, it is of interest to estimate the rule's error rate when applied to unobserved members of the population. Efron (1986) proposed a general class of covariance-penalty-inflated prediction error estimators that assume the available training data are representative of the target population to which the prediction rule is to be applied. We extend Efron's estimator to the complex sample context by incorporating Horvitz–Thompson sampling weights and show that it is consistent for the true generalization error rate when applied to the underlying superpopulation. The resulting Horvitz–Thompson–Efron estimator is equivalent to dAIC, a recent extension of Akaike's information criterion to survey sampling data, but is more widely applicable. The proposed methodology is assessed with simulations and is applied to models predicting renal function using data from the large-scale National Health and Nutrition Examination Survey. The Canadian Journal of Statistics 48: 204–221; 2020 © 2019 Statistical Society of Canada
Keywords:AIC  generalization error  generalized linear models  Horvitz–Thompson  NHANES III
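
The role of the Horvitz–Thompson weights mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' Horvitz–Thompson–Efron estimator: it computes only an inverse-probability-weighted apparent (training) squared error and omits the covariance penalty entirely. The function name `ht_weighted_error`, the squared-error loss, and the normalisation by the sum of the weights are assumptions made for illustration.

```python
def ht_weighted_error(y, y_hat, incl_prob):
    """Inverse-probability-weighted mean squared prediction error.

    Hypothetical helper, not taken from the paper. Each sampled unit is
    weighted by 1 / (its inclusion probability), so units that were
    unlikely to be sampled stand in for more of the population.
    The covariance penalty of Efron (1986) is NOT included here.
    """
    if not (len(y) == len(y_hat) == len(incl_prob)):
        raise ValueError("y, y_hat and incl_prob must have equal length")
    if any(p <= 0.0 or p > 1.0 for p in incl_prob):
        raise ValueError("inclusion probabilities must lie in (0, 1]")

    # Horvitz-Thompson style weights: inverse inclusion probabilities.
    weights = [1.0 / p for p in incl_prob]

    # Weighted total of squared errors, normalised by the weight total
    # (an assumed normalisation; it estimates a population mean).
    weighted_total = sum(
        w * (yi - fi) ** 2 for w, yi, fi in zip(weights, y, y_hat)
    )
    return weighted_total / sum(weights)


# Toy data: the only badly predicted unit was under-sampled.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 6.0]
incl_prob = [0.5, 0.5, 0.5, 0.25]

unweighted = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
weighted = ht_weighted_error(y, y_hat, incl_prob)
print(unweighted, weighted)
```

In this toy case the weighted error is larger than the unweighted one because the poorly predicted unit represents a larger share of the population than its share of the sample suggests.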