首页 | 本学科首页   官方微博 | 高级检索  
     


Generating synthetic data to produce public-use microdata for small geographic areas based on complex sample survey data with application to the National Health Interview Survey
Authors:Joseph W. Sakshaug  Trivellore E. Raghunathan
Affiliation:1. Department of Statistical Methods, Institute for Employment Research, Nuremberg, Germany;2. Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
Abstract:Small area statistics obtained from sample survey data provide a critical source of information used to study health, economic, and sociological trends. However, most large-scale sample surveys are not designed for the purpose of producing small area statistics. Moreover, data disseminators are prevented from releasing public-use microdata for small geographic areas for disclosure reasons; thus, limiting the utility of the data they collect. This research evaluates a synthetic data method, intended for data disseminators, for releasing public-use microdata for small geographic areas based on complex sample survey data. The method replaces all observed survey values with synthetic (or imputed) values generated from a hierarchical Bayesian model that explicitly accounts for complex sample design features, including stratification, clustering, and sampling weights. The method is applied to restricted microdata from the National Health Interview Survey and synthetic data are generated for both sampled and non-sampled small areas. The analytic validity of the resulting small area inferences is assessed by direct comparison with the actual data, a simulation study, and a cross-validation study.
Keywords:complex sample survey data  hierarchical Bayesian model  small area inference  synthetic data  multiple imputation  EM algorithm
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号