Statistical data integration using multilevel models to predict employee compensation |
| |
Authors: | Andreea L. Erciulescu, Jean D. Opsomer, Benjamin J. Schneider |
| |
Affiliation: | Westat, Rockville, MD, U.S.A. |
| |
Abstract: | This article considers the case where two surveys collect data on a common variable, with one survey being much smaller than the other. The smaller survey also collects data on an additional variable of interest, which is related to the common variable and is out of scope for the larger survey. Estimates of the two related variables are of interest for domains defined at a granular level. We propose a multilevel model that integrates data from the two surveys by reconciling the survey estimates available for the common variable, accounting for the relationship between the two variables, and expanding estimation of the additional variable to all domains of interest. The model is specified as a hierarchical Bayes model for domain-level survey data, and posterior distributions are constructed for the two variables of interest. A synthetic estimation approach is considered as an alternative to the hierarchical modelling approach. The methodology is applied to wage and benefits estimation using data from the National Compensation Survey and the Occupational Employment Statistics Survey, both available from the Bureau of Labor Statistics, U.S. Department of Labor. |
| |
Keywords: | Data integration; granularity; hierarchical Bayes; meta studies; multiple data sources; small-area estimation; sparse survey data |
|
|
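The synthetic estimation alternative mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' estimator, so the ratio-based form below is an assumption: it pools the benefits-to-wage ratio over the domains observed in the smaller survey and applies it to the larger survey's wage estimates for every domain. The function name, argument names, and domain labels are all hypothetical.

```python
from typing import Dict


def synthetic_benefit_estimates(
    wages_large: Dict[str, float],
    wages_small: Dict[str, float],
    benefits_small: Dict[str, float],
) -> Dict[str, float]:
    """Ratio-type synthetic estimates of benefits for every domain.

    Assumption (not taken from the article): a single benefits-to-wage
    ratio, pooled over the domains seen in the smaller survey, holds in
    all domains covered by the larger survey.
    """
    # Domains where the smaller survey reports both variables.
    common = [d for d in benefits_small if d in wages_small]
    if not common:
        raise ValueError("no domain has both wage and benefits estimates")

    total_wages = sum(wages_small[d] for d in common)
    if total_wages <= 0:
        raise ValueError("pooled wage total must be positive")

    # Pooled ratio of benefits to wages across the observed domains.
    ratio = sum(benefits_small[d] for d in common) / total_wages

    # Apply the pooled ratio to the larger survey's wage estimate in
    # every domain, including those the smaller survey never observed.
    return {d: ratio * w for d, w in wages_large.items()}


# Hypothetical domain-level estimates (illustrative numbers only).
wages_large = {"A": 20.0, "B": 30.0, "C": 40.0}
wages_small = {"A": 20.0, "B": 30.0}
benefits_small = {"A": 10.0, "B": 15.0}

estimates = synthetic_benefit_estimates(wages_large, wages_small, benefits_small)
```

Here domain "C" has no benefits data in the smaller survey, yet it still receives a prediction through the pooled ratio. Unlike a hierarchical Bayes model, this sketch does not reconcile the two sets of wage estimates and gives no measure of uncertainty for the predictions.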