Microdata disclosure by resampling – Empirical findings for business survey data |
| |
Authors: | Sandra Gottschalk |
| |
Institution: | (1) ZEW, Industrieökonomik und Internationale Unternehmensführung, 68161 Mannheim |
| |
Abstract: | One specific problem statistical offices and research institutes are faced with
when releasing microdata is the preservation of confidentiality. Traditional methods to
avoid disclosure often destroy the structure of the data, and information loss is potentially
high. In this paper an alternative technique of creating scientific-use files is discussed,
which reproduces the characteristics of the original data quite well. It is based on Fienberg
(1997, 1994), who estimates and resamples from the empirical multivariate cumulative
distribution function of the data in order to get synthetic data. The procedure creates
data sets – the resample – which have the same characteristics as the original survey
data. The paper includes some applications of this method with (a) simulated data and
(b) innovation survey data, the Mannheim Innovation Panel (MIP), and a comparison
between resampling and a common method of disclosure control (disturbance with multiplicative error) with regard to confidentiality on the one hand and the appropriateness
of the disturbed data for different kinds of analyses on the other. The results show that
univariate distributions can be better reproduced by unweighted resampling. Parameter
estimates can be reproduced quite well if the resampling procedure implements the correlation structure of the original data as a scale or if the data is multiplicatively perturbed
and a correction term is used. On average, anonymization of data with multiplicatively
perturbed values protects better against re-identification than the various resampling
methods used. |
| |
Keywords: | Resampling; multiplicative data perturbation; Monte Carlo studies; business survey data |
This article is indexed in SpringerLink and other databases. |
|
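
The two anonymization approaches compared in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the noise standard deviation `sd`, the log-normal toy data, and the linear interpolation between order statistics are all assumptions made for illustration. The resampling shown is univariate only, whereas the abstract describes resampling from the multivariate empirical distribution function, and the correction term for perturbed data is not specified in the abstract, so it is not shown.

```python
import random


def perturb_multiplicative(values, sd=0.1, seed=0):
    """Multiply each value by noise drawn from N(1, sd^2) (assumed noise model)."""
    rng = random.Random(seed)
    return [v * rng.gauss(1.0, sd) for v in values]


def resample_ecdf(values, n, seed=0):
    """Draw n synthetic values by inverting a linearly interpolated
    empirical CDF (univariate simplification, assumed for illustration)."""
    rng = random.Random(seed)
    xs = sorted(values)
    out = []
    for _ in range(n):
        # Fractional position between neighbouring order statistics.
        pos = rng.random() * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        frac = pos - lo
        out.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return out


# Toy "original" data: a skewed variable, loosely resembling firm-level turnover.
data_rng = random.Random(42)
original = [data_rng.lognormvariate(3.0, 1.0) for _ in range(1000)]

perturbed = perturb_multiplicative(original)        # one noisy copy per record
synthetic = resample_ecdf(original, len(original))  # new draws, no 1:1 record link
```

In this sketch the perturbed file keeps a one-to-one link to the original records, while the resampled file consists of new draws that stay within the observed range of the variable.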