Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files. When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks. 相似文献
Rank aggregation aims at combining rankings of a set of items assigned by a sample of rankers to generate a consensus ranking. A typical solution is to adopt a distance-based approach to minimize the sum of the distances to the observed rankings. However, this simple sum may not be appropriate when the quality of rankers varies. This happens when rankers with different backgrounds may have different cognitive levels of examining the items. In this paper, we develop a new distance-based model by allowing different weights for different rankers. Under this model, the weight associated with a ranker is used to measure his/her cognitive level of ranking of the items, and these weights are unobserved and exponentially distributed. Maximum likelihood method is used for model estimation. Extensions to the cases of incomplete rankings and mixture modeling are also discussed. Empirical applications demonstrate that the proposed model produces better rank aggregation than those generated by Borda and the unweighted distance-based models.
When using latent growth modeling (LGM), researchers often restrict the factor loadings, while the multilevel modeling (MLM) treats time as a metric variable. However, when individually varying times of observations are concerned in the longitudinal studies, the use of specified loadings would lead to inaccurate estimation. Based on piecewise growth modeling (PGM), this simulation study showed that (i) individually varying times of observations with larger boundaries got worse estimates and model fits when LGM was used; (ii) estimating the PGM across all the simulation situations was robust within MLM, whereas LGM got identically equal estimation with MLM only in the case of time boundaries of ±1 month or shorter; (iii) larger change of slope in piecewise modeling indicated better estimation. 相似文献