HDDA: DataSifter: statistical obfuscation of electronic health records and other sensitive datasets
Authors: Simeone Marino, Nina Zhou, Yi Zhao, Lu Wang, Qiucheng Wu
Affiliation: 1. Statistics Online Computational Resource, University of Michigan, Ann Arbor, MI, USA; 2. Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA; 3. Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
Abstract: There are no practical and effective mechanisms for sharing high-dimensional data that include sensitive information, in fields such as health, financial intelligence, or socioeconomics, without either compromising the utility of the data or exposing private personal or secure organizational information. Excessive scrambling or encoding of the information makes it less useful for modelling or analytical processing. Insufficient preprocessing may compromise sensitive information and introduce a substantial risk of re-identification of individuals by various stratification techniques. To address this problem, we developed a novel statistical obfuscation method (DataSifter) for on-the-fly de-identification of structured and unstructured sensitive high-dimensional data, such as clinical data from electronic health records (EHR). DataSifter provides complete administrative control over the balance between the risk of data re-identification and the preservation of the data information. Simulation results suggest that DataSifter can provide privacy protection while maintaining data utility for different types of outcomes of interest. Applying DataSifter to a large autism dataset provides a realistic demonstration of its promise for practical applications.
Keywords: Data sharing, personal privacy, information protection, Big Data, statistical method