A method for consideration of conditional dependencies in the Fellegi and Sunter model of record linkage
Authors: Josef Schürle
Affiliation: (1) Department of Statistics, Econometrics and Operations Research, University of Tübingen, Mohlstrasse 36, D-72074 Tübingen, Germany
Abstract:
One objective of record linkage is to link two data files by identifying the elements they have in common. A popular model for separating matches from non-matches is the probabilistic one of Fellegi and Sunter. To estimate the model's parameters, a mixture model is usually constructed and the EM algorithm applied. For simplicity, conditional independence is often assumed: when several attributes of the elements are compared, the comparison results for the different attributes are independent within each mixture class. Mixture models built on this assumption have been widely used. This article introduces a straightforward extension of the model that allows for conditional dependencies but depends heavily on the choice of starting value. An estimation procedure for the EM algorithm's starting value is therefore also proposed. The two models are compared empirically in a simulation study based on telephone book entries. In particular, the effect of different starting values and of conditional dependencies on the matching results is investigated.
Keywords: Exact Matching; Probabilistic Matching; Mixture Model; Incomplete Data; Maximum-Likelihood Estimation; EM Algorithm; Simulation Study
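To make the baseline in the abstract concrete, the following is a minimal sketch of EM estimation for a two-class mixture under the conditional independence assumption. It is not the article's extended model or its starting-value procedure. The function names, the simulated data, and all parameter values (`true_p`, `true_m`, `true_u`, the default starting values) are assumptions chosen for illustration only.

```python
import math
import random


def em_independent(patterns, p=0.1, m=None, u=None, iters=200):
    """EM for a two-class mixture of independent binary comparisons.

    patterns: list of 0/1 tuples, one per record pair (1 = attribute agrees).
    p: starting value for the proportion of matches.
    m, u: starting agreement probabilities among matches / non-matches.
    """
    k = len(patterns[0])
    n = len(patterns)
    m = list(m) if m else [0.9] * k
    u = list(u) if u else [0.1] * k
    eps = 1e-6  # keeps probabilities away from 0 and 1
    for _ in range(iters):
        # E-step: posterior probability that each pair is a match.
        post = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for j, agree in enumerate(gamma):
                pm *= m[j] if agree else 1.0 - m[j]
                pu *= u[j] if agree else 1.0 - u[j]
            post.append(pm / (pm + pu))
        # M-step: update parameters from posterior-weighted counts.
        total = sum(post)
        p = min(max(total / n, eps), 1.0 - eps)
        for j in range(k):
            am = sum(g for g, gamma in zip(post, patterns) if gamma[j])
            au = sum(1.0 - g for g, gamma in zip(post, patterns) if gamma[j])
            m[j] = min(max(am / max(total, eps), eps), 1.0 - eps)
            u[j] = min(max(au / max(n - total, eps), eps), 1.0 - eps)
    return p, m, u


def agreement_weights(m, u):
    """Log-likelihood-ratio weight for agreement on each attribute."""
    return [math.log2(mj / uj) for mj, uj in zip(m, u)]


def simulate(n, p, m, u, seed=0):
    """Draw comparison patterns that satisfy conditional independence."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        data.append(tuple(int(rng.random() < q) for q in probs))
    return data


# Illustrative values only; they are not taken from the article.
true_p = 0.2
true_m = [0.95, 0.90, 0.85, 0.90]
true_u = [0.05, 0.10, 0.20, 0.10]
pairs = simulate(5000, true_p, true_m, true_u)
p_hat, m_hat, u_hat = em_independent(pairs)
weights = agreement_weights(m_hat, u_hat)
```

Rerunning `em_independent` with different values of `p`, `m` and `u` is a simple way to explore how sensitive the estimates are to the starting point, which is the concern the abstract raises for the dependency-aware model.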
This article is indexed in SpringerLink and other databases.