A latent Dirichlet allocation model for entity resolution

Title: A latent Dirichlet allocation model for entity resolution
Publication Type: Journal Articles
Year of Publication: 2006
Authors: Bhattacharya I, Getoor L
Journal: 6th SIAM Conference on Data Mining (SDM), Bethesda, USA
Date Published: 2006
Abstract

In this paper, we address the problem of entity resolution, where given many references to underlying
objects, the task is to predict which references correspond to the same object. We propose a
probabilistic model for collective entity resolution. Our approach differs from other recently proposed
entity resolution approaches in that it is a) unsupervised, b) generative and c) introduces a hidden
‘group’ variable to capture collections of entities which are commonly observed together. The entity
resolution decisions are not considered on an independent pairwise basis, but instead decisions are made
collectively. We focus on how the use of relational links among the references can be exploited. We
show how we can use Gibbs sampling to infer the collaboration groups and the entities jointly from
the observed co-author relationships among entity references and how this improves entity resolution
performance. We demonstrate the utility of our approach on two real-world bibliographic datasets. In
addition, we present preliminary results on characterizing conditions under which collaborative
information is useful.
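
To make the generative idea in the abstract concrete, the following is a toy sketch of how a hidden group variable could give rise to observed co-author references. It is not the model from the paper: the group names, author names, the `abbreviate` noise function, and the simplification that each paper draws from a single group are all assumptions made for illustration.

```python
import random

# Hypothetical toy data: each hidden group is a set of author entities
# that tend to appear together. All names are invented for illustration.
GROUPS = {
    "group_a": ["Alice Smith", "Bob Jones", "Carol White"],
    "group_b": ["Alan Smith", "Dana Brown"],
}


def abbreviate(name, rng):
    """Emit a noisy reference: either the full name or initial plus surname."""
    first, last = name.split(" ", 1)
    return name if rng.random() < 0.5 else f"{first[0]}. {last}"


def generate_paper(n_authors, rng):
    """Generate one paper's author references from a single hidden group.

    Simplifying assumption: every author on a paper comes from the same group.
    """
    group = rng.choice(sorted(GROUPS))                 # hidden group variable
    members = GROUPS[group]
    entities = rng.sample(members, k=min(n_authors, len(members)))  # hidden entities
    references = [abbreviate(e, rng) for e in entities]             # observed references
    return group, entities, references


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        group, entities, references = generate_paper(2, rng)
        print(group, entities, references)
```

In this sketch the reference "A. Smith" is ambiguous between "Alice Smith" and "Alan Smith" when considered alone, but its co-authors hint at which group, and therefore which entity, produced it. Inference reverses this process: only the references are observed, and the groups and entities must be estimated jointly.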