A cost-effective lexical acquisition process for large-scale thesaurus translation

Title: A cost-effective lexical acquisition process for large-scale thesaurus translation
Publication Type: Journal Article
Year of Publication: 2009
Authors: Lin J, Murray GC, Dorr BJ, Hajič J, Pecina P
Journal: Language Resources and Evaluation
Volume: 43
Issue: 1
Pagination: 27-40
Date Published: 2009
Abstract

Thesauri and controlled vocabularies facilitate access to digital collections by explicitly representing the underlying principles of organization. Translation of such resources into multiple languages is an important component for providing multilingual access. However, the specificity of vocabulary terms in most thesauri precludes fully-automatic translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations to construct domain-specific lexical resources. This process is illustrated on a thesaurus of 56,000 concepts used to catalog a large archive of oral histories. We elicited human translations on a small subset of concepts, induced a probabilistic phrase dictionary from these translations, and used the resulting resource to automatically translate the rest of the thesaurus. Two separate evaluations demonstrate the acceptability of the automatic translations and the cost-effectiveness of our approach.