A cost-effective lexical acquisition process for large-scale thesaurus translation
Title | A cost-effective lexical acquisition process for large-scale thesaurus translation |
Publication Type | Journal Articles |
Year of Publication | 2009 |
Authors | Lin J, Murray GC, Dorr BJ, Hajič J, Pecina P |
Journal | Language Resources and Evaluation |
Volume | 43 |
Issue | 1 |
Pagination | 27-40 |
Date Published | 2009 |
Abstract | Thesauri and controlled vocabularies facilitate access to digital collections by explicitly representing the underlying principles of organization. Translation of such resources into multiple languages is an important component for providing multilingual access. However, the specificity of vocabulary terms in most thesauri precludes fully-automatic translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations to construct domain-specific lexical resources. This process is illustrated on a thesaurus of 56,000 concepts used to catalog a large archive of oral histories. We elicited human translations on a small subset of concepts, induced a probabilistic phrase dictionary from these translations, and used the resulting resource to automatically translate the rest of the thesaurus. Two separate evaluations demonstrate the acceptability of the automatic translations and the cost-effectiveness of our approach. |