Daumé Co-Organizes Natural Language Processing Contest

Mon May 15, 2017

Experts in computational linguistics from the University of Maryland and the University of Washington have launched an international competition that challenges participants to either build a system that solves one of two natural language processing (NLP) tasks, or find ways to poke holes in other people’s systems.

Hal Daumé, an associate professor of computer science with appointments in the University of Maryland Institute for Advanced Computer Studies and the Language Science Center, co-organized the contest with Emily M. Bender, a professor of linguistics at the University of Washington with an appointment in the Department of Computer Science and Engineering.

Build It, Break It: Language Edition pits researchers from anywhere in the world against one another in a six-week contest focused on NLP technology. It is modeled after the successful Build It, Break It, Fix It contest—a unique cybersecurity competition launched in 2014 by faculty and graduate students in the Maryland Cybersecurity Center.

The language competition matches NLP system “builders” against human “breakers” in an attempt to learn more about the generalizability of current technology, says Sudha Rao, a fourth-year computer science doctoral student in the Computational Linguistics and Information Processing (CLIP) Lab.

Builders are first charged with constructing a system that solves a given NLP task. In the second round, breakers create test cases that they think will fool the builders’ systems. During the final judging round, breaker test cases will be sent to builder teams, and both builders and breakers will be evaluated.

“Humans have remarkable language ability—in our native language or languages, we're able to learn complex linguistic generalizations from remarkably little data,” says Daumé, who is director of the CLIP Lab.

But computers that attempt the same process often wind up much more limited in their linguistic abilities, and generalize poorly to data that’s even slightly different from the data they’ve been built on, he adds.

Contest organizers believe the competition will provide participants with new insights and lead to better NLP technology.

"We hope that both builder and breaker participants will gain a new appreciation for all the ways things can go wrong with their systems, but also become better informed about the linguistic theory that can help fix these problems," Daumé says.

Several doctoral and undergraduate students from UMD—from both computer science and linguistics—are helping Daumé and Bender run the contest.

Scores and final results from the participating teams will be released on June 5.