A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation

Jan Deriu; Katsiaryna Mlynchyk; Philippe Schläpfer; Alvaro Rodrigo; Dirk von Grünigen; Nicolas Kaiser; Kurt Stockinger; Eneko Agirre; Mark Cieliebak

Question Answering Long Paper

Session 1B: Jul 6 (06:00-07:00 GMT)

Session 2A: Jul 6 (08:00-09:00 GMT)

Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. To this end, we introduce an intermediate representation based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. We randomly generate OTs from a context-free grammar, so annotators only have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is challenging and that the token alignment can be leveraged to significantly increase performance.
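
To illustrate the idea of randomly generating OTs from a context-free grammar, here is a minimal Python sketch. The grammar, the operation names (`Done`, `GetData`, `Filter`, `Count`), the table names, and the depth limit are all hypothetical choices made for this example; the abstract does not specify the actual OT grammar, so this should not be read as the authors' implementation.

```python
import random

# Hypothetical toy grammar: nonterminals map to lists of productions.
# The real OT grammar is not given in the abstract; this is illustrative only.
GRAMMAR = {
    "OT": [["Done", "(", "NODE", ")"]],
    "NODE": [
        ["GetData", "(", "TABLE", ")"],   # leaf operation: read a table
        ["Filter", "(", "NODE", ")"],     # recursive operation
        ["Count", "(", "NODE", ")"],      # recursive operation
    ],
    "TABLE": [["movie"], ["person"]],     # assumed table names
}


def sample(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into a list of terminal tokens by random derivation."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol: emit as-is
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, prefer productions that do not recurse
        # on the same nonterminal, so the derivation terminates.
        productions = [p for p in productions if symbol not in p] or productions
    production = rng.choice(productions)
    tokens = []
    for child in production:
        tokens.extend(sample(child, rng, depth + 1, max_depth))
    return tokens


def random_operation_tree(seed=None):
    """Return one randomly sampled tree as a space-separated string."""
    rng = random.Random(seed)
    return " ".join(sample("OT", rng))


if __name__ == "__main__":
    for i in range(3):
        print(random_operation_tree(seed=i))
```

In the workflow the abstract describes, an annotator would be shown a generated tree like these and asked to write the matching question and assign its tokens to the operations.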

Similar Papers

Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

Kyle Swanson, Lili Yu, Tao Lei

Query Graph Generation for Answering Multi-hop Complex Questions from Knowledge Bases

Yunshi Lan, Jing Jiang

Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing

Alane Suhr, Ming-Wei Chang, Peter Shaw, Kenton Lee

TaPas: Weakly Supervised Table Parsing via Pre-training

Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Eisenschlos