TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task

Christoph Alt; Aleksandra Gabryszak; Leonhard Hennig

TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task

Christoph Alt, Aleksandra Gabryszak, Leonhard Hennig

Abstract Paper Share

Information Extraction Long Paper

Session 2B: Jul 6 (09:00-10:00 GMT)

Session 3B: Jul 6 (13:00-14:00 GMT)

Abstract: TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE). But, even with recent advances in unsupervised pre-training and knowledge enhanced neural RE, models still show a high error rate. In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement? And how do crowd annotations, dataset, and models contribute to this error rate? To answer these questions, we first validate the most challenging 5K examples in the development and test sets using trained annotators. We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled. On the relabeled test set the average F1 score of a large baseline model set improves from 62.1 to 70.1. After validation, we analyze misclassifications on the challenging instances, categorize them into linguistically motivated error groups, and verify the resulting error hypotheses on three state-of-the-art RE models. We show that two groups of ambiguous relations are responsible for most of the remaining errors and that models may adopt shallow heuristics on the dataset when entities are not masked.

You can open the pre-recorded video in a separate window.

NOTE: The SlidesLive video may display a random order of the authors. The correct author list is shown at the top of this webpage.

TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task

Christoph Alt, Aleksandra Gabryszak, Leonhard Hennig

Similar Papers

Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents

Daoyuan Chen, Yaliang Li, Kai Lei, Ying Shen,

Dialogue-Based Relation Extraction

Dian Yu, Kai Sun, Claire Cardie, Dong Yu,

Probing Linguistic Features of Sentence-Level Representations in Relation Extraction

Christoph Alt, Aleksandra Gabryszak, Leonhard Hennig,

Inflecting When There's No Majority: Limitations of Encoder-Decoder Neural Networks as Cognitive Models for German Plurals

Kate McCurdy, Sharon Goldwater, Adam Lopez,