CLIReval: Evaluating Machine Translation as a Cross-Lingual Information Retrieval Task

Shuo Sun; Suzanna Sia; Kevin Duh

CLIReval: Evaluating Machine Translation as a Cross-Lingual Information Retrieval Task

Shuo Sun, Suzanna Sia, Kevin Duh

Abstract Paper Demo Share

System Demonstrations Demo Paper

Demo Session 1B-2: Jul 7 (05:45-06:45 GMT)

Demo Session 3B-2: Jul 7 (12:45-13:45 GMT)

Abstract: We present CLIReval, an easy-to-use toolkit for evaluating machine translation (MT) with the proxy task of cross-lingual information retrieval (CLIR). Contrary to what the project name might suggest, CLIReval does not actually require any annotated CLIR dataset. Instead, it automatically transforms translations and references used in MT evaluations into a synthetic CLIR dataset; it then sets up a standard search engine (Elasticsearch) and computes various information retrieval metrics (e.g., mean average precision) by treating the translations as documents to be retrieved. The idea is to gauge the quality of MT by its impact on the document translation approach to CLIR. As a case study, we run CLIReval on the "metrics shared task" of WMT2019; while this extrinsic metric is not intended to replace popular intrinsic metrics such as BLEU, results suggest CLIReval is competitive in many language pairs in terms of correlation to human judgments of quality. CLIReval is publicly available at https://github.com/ssun32/CLIReval.

You can open the pre-recorded video in a separate window.

NOTE: The SlidesLive video may display a random order of the authors. The correct author list is shown at the top of this webpage.

CLIReval: Evaluating Machine Translation as a Cross-Lingual Information Retrieval Task

Shuo Sun, Suzanna Sia, Kevin Duh

Similar Papers

SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization

Yang Gao, Wei Zhao, Steffen Eger,

It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information

Emanuele Bugliarello, Sabrina J. Mielke, Antonios Anastasopoulos, Ryan Cotterell, Naoaki Okazaki,

Syntactic Search by Example

Micah Shlain, Hillel Taub-Tabib, Shoval Sadde, Yoav Goldberg,

Designing Precise and Robust Dialogue Response Evaluators

Tianyu Zhao, Divesh Lala, Tatsuya Kawahara,