Parallel Sentence Mining by Constrained Decoding
Pinzhen Chen, Nikolay Bogoychev, Kenneth Heafield, Faheem Kirefu
Machine Translation Short Paper
Session 2B: Jul 6 (09:00-10:00 GMT)
Session 3B: Jul 6 (13:00-14:00 GMT)
Abstract:
We present a novel method to extract parallel sentences from two monolingual corpora, using neural machine translation. Our method relies on translating sentences in one corpus, but constraining the decoding by a prefix tree built on the other corpus. We argue that a neural machine translation system by itself can be a sentence similarity scorer and it efficiently approximates pairwise comparison with a modified beam search. When benchmarked on the BUCC shared task, our method achieves results comparable to other submissions.
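The core idea in the abstract, decoding under a prefix tree built from the other corpus, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `build_trie`, `constrained_beam_search`, `toy_score`, and `TOY_LEXICON` are hypothetical, tokens are whitespace-split words, and `toy_score` is a stand-in for the token log-probability that a real neural machine translation model would supply. The sketch only shows how a trie restricts each beam search step to continuations that exist in the target corpus, so every finished hypothesis is a sentence from that corpus.

```python
import math

END = "</s>"  # end-of-sentence marker stored in the trie

def build_trie(sentences):
    """Build a nested-dict prefix tree over whitespace-tokenised sentences."""
    root = {}
    for sent in sentences:
        node = root
        for tok in sent.split() + [END]:
            node = node.setdefault(tok, {})
    return root

def constrained_beam_search(source, trie, score_fn, beam_size=2):
    """Beam search that may only follow paths present in the trie.

    score_fn(source, prefix_tokens, next_token) returns a log-probability;
    here it is an assumed interface standing in for an NMT decoder step.
    """
    beams = [(0.0, [], trie)]  # (cumulative log-prob, tokens, trie node)
    finished = []
    while beams:
        candidates = []
        for logp, toks, node in beams:
            # Only tokens that extend a real corpus sentence are considered.
            for tok, child in node.items():
                cand = (logp + score_fn(source, toks, tok), toks + [tok], child)
                (finished if tok == END else candidates).append(cand)
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda c: c[0], reverse=True)
    return [(" ".join(toks[:-1]), logp) for logp, toks, _ in finished]

# Hypothetical toy "translation model": a word-for-word lexicon lookup.
TOY_LEXICON = {"die": "the", "der": "the", "katze": "cat",
               "hund": "dog", "rennt": "runs", "bellt": "barks"}

def toy_score(source, prefix, token):
    expected = [TOY_LEXICON.get(w, w) for w in source.split()] + [END]
    pos = len(prefix)
    match = pos < len(expected) and expected[pos] == token
    return math.log(0.9 if match else 0.1)

target_corpus = ["the cat runs", "the dog barks", "a bird sings"]
trie = build_trie(target_corpus)
results = constrained_beam_search("die katze rennt", trie, toy_score)
best_sentence, best_logp = results[0]
```

In this sketch the cumulative log-probability of the best constrained hypothesis doubles as a similarity score between the source sentence and the retrieved target sentence, which mirrors the abstract's claim that the translation system itself can act as the scorer. The beam keeps the search far cheaper than scoring every source-target pair, at the cost of possibly pruning the true match.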