Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation

Arya D. McCarthy, Xian Li, Jiatao Gu, Ning Dong

Abstract Paper Share

Machine Translation Long Paper

Session 14B: Jul 8 (18:00-19:00 GMT)
Session 15A: Jul 8 (20:00-21:00 GMT)
Abstract: This paper proposes a simple and effective approach to address the problem of posterior collapse in conditional variational autoencoders (CVAEs). It thus improves performance of machine translation models that use noisy or monolingual data, as well as in conventional settings. Extending Transformer and conditional VAEs, our proposed latent variable model measurably prevents posterior collapse by (1) using a modified evidence lower bound (ELBO) objective which promotes mutual information between the latent variable and the target, and (2) guiding the latent variable with an auxiliary bag-of-words prediction task. As a result, the proposed model yields improved translation quality compared to existing variational NMT models on WMT Ro↔En and De↔En. With latent variables being effectively utilized, our model demonstrates improved robustness over non-latent Transformer in handling uncertainty: exploiting noisy source-side monolingual data (up to +3.2 BLEU), and training with weakly aligned web-mined parallel data (up to +4.7 BLEU).
You can open the pre-recorded video in a separate window.
NOTE: The SlidesLive video may display a random order of the authors. The correct author list is shown at the top of this webpage.

Similar Papers

Variational Neural Machine Translation with Normalizing Flows
Hendra Setiawan, Matthias Sperber, Udhyakumar Nallasamy, Matthias Paulik,
A representative figure from paper main.694
A Batch Normalized Inference Network Keeps the KL Vanishing Away
Qile Zhu, Wei Bi, Xiaojiang Liu, Xiyao Ma, Xiaolin Li, Dapeng Wu,
A representative figure from paper main.235
Discrete Latent Variable Representations for Low-Resource Text Classification
Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu,
A representative figure from paper main.437