Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction

Mladen Karan; Ivan Vulić; Anna Korhonen; Goran Glavaš

Machine Translation Short Paper

Session 12A: Jul 8 (08:00-09:00 GMT)

Session 14B: Jul 8 (18:00-19:00 GMT)

Abstract: Effective projection-based cross-lingual word embedding (CLWE) induction critically relies on an iterative self-learning procedure, which gradually expands the initial small seed dictionary to learn improved cross-lingual mappings. In this work, we present ClassyMap, a classification-based approach to self-learning that yields more robust and more effective induction of projection-based CLWEs. Unlike prior self-learning methods, our approach allows for the integration of diverse features into the iterative process. We show the benefits of ClassyMap for bilingual lexicon induction: we report consistent improvements in a weakly supervised setup (500 seed translation pairs) on a benchmark with 28 language pairs.
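
The iterative self-learning procedure mentioned in the abstract can be illustrated with a minimal sketch. It is not ClassyMap: the classifier and its features are not described on this page, so the sketch assumes a commonly used stand-in, an orthogonal Procrustes mapping with mutual nearest neighbours as the rule for expanding the dictionary. The function names (`procrustes`, `normalize`, `self_learning`) and the `n_iters` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def procrustes(X, Y):
    """Return the orthogonal matrix W that minimizes ||X @ W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def normalize(E):
    """Scale every embedding (row) to unit length."""
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def self_learning(src, tgt, seed_pairs, n_iters=5):
    """Generic self-learning loop (an assumption, not the authors' method).

    src, tgt   : arrays of shape (n_words, dim) with monolingual embeddings
    seed_pairs : list of (source_index, target_index) tuples
    Returns the final mapping W and the expanded dictionary.
    """
    src, tgt = normalize(src), normalize(tgt)
    pairs = sorted(set(seed_pairs))
    for _ in range(n_iters):
        s_idx = [s for s, _ in pairs]
        t_idx = [t for _, t in pairs]
        # 1) Learn a mapping from the current dictionary.
        W = procrustes(src[s_idx], tgt[t_idx])
        # 2) Score all candidate pairs by cosine similarity after mapping.
        sims = (src @ W) @ tgt.T
        fwd = sims.argmax(axis=1)  # best target for each source word
        bwd = sims.argmax(axis=0)  # best source for each target word
        # 3) Expand the dictionary with mutual nearest neighbours.
        #    A classification-based variant would decide here with a
        #    trained classifier instead of this fixed rule.
        mutual = {(s, int(t)) for s, t in enumerate(fwd) if bwd[t] == s}
        pairs = sorted(set(seed_pairs) | mutual)
    return W, pairs
```

Going by the abstract, step 3 is where a classification-based approach would differ: a classifier over diverse features would decide which candidate pairs enter the dictionary, in place of the fixed mutual-nearest-neighbour rule assumed here.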

Similar Papers

Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces

Goran Glavaš, Ivan Vulić

Should All Cross-Lingual Embeddings Speak English?

Antonios Anastasopoulos, Graham Neubig

Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries

Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, Jordan Boyd-Graber

A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction

Shuo Ren, Shujie Liu, Ming Zhou, Shuai Ma