Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction
Mladen Karan, Ivan Vulić, Anna Korhonen, Goran Glavaš
Machine Translation Short Paper
Session 12A: Jul 8 (08:00-09:00 GMT)
Session 14B: Jul 8 (18:00-19:00 GMT)
Abstract:
Effective induction of projection-based cross-lingual word embeddings (CLWEs) relies critically on an iterative self-learning procedure, which gradually expands a small initial seed dictionary to learn improved cross-lingual mappings. In this work, we present ClassyMap, a classification-based approach to self-learning that yields more robust and more effective induction of projection-based CLWEs. Unlike prior self-learning methods, our approach allows diverse features to be integrated into the iterative process. We show the benefits of ClassyMap for bilingual lexicon induction: we report consistent improvements in a weakly supervised setup (500 seed translation pairs) on a benchmark with 28 language pairs.
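To make the self-learning loop mentioned in the abstract concrete, here is a minimal sketch of a generic projection-based variant. It is not ClassyMap itself: the orthogonal Procrustes mapping, the mutual nearest-neighbour rule for growing the dictionary, and all function and parameter names are assumptions chosen for illustration. Per the abstract, ClassyMap takes a classification-based approach to self-learning, so the nearest-neighbour heuristic below is only a stand-in for the step that selects new translation pairs.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimising ||X W - Y||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def mutual_nearest_neighbours(S):
    """Pairs (i, j) where j is i's best target and i is j's best source."""
    fwd = S.argmax(axis=1)  # best target index for each source word
    bwd = S.argmax(axis=0)  # best source index for each target word
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]

def self_learning(X, Y, seed_pairs, n_iter=5):
    """Alternate between fitting a mapping and re-inducing the dictionary.

    X, Y: row-normalised source/target embedding matrices (assumed inputs).
    seed_pairs: list of (source_index, target_index) translation pairs.
    """
    pairs = list(seed_pairs)
    for _ in range(n_iter):
        src, tgt = zip(*pairs)
        # Step 1: learn the mapping from the current dictionary.
        W = procrustes(X[list(src)], Y[list(tgt)])
        # Step 2: score all word pairs in the shared space (cosine similarity).
        S = (X @ W) @ Y.T
        # Step 3: expand the dictionary; the seed pairs are always retained.
        pairs = sorted(set(seed_pairs) | set(mutual_nearest_neighbours(S)))
    return W, pairs
```

Called as `W, pairs = self_learning(X, Y, seed_pairs)`, the function returns the last mapping it fitted and the induced dictionary. Retaining the seed pairs in every iteration is one simple design choice for keeping the loop anchored to the supervised signal; other choices are possible.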
Similar Papers
Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces
Goran Glavaš, Ivan Vulić

Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, Jordan Boyd-Graber

A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction
Shuo Ren, Shujie Liu, Ming Zhou, Shuai Ma
