Investigating Word-Class Distributions in Word Vector Spaces

Ryohei Sasano; Anna Korhonen

Investigating Word-Class Distributions in Word Vector Spaces

Ryohei Sasano, Anna Korhonen

Abstract Paper Share

Semantics: Lexical Long Paper

Session 6B: Jul 7 (06:00-07:00 GMT)

Session 8A: Jul 7 (12:00-13:00 GMT)

Abstract: This paper presents an investigation on the distribution of word vectors belonging to a certain word class in a pre-trained word vector space. To this end, we made several assumptions about the distribution, modeled the distribution accordingly, and validated each assumption by comparing the goodness of each model. Specifically, we considered two types of word classes – the semantic class of direct objects of a verb and the semantic class in a thesaurus – and tried to build models that properly estimate how likely it is that a word in the vector space is a member of a given word class. Our results on selectional preference and WordNet datasets show that the centroid-based model will fail to achieve good enough performance, the geometry of the distribution and the existence of subgroups will have limited impact, and also the negative instances need to be considered for adequate modeling of the distribution. We further investigated the relationship between the scores calculated by each model and the degree of membership and found that discriminative learning-based models are best in finding the boundaries of a class, while models based on the offset between positive and negative instances perform best in determining the degree of membership.

You can open the pre-recorded video in a separate window.

NOTE: The SlidesLive video may display a random order of the authors. The correct author list is shown at the top of this webpage.

Investigating Word-Class Distributions in Word Vector Spaces

Ryohei Sasano, Anna Korhonen

Similar Papers

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg,

Empower Entity Set Expansion via Language Model Probing

Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han,

Generative Semantic Hashing Enhanced via Boltzmann Machines

Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen,

Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition

Hiroki Ouchi, Jun Suzuki, Sosuke Kobayashi, Sho Yokoi, Tatsuki Kuribayashi, Ryuto Konno, Kentaro Inui,