Why is penguin more similar to polar bear than to sea gull? Analyzing conceptual knowledge in distributional models

Pia Sommerauer

Student Research Workshop SRW Paper

Session 3A: Jul 6 (12:00-13:00 GMT)

Session 14B: Jul 8 (18:00-19:00 GMT)

Abstract: What do powerful models of word meaning created from distributional data (e.g. Word2vec (Mikolov et al., 2013), BERT (Devlin et al., 2019) and ELMo (Peters et al., 2018)) represent? What causes words to be similar in the semantic space? What type of information is lacking? This thesis proposal presents a framework for investigating the information encoded in distributional semantic models. Several analysis methods have been suggested, but they have been shown to be limited and are not well understood. This approach pairs observations made on actual corpora with insights obtained from data manipulation experiments. The expected outcome is a better understanding of (1) the semantic information we can infer purely based on linguistic co-occurrence patterns and (2) the potential of distributional semantic models to pick up linguistic evidence.

Similar Papers

Spying on Your Neighbors: Fine-grained Probing of Contextual Embeddings for Information about Surrounding Words

Josef Klafka, Allyson Ettinger

Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT

Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu

Semi-supervised Contextual Historical Text Normalization

Peter Makarov, Simon Clematide

Speakers enhance contextually confusable words

Eric Meinhardt, Eric Bakovic, Leon Bergen