Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg
Machine Learning for NLP Long Paper
Session 12B: Jul 8 (09:00-10:00 GMT)
Session 13A: Jul 8 (12:00-13:00 GMT)
Abstract:
The ability to control for the kinds of information encoded in neural representations has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. Our method is based on repeated training of linear classifiers that predict a certain property we aim to remove, followed by projection of the representations onto the classifiers' null-space. By doing so, the classifiers become oblivious to that target property, making it hard to linearly separate the data according to it. While the method is applicable to multiple uses, we evaluate it on bias and fairness use cases, and show that it is able to mitigate bias in word embeddings, as well as to increase fairness in a multi-class classification setting.
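The loop described in the abstract (train a linear classifier for the protected property, project the representations onto its null-space, repeat) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `fit_linear_classifier`, `nullspace_projection`, and `inlp`, the binary-label setting, the choice of logistic regression trained by plain gradient descent, and all hyperparameters are assumptions made for the example. It also assumes the representations have been mean-centered, since no bias term is fitted.

```python
import numpy as np

def fit_linear_classifier(X, y, steps=500, lr=0.5):
    """Assumed stand-in classifier: logistic regression fitted by plain
    gradient descent, without a bias term. Returns a weight vector w."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient of the mean log-loss
    return w

def nullspace_projection(w):
    """Orthogonal projection matrix onto the null-space of the direction w."""
    u = w / np.linalg.norm(w)
    return np.eye(len(w)) - np.outer(u, u)

def inlp(X, y, n_iters=3):
    """Repeatedly train a classifier for y and project out its direction.

    X: (n_samples, dim) representations; y: binary labels in {0, 1}.
    Returns the accumulated projection P and the learned directions.
    """
    P = np.eye(X.shape[1])
    directions = []
    for _ in range(n_iters):
        X_proj = X @ P.T                      # representations after projection so far
        w = fit_linear_classifier(X_proj, y)
        if np.linalg.norm(w) < 1e-8:          # nothing left for the classifier to use
            break
        directions.append(w)
        P = nullspace_projection(w) @ P       # compose with earlier projections
    return P, np.array(directions)
```

Applying the result as `X @ P.T` gives representations on which each of the trained classifiers outputs zero. Each iteration removes one direction, so `n_iters` trades off how much of the property is removed against how much of the original representation is kept.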
Similar Papers
On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation
Chaojun Wang, Rico Sennrich

Masking Actor Information Leads to Fairer Political Claims Detection
Erenay Dayanik, Sebastian Padó

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer
Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah

