More Diverse Dialogue Datasets via Diversity-Informed Data Collection

Katherine Stasaski; Grace Hui Yang; Marti A. Hearst

Resources and Evaluation Long Paper

Session 9A: Jul 7 (17:00-18:00 GMT)

Session 10B: Jul 7 (21:00-22:00 GMT)

Abstract: Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models often produce uninteresting, predictable responses, a drawback known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. Diversity-Informed Data Collection produces significantly more diverse data than baseline data collection methods, and the resulting data yields better results on two downstream tasks: emotion classification and dialogue generation. The method is generalizable and can be used with other corpus-level metrics.
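The core idea, using a corpus-level statistic to decide which participants to keep collecting from, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: unigram entropy stands in as the corpus-level diversity metric, each worker is scored by how much corpus diversity would drop if their responses were left out, and the top-scoring workers are kept for the next round of collection. The function names (unigram_entropy, worker_contributions, select_workers) and the toy data are hypothetical; the paper's actual metric and selection rule may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def unigram_entropy(responses: List[str]) -> float:
    """Shannon entropy (bits) of the unigram distribution over all responses.

    Hypothetical stand-in for a corpus-level diversity metric.
    """
    counts = Counter(tok for r in responses for tok in r.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def worker_contributions(data: Dict[str, List[str]]) -> Dict[str, float]:
    """Leave-one-out score: full-corpus diversity minus diversity without a worker.

    A positive score means removing the worker would lower corpus diversity,
    i.e. the worker adds diversity.
    """
    everything = [r for rs in data.values() for r in rs]
    full = unigram_entropy(everything)
    contrib = {}
    for worker in data:
        rest = [r for w, rs in data.items() if w != worker for r in rs]
        contrib[worker] = full - unigram_entropy(rest)
    return contrib


def select_workers(data: Dict[str, List[str]], keep: int) -> List[str]:
    """Return the `keep` workers whose removal would hurt diversity the most."""
    contrib = worker_contributions(data)
    return sorted(contrib, key=contrib.get, reverse=True)[:keep]


# Toy example: w1 repeats itself, w2 is varied, w3 is somewhat repetitive.
data = {
    "w1": ["i am fine", "i am fine"],
    "w2": ["the weather is lovely today", "my cat knocked over a vase"],
    "w3": ["i am fine thanks", "i am ok"],
}
print(select_workers(data, keep=2))  # ['w2', 'w3']
```

The leave-one-out loop recomputes the metric once per worker, which is quadratic in pool size but simple to read; swapping unigram_entropy for any other corpus-level function leaves the selection logic unchanged, in line with the abstract's note that the method works with other corpus-level metrics.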

Similar Papers

CDL: Curriculum Dual Learning for Emotion-Controllable Response Generation

Lei Shen, Yang Feng

Multi-Domain Dialogue Acts and Response Co-Generation

Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu

Diversifying Dialogue Generation with Non-Conversational Text

Hui Su, Xiaoyu Shen, Sanqiang Zhao, Zhou Xiao, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou

Speaker Sensitive Response Evaluation Model

JinYeong Bak, Alice Oh