More Diverse Dialogue Datasets via Diversity-Informed Data Collection

Katherine Stasaski, Grace Hui Yang, Marti A. Hearst


Resources and Evaluation Long Paper

Session 9A: Jul 7 (17:00-18:00 GMT)
Session 10B: Jul 7 (21:00-22:00 GMT)
Abstract: Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models are known to have a drawback of often producing uninteresting, predictable responses; this is known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. Diversity-Informed Data Collection produces significantly more diverse data than baseline data collection methods, and better results on two downstream tasks: emotion classification and dialogue generation. This method is generalizable and can be used with other corpus-level metrics.
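The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the corpus-level metric, so a distinct-n ratio (unique n-grams divided by total n-grams) is assumed here, and the function names `distinct_n`, `diversity_gain`, and `select_participants` are hypothetical. The sketch scores each participant by how much their utterances would change the corpus-level statistic and keeps the highest-scoring ones.

```python
from collections import Counter


def distinct_n(utterances, n=2):
    """Assumed corpus-level metric: unique n-grams / total n-grams."""
    counts = Counter()
    for text in utterances:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def diversity_gain(corpus, candidate_utterances, metric=distinct_n):
    """Change in the corpus-level metric if the candidate's data were added."""
    return metric(corpus + candidate_utterances) - metric(corpus)


def select_participants(corpus, pending, keep=1, metric=distinct_n):
    """Rank participants by diversity gain and keep the top `keep`.

    `pending` maps a (hypothetical) participant id to that participant's
    utterances collected so far.
    """
    ranked = sorted(
        pending,
        key=lambda pid: diversity_gain(corpus, pending[pid], metric),
        reverse=True,
    )
    return ranked[:keep]


corpus = ["i am fine", "i am fine"]
pending = {
    "a": ["i am fine"],
    "b": ["the weather turned stormy today"],
}
chosen = select_participants(corpus, pending, keep=1)
```

Because the metric is passed in as a parameter, another corpus-level statistic could be substituted without changing the selection logic, in line with the abstract's statement that the method can be used with other corpus-level metrics.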

Similar Papers

Multi-Domain Dialogue Acts and Response Co-Generation
Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu
Diversifying Dialogue Generation with Non-Conversational Text
Hui Su, Xiaoyu Shen, Sanqiang Zhao, Zhou Xiao, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou