More Diverse Dialogue Datasets via Diversity-Informed Data Collection
Katherine Stasaski, Grace Hui Yang, Marti A. Hearst
Resources and Evaluation Long Paper
Session 9A: Jul 7 (17:00-18:00 GMT)
Session 10B: Jul 7 (21:00-22:00 GMT)
Abstract:
Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models often produce uninteresting, predictable responses, a drawback known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. Diversity-Informed Data Collection produces significantly more diverse data than baseline data collection methods, and yields better results on two downstream tasks: emotion classification and dialogue generation. The method is generalizable and can be used with other corpus-level metrics.
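The core idea, using a corpus-level statistic to decide which participants to keep collecting from, can be illustrated with a small sketch. This is not the authors' implementation. It assumes distinct-2 (unique bigrams divided by total bigrams) as a stand-in corpus-level diversity metric, and it scores each participant by leave-one-out: how much corpus diversity would drop if that participant's data were removed. The functions `distinct_n`, `participant_contributions`, and `select_participants`, as well as the "keep only positive contributors" rule, are hypothetical choices made for illustration.

```python
from collections import Counter


def distinct_n(utterances: list[str], n: int = 2) -> float:
    """Corpus-level distinct-n: unique n-grams / total n-grams (0.0 if none).

    Assumed stand-in metric; the method can use other corpus-level metrics.
    """
    ngrams: Counter = Counter()
    for utt in utterances:
        tokens = utt.lower().split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0


def participant_contributions(corpus: dict[str, list[str]], n: int = 2) -> dict[str, float]:
    """Hypothetical leave-one-out score per participant.

    score = diversity(full corpus) - diversity(corpus without participant).
    A positive score means the participant's data raises corpus diversity.
    """
    all_utts = [u for utts in corpus.values() for u in utts]
    full = distinct_n(all_utts, n)
    scores = {}
    for pid in corpus:
        rest = [u for other, utts in corpus.items() if other != pid for u in utts]
        scores[pid] = full - distinct_n(rest, n)
    return scores


def select_participants(corpus: dict[str, list[str]], n: int = 2) -> list[str]:
    """Hypothetical selection rule: continue collecting only from
    participants whose data increases corpus-level diversity."""
    scores = participant_contributions(corpus, n)
    return sorted(pid for pid, s in scores.items() if s > 0)


if __name__ == "__main__":
    corpus = {
        "w1": ["i am fine thanks", "the weather is nice today"],
        "w2": ["i am fine thanks", "i am fine thanks"],
        "w3": ["my cat loves chasing red lasers", "we hiked up the volcano at dawn"],
    }
    print(participant_contributions(corpus))
    print(select_participants(corpus))
```

In this toy corpus, participant `w2`, who repeats the same utterance, receives the lowest score, while `w3`, whose utterances share no bigrams with anyone else, is the only one retained. Because the statistic is corpus-level, scores are recomputed as new data arrives, so a participant's standing can change dynamically during collection.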
Similar Papers
Multi-Domain Dialogue Acts and Response Co-Generation
Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu

Diversifying Dialogue Generation with Non-Conversational Text
Hui Su, Xiaoyu Shen, Sanqiang Zhao, Zhou Xiao, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou


