A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal
Demian Gholipour Ghalandari, Chris Hokamp, Nghia The Pham, John Glover, Georgiana Ifrim
Summarization Short Paper
Session 2A: Jul 6 (08:00-09:00 GMT)
Session 3B: Jul 6 (13:00-14:00 GMT)
Abstract:
Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters. We build this dataset by leveraging the Wikipedia Current Events Portal (WCEP), which provides concise and neutral human-written summaries of news events, with links to external source articles. We also automatically extend these source articles by looking for related articles in the Common Crawl archive. We provide a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.
Similar Papers
NSTM: Real-Time Query-Driven News Overview Composition at Bloomberg
Joshua Bambrick, Minjie Xu, Andy Almonte, Igor Malioutov, Guim Perarnau, Vittorio Selo, Iat Chong Chan

Discourse as a Function of Event: Profiling Discourse Structure in News Articles around the Main Event
Prafulla Kumar Choubey, Aaron Lee, Ruihong Huang, Lu Wang

MIND: A Large-scale Dataset for News Recommendation
Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, Ming Zhou

Leveraging Graph to Improve Abstractive Multi-Document Summarization
Wei Li, Xinyan Xiao, Jiachen Liu, Hua Wu, Haifeng Wang, Junping Du
