A Joint Model for Document Segmentation and Segment Labeling

Joe Barrow; Rajiv Jain; Vlad Morariu; Varun Manjunatha; Douglas Oard; Philip Resnik

A Joint Model for Document Segmentation and Segment Labeling

Joe Barrow, Rajiv Jain, Vlad Morariu, Varun Manjunatha, Douglas Oard, Philip Resnik

Abstract Paper Share

Information Retrieval and Text Mining Long Paper

Session 1A: Jul 6 (05:00-06:00 GMT)

Session 3A: Jul 6 (12:00-13:00 GMT)

Abstract: Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. Where previous work on text segmentation considers the tasks of document segmentation and segment labeling separately, we show that the tasks contain complementary information and are best addressed jointly. We introduce Segment Pooling LSTM (S-LSTM), which is capable of jointly segmenting a document and labeling segments. In support of joint training, we develop a method for teaching the model to recover from errors by aligning the predicted and ground truth segments. We show that S-LSTM reduces segmentation error by 30% on average, while also improving segment labeling.

You can open the pre-recorded video in a separate window.

NOTE: The SlidesLive video may display a random order of the authors. The correct author list is shown at the top of this webpage.

A Joint Model for Document Segmentation and Segment Labeling

Joe Barrow, Rajiv Jain, Vlad Morariu, Varun Manjunatha, Douglas Oard, Philip Resnik

Similar Papers

Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension

Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu,

Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation

Qiu Ran, Yankai Lin, Peng Li, Jie Zhou,

A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing

Hang Yan, Xipeng Qiu, Xuanjing Huang,

BPE-Dropout: Simple and Effective Subword Regularization

Ivan Provilkov, Dmitrii Emelianenko, Elena Voita,