An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering
Jay Kumar, Junming Shao, Salah Uddin, Wazir Ali
Information Retrieval and Text Mining Long Paper
Session 1B: Jul 6 (06:00-07:00 GMT)
Session 3A: Jul 6 (12:00-13:00 GMT)
Abstract:
Clustering short text streams is a challenging task due to their unique properties: infinite length, sparse data representation, and cluster evolution. Existing approaches often process short text streams in batches. However, determining the optimal batch size is usually difficult, since we have no prior knowledge of when the topics evolve. In addition, the traditional independent word representation in graphical models tends to cause a "term ambiguity" problem in short text clustering. Therefore, in this paper, we propose an Online Semantic-enhanced Dirichlet Model for short text stream clustering, called OSDM, which integrates word-occurrence semantic information (i.e., context) into a new graphical model and clusters each arriving short text automatically in an online way. Extensive experimental results demonstrate that OSDM performs better than many state-of-the-art algorithms on both synthetic and real-world data sets.
Similar Papers
Autoencoding Keyword Correlation Graph for Document Clustering
Billy Chiu, Sunil Kumar Sahu, Derek Thomas, Neha Sengupta, Mohammady Mahdy

Neural Mixed Counting Models for Dispersed Topic Discovery
Jiemin Wu, Yanghui Rao, Zusheng Zhang, Haoran Xie, Qing Li, Fu Lee Wang, Ziye Chen

Enhancing Cross-target Stance Detection with Transferable Semantic-Emotion Knowledge
Bowen Zhang, Min Yang, Xutao Li, Yunming Ye, Xiaofei Xu, Kuai Dai

Improving Adversarial Text Generation by Modeling the Distant Future
Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Dinghan Shen, Guoyin Wang, Zheng Wen, Lawrence Carin
