GSDMM

A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering

Python 3.7

The datasets are stored as JSON, one object per line, each with a "text" field and a "cluster" field:
{"text": "centrepoint winter white gala london", "cluster": 65}
{"text": "mourinho seek killer instinct", "cluster": 96}
{"text": "roundup golden globe won seduced johansson voice", "cluster": 72}
{"text": "travel disruption mount storm cold air sweep south florida", "cluster": 140}
{"text": "wes welker blame costly turnover", "cluster": 89}
......
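
Records in the format shown above can be read with the standard library alone. The following is a minimal sketch, not code from this repository: the function name `load_dataset` is hypothetical, and splitting "text" on whitespace assumes the texts are already preprocessed, as the samples suggest.

```python
import json
from typing import List, Tuple


def load_dataset(path: str) -> Tuple[List[List[str]], List[int]]:
    """Read one JSON object per line; return token lists and cluster labels."""
    docs: List[List[str]] = []
    labels: List[int] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            record = json.loads(line)
            # Assumption: whitespace tokenization is enough for these texts.
            docs.append(record["text"].split())
            labels.append(record["cluster"])
    return docs, labels
```

Calling `load_dataset` with the path to one of the dataset files returns two parallel lists: the tokenized documents and their "cluster" labels, which can serve as a reference when evaluating a clustering.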

Citation

If you use the data, please cite the following paper:

@article{chen2019nonparametric,
  title={A nonparametric model for online topic discovery with word embeddings},
  author={Chen, Junyang and Gong, Zhiguo and Liu, Weiwen},
  journal={Information Sciences},
  volume={504},
  pages={32--47},
  year={2019},
  publisher={Elsevier}
}