Short Text Topic Modeling

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

SeaNMF

This is the implementation of the paper:

  • Tian Shi, Kyeongpil Kang, Jaegul Choo and Chandan K. Reddy, "Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations", In Proceedings of the International Conference on World Wide Web (WWW), Lyon, France, April 2018.

Requirements

  • Python 3.5.2
  • argparse

Usage

Data Process

  • Tokenize the text with NLTK, spaCy, or CoreNLP.
  • Remove special characters.
  • Remove stop-words.
  • Edit the arguments in data_process.py.
  • Run python3 data_process.py to prepare the document-term matrix and vocabulary.
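
The preprocessing steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the code in data_process.py: the regex-based tokenizer stands in for NLTK, spaCy, or CoreNLP, and the stop-word list, the function names (preprocess, build_vocab, doc_term_matrix) and the min_count parameter are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; use a fuller list in practice).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "on", "for"}


def preprocess(text):
    """Lowercase, replace special characters with spaces, split, drop stop-words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(docs, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for doc in docs for tok in doc)
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: idx for idx, tok in enumerate(kept)}


def doc_term_matrix(docs, vocab):
    """Dense document-term count matrix, one row per document."""
    matrix = [[0] * len(vocab) for _ in docs]
    for row, doc in zip(matrix, docs):
        for tok in doc:
            if tok in vocab:
                row[vocab[tok]] += 1
    return matrix


raw_docs = ["The cat sat on the mat!", "A dog and a cat.", "Dogs chase the cat?"]
docs = [preprocess(d) for d in raw_docs]
vocab = build_vocab(docs)
dtm = doc_term_matrix(docs, vocab)
```

For real short-text corpora a sparse representation is usually preferable to the dense list-of-lists used here; the dense form just keeps the sketch dependency-free.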

Train

  • Run python3 train.py --help to see the full list of options.

Evaluation

  • Run python3 vis_topic.py to calculate the PMI and visualize the top keywords in each topic.
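
For readers unfamiliar with the metric, here is a minimal sketch of a PMI-based topic coherence score. It uses the standard definition PMI(w1, w2) = log(P(w1, w2) / (P(w1) P(w2))) with probabilities estimated from document-level co-occurrence, averaged over all pairs of a topic's top keywords. It is not taken from vis_topic.py, which may differ in its details; the function name average_pmi and the choice to score never-co-occurring pairs as 0 are assumptions.

```python
import math
from itertools import combinations


def average_pmi(top_words, docs):
    """Mean pairwise PMI of top_words, using document-level co-occurrence."""
    doc_sets = [set(doc) for doc in docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in s for w in words) for s in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # Assumption: pairs that never co-occur contribute 0.
            scores.append(0.0)
        else:
            scores.append(math.log(p12 / (p1 * p2)))
    # Fewer than two words means there are no pairs to score.
    return sum(scores) / len(scores) if scores else 0.0


docs = [["cat", "mat"], ["cat", "dog"], ["dog", "bone"], ["cat", "mat"]]
score = average_pmi(["cat", "mat"], docs)
```

Higher values indicate that a topic's keywords tend to appear together more often than chance would predict.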