lei-liu1/ntm

Examined models

Neural models

NVDM from https://arxiv.org/pdf/1511.06038.pdf -- ICML 2016
GSM from https://arxiv.org/pdf/1706.00359.pdf -- ICML 2017
NVLDA from https://arxiv.org/pdf/1703.01488.pdf -- ICLR 2017
ProdLDA from https://arxiv.org/pdf/1703.01488.pdf -- ICLR 2017
NSMTM from https://arxiv.org/pdf/1810.09079.pdf -- WSDM 2019
NSMDM from https://arxiv.org/pdf/1810.09079.pdf -- WSDM 2019
Scholar from https://arxiv.org/abs/1705.09296 -- ACL 2018
NVCTM from https://dl.acm.org/doi/10.1145/3308558.3313561 -- WWW 2019

Non-neural models

online-LDA: LDA using online variational inference
LDA_gibbs: LDA using Gibbs sampling
NMF: online NMF

Metrics:

Perplexity of unseen documents: All models, except LDA_gibbs and NMF
Perplexity of unseen/held-out words: All models, except NMF
Topic coherence: All models
Performance in document classification: All models
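
As a rough illustration of the first three metrics, the sketch below computes per-word perplexity from a total log-likelihood and an average NPMI score for one topic's top words. It is a minimal sketch, not the repository's evaluation code: the function names are hypothetical, and using NPMI as the coherence measure is an assumption, since the list above does not say which coherence variant is reported.

```python
import math
from itertools import combinations


def perplexity(total_log_likelihood, total_tokens):
    """Per-word perplexity: exp(-log-likelihood / number of tokens)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return math.exp(-total_log_likelihood / total_tokens)


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Assumption: probabilities are estimated from document-level
    co-occurrence in `documents`, a list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        raise ValueError("documents must not be empty")

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # the pair never co-occurs: minimum NPMI
        elif p12 == 1:
            scores.append(0.0)   # 0/0 case (pair in every document); treated as 0 here
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores) if scores else 0.0
```

For perplexity, lower is better; for NPMI, scores lie in [-1, 1] and higher means the top words co-occur more often than chance.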

Datasets:

Short texts:

[1] Titles of news articles in the W2E dataset from https://dl.acm.org/doi/abs/10.1145/3269206.3269309
[2] Web snippets from https://papers.nips.cc/paper/2002/hash/3147da8ab4a0437c15ef51a5cc7f2dc4-Abstract.html

Long texts:

[1] Content of news articles in the W2E dataset from https://dl.acm.org/doi/abs/10.1145/3269206.3269309
[2] 20News

The datasets are preprocessed and shared here. Please download the files and unzip them into the preprocessed_data folder.
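
The unzip step can be done from Python with the standard library, for example as in the sketch below. The helper name `unpack_datasets` and any archive file name you pass to it are hypothetical; only the preprocessed_data target folder comes from the instructions above.

```python
import zipfile
from pathlib import Path


def unpack_datasets(archive_path, target_dir="preprocessed_data"):
    """Extract a downloaded dataset archive into the target folder."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Return the files now present under the target folder so the
    # caller can check that the expected datasets are there.
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file()
    )


# Example call (the archive name is a placeholder, not the real download):
# unpack_datasets("downloaded_datasets.zip")
```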

Scripts for training and evaluating topic models:

Trainers:

train/trainer_neural_topic_model.py: for neural models
train/trainer_lda_topic_model.py: for LDA models
train/trainer_nmf_topic_model.py: for NMF models

Evaluators:

evaluation/eval_neural_models.py: for neural models
evaluation/eval_lda_models.py: for LDA models
evaluation/eval_nmf_models.py: for NMF models
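
The mapping from model to trainer and evaluator script can be summarized in a small lookup, sketched below. The script paths are the ones listed above; the helper `scripts_for` and the use of the model names as lookup keys are assumptions made for illustration and are not part of the repository.

```python
# (trainer, evaluator) script pairs, as listed in this README.
SCRIPTS = {
    "neural": ("train/trainer_neural_topic_model.py",
               "evaluation/eval_neural_models.py"),
    "lda": ("train/trainer_lda_topic_model.py",
            "evaluation/eval_lda_models.py"),
    "nmf": ("train/trainer_nmf_topic_model.py",
            "evaluation/eval_nmf_models.py"),
}

# Model names taken from the "Examined models" lists (keys are an assumption).
NEURAL_MODELS = {"NVDM", "GSM", "NVLDA", "ProdLDA",
                 "NSMTM", "NSMDM", "Scholar", "NVCTM"}
LDA_MODELS = {"online-LDA", "LDA_gibbs"}
NMF_MODELS = {"NMF"}


def scripts_for(model_name):
    """Return the (trainer, evaluator) script paths for a model name."""
    if model_name in NEURAL_MODELS:
        return SCRIPTS["neural"]
    if model_name in LDA_MODELS:
        return SCRIPTS["lda"]
    if model_name in NMF_MODELS:
        return SCRIPTS["nmf"]
    raise ValueError(f"unknown model: {model_name!r}")
```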

Running experiments:

empirical_studies/examine_models.py