Simple PyTorch implementations of the following models:
- NVDM in Neural Variational Inference for Text Processing. Yishu Miao, Lei Yu, Phil Blunsom. ICML 2016.
- GSM in Discovering Discrete Latent Topics with Neural Variational Inference. Yishu Miao, Edward Grefenstette, Phil Blunsom.
- NTM and NTMR in Coherence-Aware Neural Topic Modeling. Ran Ding, Ramesh Nallapati, Bing Xiang. EMNLP 2018.
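As background for the models listed above, the following is a dependency-free sketch of GSM's generative step under the Gaussian softmax construction. A latent vector z is sampled with the reparameterization trick, mapped to topic proportions theta = softmax(W z + b), and mixed with topic-word distributions beta. NVDM differs in that z is mapped directly to vocabulary logits. The KL term against a standard normal prior, which both objectives share, is included. This is an illustration, not the repository's PyTorch code: the function names, toy sizes and random weights are hypothetical, and beta is supplied directly rather than built from topic and word embeddings.

```python
import math
import random
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reparameterize(mu: List[float], log_sigma: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def gaussian_kl(mu: List[float], log_sigma: List[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + 2.0 * ls - m * m - math.exp(2.0 * ls) for m, ls in zip(mu, log_sigma))


def gaussian_softmax_theta(z: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """GSM-style topic proportions: theta = softmax(W z + b)."""
    logits = [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]
    return softmax(logits)


def word_distribution(theta: List[float], beta: List[List[float]]) -> List[float]:
    """Mixture of topic-word distributions: p(w) = sum_k theta_k * beta[k][w]."""
    vocab_size = len(beta[0])
    return [sum(theta[k] * beta[k][w] for k in range(len(theta))) for w in range(vocab_size)]


# Toy example with hypothetical sizes: 4 latent dims, 3 topics, 5 words.
rng = random.Random(0)
latent_dim, num_topics, vocab_size = 4, 3, 5
mu = [0.0] * latent_dim
log_sigma = [0.0] * latent_dim
W = [[rng.uniform(-1.0, 1.0) for _ in range(latent_dim)] for _ in range(num_topics)]
b = [0.0] * num_topics
beta = [softmax([rng.uniform(-1.0, 1.0) for _ in range(vocab_size)]) for _ in range(num_topics)]

z = reparameterize(mu, log_sigma, rng)
theta = gaussian_softmax_theta(z, W, b)
p_w = word_distribution(theta, beta)
kl = gaussian_kl(mu, log_sigma)  # zero when mu = 0 and sigma = 1
```

Because theta and every row of beta are distributions, p_w is itself a distribution over the vocabulary, which is what the reconstruction loss scores.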
References to other related implementations:
- PyTorch AVITM
- autoencoding_vi_for_topic_models: the data in data/20news-clean was obtained from this repository.
- nvdm
To evaluate topic coherence, use the topic interpretability toolkit, which has already been downloaded into the ./scripts directory, and follow the steps below.
```bash
# Set up a Python 2 environment named py2 using the conda package manager.
conda create -n py2 --file data/py2.env
# Run the customized script at scripts/topic_coherence.sh:
# bash scripts/topic_coherence.sh topic_file corpus_dir save_prefix
bash scripts/topic_coherence.sh data/topic-1.topics data/20news-clean/corpus/ data/topic-1.res
```
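The toolkit scores each topic by the normalized pointwise mutual information (NPMI) of its top words, measured against a reference corpus such as data/20news-clean/corpus/. For intuition, a minimal pure-Python sketch of NPMI coherence follows. It assumes document-level co-occurrence, whereas the toolkit typically counts co-occurrences within sliding windows, so its numbers will differ. The function names and the toy corpus are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Set


def npmi(w_i: str, w_j: str, docs: List[Set[str]]) -> float:
    """NPMI of a word pair, estimated from document-level co-occurrence."""
    n = len(docs)
    p_i = sum(w_i in d for d in docs) / n
    p_j = sum(w_j in d for d in docs) / n
    p_ij = sum(w_i in d and w_j in d for d in docs) / n
    if p_ij == 0.0:
        return -1.0  # never co-occur: minimum NPMI by convention
    if p_ij == 1.0:
        return 1.0  # co-occur in every document: maximum NPMI by convention
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)


def npmi_coherence(top_words: List[str], docs: List[Set[str]]) -> float:
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    if not pairs:
        raise ValueError("need at least two top words")
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


# Tiny hypothetical reference corpus, each document reduced to its set of words.
docs = [
    {"gun", "law", "police"},
    {"gun", "police"},
    {"space", "nasa"},
    {"space", "nasa", "orbit"},
]
coherent = npmi_coherence(["space", "nasa"], docs)
mixed = npmi_coherence(["gun", "police", "nasa"], docs)
```

A topic whose top words always appear together scores close to 1, while mixing words from unrelated documents pulls the average toward -1.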
The code depends on:
- Python 3.6
- PyTorch 1.0.0
- sacred 0.7.0
- scikit-learn 0.19.1
Run the model with any of the provided config files:
```bash
export PYTHONPATH=`pwd`:
# any config of: nvdm.yaml, gsm.yaml, ntm.yaml, ntmr.yaml
python experiments/ntm.py -F data/exp/ntm with config_file=data/config/nvdm.yaml
# Use WETC topic coherence measure as in Coherence-Aware Neural Topic Modeling
python experiments/ntm.py -F data/exp/ntm with config_file=data/config/nvdm.yaml update.callback=callback_wetc
```
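WETC (word embedding topic coherence) measures how close a topic's top words lie in a pre-trained embedding space. The sketch below follows our reading of the pairwise variant described in Coherence-Aware Neural Topic Modeling: the average cosine similarity over all pairs of a topic's top-N words, then averaged over topics. It is not the code behind callback_wetc; the function names, the toy 2-d embeddings and the choice to skip out-of-vocabulary words are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def wetc_pairwise(top_words: List[str], embeddings: Dict[str, Vector]) -> float:
    """Average pairwise cosine similarity of a topic's top words (OOV words skipped)."""
    vecs = [embeddings[w] for w in top_words if w in embeddings]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def wetc_model(topics: List[List[str]], embeddings: Dict[str, Vector]) -> float:
    """Model-level score: mean of the per-topic pairwise WETC."""
    return sum(wetc_pairwise(t, embeddings) for t in topics) / len(topics)


# Toy embeddings (hypothetical; real use would load pre-trained word vectors).
embeddings = {"space": [1.0, 0.0], "nasa": [1.0, 0.0], "orbit": [0.0, 1.0]}
topics = [["space", "nasa"], ["space", "nasa", "orbit"]]
score = wetc_model(topics, embeddings)
```

Unlike NPMI, this needs no reference corpus at evaluation time, which makes it cheap enough to compute as a training-time callback.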
Test perplexity and topic coherence at the best evaluation checkpoint, with the number of topics set to 50:
Model | PPL | Topic Coherence |
---|---|---|
GSM | 789.27 | 0.212 |
NTMR | 818.30 | 0.347 |
NTM | 883.71 | 0.281 |
NVDM | 769.50 | 0.158 |
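PPL in the table is test perplexity. A common definition, used in the NVDM paper, averages each document's per-word log-likelihood before exponentiating; the intractable log p(d) is typically replaced by the variational lower bound, so the reported value is an upper bound. Whether this repository uses that per-document form or a corpus-level form is not stated here, so the sketch computes both to show they can differ. The function names and toy numbers are hypothetical.

```python
import math
from typing import List


def per_document_perplexity(log_likelihoods: List[float], lengths: List[int]) -> float:
    """exp(-(1/D) * sum_d log p(d) / N_d): average per-word log-likelihood per document first."""
    avg = sum(ll / n for ll, n in zip(log_likelihoods, lengths)) / len(lengths)
    return math.exp(-avg)


def corpus_perplexity(log_likelihoods: List[float], lengths: List[int]) -> float:
    """exp(-sum_d log p(d) / sum_d N_d): pool all words in the corpus."""
    return math.exp(-sum(log_likelihoods) / sum(lengths))


# Two hypothetical test documents with 1 and 3 words.
log_likelihoods = [-1 * math.log(10), -3 * math.log(1000)]
lengths = [1, 3]
ppl_doc = per_document_perplexity(log_likelihoods, lengths)
ppl_corpus = corpus_perplexity(log_likelihoods, lengths)
```

Here the per-document form gives 100 while the corpus-level form gives about 316, because pooling lets the longer, worse-modeled document dominate. Comparisons across implementations are only meaningful when the same definition is used.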
Please let me know if you obtain better results or have any advice on the models or implementation details.
- The performance of NTMR depends on the choice of optimizer.