PyTorch implementations of NVDM, GSM, NTM, NTMR

Neural-Document-Modeling

Simple PyTorch implementations of the following neural document models: NVDM, GSM, NTM, and NTMR.

References to other related implementations:

Dependencies

  1. Topic coherence is evaluated with the topic interpretability toolkit, which is already included in the ./scripts directory. To evaluate topic coherence, follow the steps below.

    # Set up a Python 2 environment named py2 using the conda package manager.
    conda create -n py2 --file data/py2.env
    # Run the customized script at scripts/topic_coherence.sh
    # Usage: bash scripts/topic_coherence.sh topic_file corpus_dir save_prefix
    bash scripts/topic_coherence.sh data/topic-1.topics data/20news-clean/corpus/ data/topic-1.res
  2. The code depends on:

    python 3.6
    pytorch 1.0.0
    sacred 0.7.0
    scikit-learn 0.19.1

Models

Run the model with different config files

export PYTHONPATH=`pwd`:
# any config of: nvdm.yaml, gsm.yaml, ntm.yaml, ntmr.yaml
python experiments/ntm.py -F data/exp/ntm with config_file=data/config/nvdm.yaml 
# Use WETC topic coherence measure as in Coherence-Aware Neural Topic Modeling
python experiments/ntm.py -F data/exp/ntm with config_file=data/config/nvdm.yaml update.callback=callback_wetc
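To run all four configs in sequence, the commands above can be generated from a small Python script. This is only a sketch: `build_command` and `run_all` are hypothetical helper names that are not part of this repository, and the script assumes it is started from the repository root so that the relative paths and `PYTHONPATH` match the shell commands above.

```python
import os
import subprocess

# Config file names taken from the comment in the shell example above.
CONFIGS = ["nvdm.yaml", "gsm.yaml", "ntm.yaml", "ntmr.yaml"]


def build_command(config_name, wetc=False):
    """Build the argument list for one run (hypothetical helper)."""
    cmd = [
        "python", "experiments/ntm.py",
        "-F", "data/exp/ntm",
        "with", "config_file=data/config/" + config_name,
    ]
    if wetc:
        # Same override as the WETC example above.
        cmd.append("update.callback=callback_wetc")
    return cmd


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    # Assumption: the current working directory is the repository root.
    env = dict(os.environ, PYTHONPATH=os.getcwd())
    for name in CONFIGS:
        cmd = build_command(name)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before starting the actual training runs.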

Results

Test perplexity (PPL) and topic coherence at the best evaluation checkpoint, with the number of topics set to 50.

Model   PPL      Topic Coherence
GSM     789.27   0.212
NTMR    818.30   0.347
NTM     883.71   0.281
NVDM    769.50   0.158
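For readers unfamiliar with the PPL column: neural document models of this kind commonly report perplexity as the exponential of the negative per-word log-likelihood, averaged over documents, with a variational lower bound standing in for the true log-likelihood. The sketch below assumes that convention. It is not taken from this repository's code, which may compute the metric differently, and `corpus_perplexity` is a hypothetical name.

```python
import math


def corpus_perplexity(doc_log_likelihoods, doc_lengths):
    """Perplexity as exp(-(1/D) * sum_d log p(X_d) / N_d).

    doc_log_likelihoods: per-document log-likelihood (or a lower bound on it).
    doc_lengths: per-document word counts N_d.
    Assumption: per-document averaging; empty documents are skipped.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("inputs must have the same length")
    per_word = [
        ll / n for ll, n in zip(doc_log_likelihoods, doc_lengths) if n > 0
    ]
    if not per_word:
        raise ValueError("no non-empty documents")
    return math.exp(-sum(per_word) / len(per_word))
```

Under this definition, lower is better: a model that assigns every word a probability of 1/100 has a perplexity of 100.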

Please let me know if you obtain better results or have any advice on the models or implementation details.

Notes

  • The performance of NTMR depends on the choice of optimizer.