This repository hosts the improved VED (Variational Encoder-Decoder) model for generative dialog modeling, as described by Shen, Su et al. (2018).
- Download the DailyDialog Corpus released by Li, Su, Shen et al. (2017), available at http://yanran.li/dailydialog.html
- Create the dictionary from the corpus, then serialize the dictionary and the corpus (convert_text2dict.py is a demo script for creating the pkl file).
- Download the Word2Vec embeddings trained on Google News: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM
- Modify dataproducer.py to generate TFRecord files from the serialized corpus (we use TFRecord for a fast and stable training process).
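
The dictionary and serialization step above can be sketched roughly as follows. This is a minimal illustration, not the contents of convert_text2dict.py: the function names, the `<unk>` token, the whitespace tokenization, and the id assignment order are all assumptions made for the example.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch of "build a dictionary, encode the corpus, pickle both".
# None of these helper names come from the repository.
import collections
import pickle


def build_dictionary(utterances, min_count=1):
    """Map each whitespace token to an integer id, most frequent first."""
    counts = collections.Counter(
        tok for utt in utterances for tok in utt.split()
    )
    word2id = {"<unk>": 0}  # assumed out-of-vocabulary token
    # Sort by descending frequency, then alphabetically, for stable ids.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_count and word not in word2id:
            word2id[word] = len(word2id)
    return word2id


def encode(utterances, word2id):
    """Replace every token with its id; unknown tokens map to <unk>."""
    unk = word2id["<unk>"]
    return [[word2id.get(tok, unk) for tok in utt.split()] for utt in utterances]


def serialize(obj, path):
    """Write obj to a pkl file (protocol 2 is readable from Python 2.7)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=2)
```

A typical call sequence would be `word2id = build_dictionary(utterances)`, then `serialize(word2id, "dict.pkl")` and `serialize(encode(utterances, word2id), "corpus.pkl")`, where both file names are placeholders.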
We use TensorFlow 1.0 and Python 2.7 for convenience.
- Create a new "Checkpoints" directory inside it.
- Change the parameters in main.py according to your GPU memory size.
- Read the core code file WS_vhred.py
- Modify and run main.py
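
As a small convenience for the setup steps above, the "Checkpoints" directory can be created from Python before training starts. This is only a sketch: `ensure_checkpoint_dir` is a hypothetical helper, not part of main.py, and the base directory is whatever location you run training from.

```python
import os


def ensure_checkpoint_dir(base_dir, name="Checkpoints"):
    """Create base_dir/name if it is missing and return its path."""
    path = os.path.join(base_dir, name)
    # The isdir check keeps this compatible with Python 2.7,
    # where os.makedirs has no exist_ok argument.
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

Calling `ensure_checkpoint_dir(".")` from the repository root is safe to repeat, since an existing directory is left untouched.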
Improving Variational Encoder-Decoders in Dialogue Generation. Xiaoyu Shen, Hui Su, Shuzi Niu, Vera Demberg. 2018. AAAI. https://arxiv.org/abs/1802.02032
DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, Shuzi Niu. 2017. IJCNLP. https://arxiv.org/abs/1710.03957