This is a TensorFlow implementation of the model described in:
Jianxiong Dong, Jim Huang. Enhance Word Representation for Out-of-Vocabulary on Ubuntu Dialogue Corpus.
The model has achieved state-of-the-art performance on the Ubuntu Dialogue Corpus V2 and the Douban Chinese dialogue corpus.
Code author: Jianxiong Dong
- Install the TensorFlow library, following the official installation instructions. For example:
virtualenv --system-site-packages tensorflow_dev
source tensorflow_dev/bin/activate
pip install --upgrade pip
pip install tensorflow-gpu==1.4.0
- At least 16GB of RAM; 32GB is recommended.
- A machine with an NVIDIA GPU with a large amount of GPU memory is preferable. The code has been tested on an NVIDIA Titan Xp (12GB).
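The RAM requirement above can be checked before starting a long training run. The following is a minimal sketch, assuming a Linux machine that exposes /proc/meminfo; the helper names mem_total_gb and has_min_ram are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Print total RAM rounded to the nearest GB, read from a meminfo-style file.
# The optional argument (default /proc/meminfo) exists only to make testing easy.
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 / 1048576) + 0.5 }' "${1:-/proc/meminfo}"
}

# Succeed if the machine has at least $1 GB of RAM (rounded).
# Rounding matters: a nominal 16GB machine reports slightly less than 16 GiB.
has_min_ram() {
    total=$(mem_total_gb "${2:-/proc/meminfo}" 2>/dev/null)
    [ "${total:-0}" -ge "$1" ]
}

if has_min_ram 32; then
    echo "RAM: meets the recommended 32GB"
elif has_min_ram 16; then
    echo "RAM: meets the minimum 16GB"
else
    echo "RAM: below the 16GB minimum"
fi
```

The thresholds 16 and 32 come from the requirements listed above; everything else in the script is illustrative.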
We used the Ubuntu Dialogue Corpus V2. To make the results in the above paper easy to reproduce, the processed dataset is provided:
cd data
sh download.sh
Execute the following commands to start the training script. By default, training runs for 230k steps to reach the maximum mean reciprocal rank on the validation set.
cd bin
nohup sh ubuntu_train.sh &
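Because training runs in the background under nohup, it can help to keep a named log file and the process ID so the job can be checked later. The following is a minimal sketch of one way to do that; the helper names start_bg and is_running and the file names train.log and train.pid are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Start a command in the background with nohup, sending all output to LOGFILE
# and saving the process ID to PIDFILE.
# usage: start_bg LOGFILE PIDFILE COMMAND [ARGS...]
start_bg() {
    log=$1
    pidfile=$2
    shift 2
    nohup "$@" > "$log" 2>&1 &
    echo $! > "$pidfile"
}

# Succeed if the process recorded in PIDFILE is still alive.
# kill -0 sends no signal; it only checks that the process exists.
is_running() {
    kill -0 "$(cat "$1" 2>/dev/null)" 2>/dev/null
}

# Example (commented out so that sourcing this file does not start training):
#   start_bg train.log train.pid sh ubuntu_train.sh
#   tail -f train.log
#   is_running train.pid && echo "training is still running"
```

With the plain nohup command shown above, output goes to nohup.out in the current directory instead, and `tail -f nohup.out` follows it.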
If several runs exist in the 'runs' folder, the checkpoints of the latest run are used to evaluate the model performance.
cd bin
sh ubuntu_test.sh
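To see which run the evaluation is likely to pick up, the run directories can be listed by age. The following is a minimal sketch, assuming that "latest" means the most recently modified subdirectory of the runs folder; the test script may select it differently (for example, by a timestamp in the directory name), and the helper name latest_run is hypothetical.

```shell
#!/bin/sh
# Print the most recently modified subdirectory of a runs folder (default: runs).
# Prints nothing if the folder has no subdirectories.
# Note: parsing ls output is fragile for names containing newlines.
latest_run() {
    # -t sorts newest first; -d with the trailing-slash glob lists directories only.
    ls -td "${1:-runs}"/*/ 2>/dev/null | head -n 1
}

# Example (the location of the runs folder is an assumption):
#   latest_run runs
```

To evaluate an older run instead, one option under this assumption is to move the newer run directories out of the runs folder before running the test script.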