Ubuntu-Multiresolution-Tools

Dialogue corpus creation and evaluation scripts for the Ubuntu Dialogue Corpus.


Description

This repository hosts the files and scripts used for generative dialogue modeling experiments described in the paper:

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation. Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, Aaron Courville. 2016. http://arxiv.org/abs/1606.00776.

The repository includes scripts to recreate and modify the noun and activity-entity representations, and to evaluate dialogue response generation on the Ubuntu Dialogue Corpus.

See the README file inside each directory for further information.

The pre-processed Ubuntu Dialogue Corpus, noun representations, activity-entity representations and model responses are available at: http://www.iulianserban.com/Files/UbuntuDialogueCorpus.zip.

The original Ubuntu Dialogue Corpus as released by Lowe et al. (2015) can be found here: http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/

Unfortunately, due to Twitter's terms of service, we are not allowed to distribute Twitter content. Therefore, we can only make the tweet IDs available; these can be used with the Twitter API to build a similar dataset. The tweet IDs can be found here: http://www.iulianserban.com/Files/TweetIDs.zip. The tweet IDs together with model responses on the test set can be found here: www.iulianserban.com/Files/UbuntuDialogueCorpus.zip.
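
Rebuilding the Twitter data from IDs ("hydration") mostly comes down to reading the IDs and sending them to a lookup endpoint in batches. The sketch below only prepares the requests. It assumes the ID file holds one numeric ID per line (the actual layout of TweetIDs.zip is not documented here) and targets the Twitter API v2 tweet lookup endpoint (GET /2/tweets), which accepts up to 100 comma-separated IDs per request. The helper names `read_tweet_ids`, `batch_ids` and `lookup_urls` are hypothetical, and the API's endpoints and access terms have changed since 2016, so check the current documentation before relying on this.

```python
from typing import Iterable, Iterator, List

# Assumption: the v2 tweet lookup endpoint takes at most 100 IDs per request.
MAX_IDS_PER_REQUEST = 100
LOOKUP_URL = "https://api.twitter.com/2/tweets"


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Parse tweet IDs, assuming one numeric ID per line (hypothetical format)."""
    ids = []
    for line in lines:
        token = line.strip()
        if token.isdigit():  # skip blank lines and non-numeric content
            ids.append(token)  # keep IDs as strings, exactly as stored
    return ids


def batch_ids(ids: List[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def lookup_urls(ids: List[str]) -> List[str]:
    """Build one lookup URL per batch; authentication is not handled here."""
    return [f"{LOOKUP_URL}?ids={','.join(batch)}" for batch in batch_ids(ids)]


if __name__ == "__main__":
    ids = read_tweet_ids(["123\n", "\n", "456\n"])
    for url in lookup_urls(ids):
        print(url)
```

A real client would send each URL with an `Authorization: Bearer <token>` header and write the returned tweets to disk. Tweets that have since been deleted or made private will not be returned, so a rebuilt dataset is likely to be smaller than the original.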

Citation

If you use these tools in your work, we would really appreciate it if you could cite our paper.

Other References

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. Ryan Lowe, Nissan Pow, Iulian Serban, Joelle Pineau. 2015. SIGDIAL. http://arxiv.org/abs/1506.08909.