This is the repo for the paper "Grounding Conversations with Improvised Dialogues" (ACL2020). SPOLIN is a collection of more than 68,000 "Yes, and" type dialogue pairs extracted from the Spontaneanation podcast by Paul F. Tompkins, the Cornell Movie-Dialogs Corpus, and the SubTle corpus.
The core dataset that was used for the experiments in the paper only includes yesands and non-yesands from Spontaneanation and most of what is provided in those extracted from the Cornell Movie-Dialogs Corpus. After the submitting the paper, we continued our iterative data augmentation process, repeating another iteration with the Cornell Movie-Dialogs Corpus and extracting from the SubTle corpus. This expanded version is also included in this repository here. This latest version of SPOLIN was used to train the model used in our demo.
In the data
folder, we provide two versions of the SPOLIN training set:
- Version used for experiments in the ACL paper:
data/spolin-train-acl.json
- Expanded version:
data/spolin-train.json
We will release publicly available download links to our classifier and fine-tuned GPT-2 models soon.
Project page: https://justin-cho.com/spolin Demo: https://spolin.isi.edu Paper: Link coming soon
yesands | non-yesands | |
---|---|---|
Spontaneanation | 10,959 | 6,087* |
Cornell | 16,926 | 18,810 |
SubTle | 40,303 | 19,512 |
Total | 68,188 | 44,409 |
*Artificially collected by mix & matching positive Spontaneanation samples to balance dataset for training classifier
data/spolin-train.json | data/spolin-valid.json | |||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
If you use data or code in this repository, please cite our ACL2020 paper:
@inproceedings{cho2020spolin,
title={Grounding Conversations with Improvised Dialogues},
author={Cho, Hyundong and May, Jonathan},
booktitle ={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.