This contains the training data for "daily dialog" and "self dialog". We only care about the self dialog data.
git submodule add https://github.com/jfainberg/self_dialogue_corpus
Reads data/selfdialog/dialogues/*.txt and exports:
- sd-gpt2-input.csv: all training data without explicit topics
- sd-gpt-input-with-topic.csv: all training data with explicit topics
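
The export step above could be sketched roughly as follows. This is an illustration, not the script this repository actually uses. It assumes each .txt file holds one utterance per line with blank lines between dialogues, and that the topic can be taken from the file name. The function names, the `text` header, and the `topic:` prefix are all hypothetical.

```python
import csv
from pathlib import Path


def read_dialogues(path):
    """Yield each dialogue in a file as a list of utterances.

    Assumption: one utterance per line, blank lines separate dialogues.
    """
    turns = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                turns.append(line)
            elif turns:
                # A blank line closes the current dialogue.
                yield turns
                turns = []
    if turns:
        yield turns


def export_dialogues(src_dir, out_csv, with_topic=False):
    """Write one dialogue per row into a single-column CSV; return the row count."""
    rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text"])  # hypothetical header name
        for txt in sorted(Path(src_dir).glob("*.txt")):
            # Assumption: the file name encodes the topic, e.g. star_wars.txt
            topic = txt.stem.replace("_", " ")
            for turns in read_dialogues(txt):
                text = "\n".join(turns)
                if with_topic:
                    text = f"topic: {topic}\n{text}"
                writer.writerow([text])
                rows += 1
    return rows
```

Calling `export_dialogues("data/selfdialog/dialogues", "sd-gpt2-input.csv")` would produce the file without topics, and passing `with_topic=True` with the other output name would produce the variant with topics.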
You can use these files to retrain the GPT-2 model with gpt-2-simple: https://github.com/minimaxir/gpt-2-simple
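
A minimal retraining sketch with gpt-2-simple might look like the following. It assumes the package is installed (`pip install gpt-2-simple`) together with a compatible TensorFlow. The model size, step count, and run name are placeholder choices rather than settings taken from this project, and `finetune_on_csv` is a hypothetical helper name.

```python
import os


def finetune_on_csv(csv_path, model_name="124M", steps=1000, run_name="selfdialog"):
    """Fine-tune GPT-2 on a single-column CSV and return the TensorFlow session."""
    # Imported lazily because gpt-2-simple pulls in TensorFlow.
    import gpt_2_simple as gpt2

    # Download the pretrained weights into ./models only if they are missing.
    if not os.path.isdir(os.path.join("models", model_name)):
        gpt2.download_gpt2(model_name=model_name)

    sess = gpt2.start_tf_sess()
    gpt2.finetune(
        sess,
        dataset=csv_path,
        model_name=model_name,
        steps=steps,        # placeholder value, tune for your data size
        run_name=run_name,  # checkpoints are written under checkpoint/<run_name>
    )
    return sess


# Example call (not executed here, since it downloads the model and trains):
# sess = finetune_on_csv("sd-gpt2-input.csv")
```

After training, `gpt2.generate(sess, run_name="selfdialog")` should sample from the fine-tuned checkpoint.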