Other sources?
dcsan opened this issue · 0 comments
I'm also looking for a decent training set of casual conversations, specifically for a language-learning chatbot.
But this project seems to have only about 200k logs. It's a start, but...
What other sources do you know of? I'm sharing some info below and hope others can also suggest where to look.
- Cornell's ConvoKit provides an API onto some really good sets, like the famous movie dialogue corpus, and also a structured API for some subreddits.
  https://convokit.cornell.edu/
- Facebook's ParlAI has a standardized API to lots of datasets.
  https://parl.ai/about/
  e.g. https://arxiv.org/pdf/1801.07243.pdf
- Tatoeba has a good sentence database but no conversation turns.
  https://tatoeba.org/eng/
I'm keeping archives of a few things I find. Here are a bunch of logs from teaching English conversation:
https://github.com/dcsan/corpus/blob/master/convo/esl-china/esl06.csv
Some of these could be converted for use here.
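As a rough sketch of what converting logs like these might involve, the snippet below assumes each CSV row holds one utterance with a speaker column and a text column. The column names `speaker` and `text` are guesses, not the actual layout of esl06.csv. It merges a speaker's consecutive lines into one turn, then pairs each turn with the next one as a prompt/response example. The helpers `rows_to_pairs` and `csv_to_pairs` are hypothetical, and the format this project actually expects may differ.

```python
import csv
import io
from typing import Iterable, List, Tuple


def rows_to_pairs(rows: Iterable[dict], speaker_col: str = "speaker",
                  text_col: str = "text") -> List[Tuple[str, str]]:
    """Turn dialogue rows into (prompt, response) pairs.

    Assumption: each row is one utterance with a speaker and a text field.
    Consecutive rows from the same speaker are merged into a single turn.
    """
    turns: List[Tuple[str, str]] = []
    for row in rows:
        speaker = (row.get(speaker_col) or "").strip()
        text = (row.get(text_col) or "").strip()
        if not text:
            continue  # skip blank utterances
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous row: append to that turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    # Pair each turn with the turn that follows it.
    return [(turns[i][1], turns[i + 1][1]) for i in range(len(turns) - 1)]


def csv_to_pairs(csv_text: str, **kwargs) -> List[Tuple[str, str]]:
    """Parse CSV text that has a header row, then build prompt/response pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return rows_to_pairs(reader, **kwargs)
```

If a file uses different headers, pass them in, e.g. `csv_to_pairs(text, speaker_col="who", text_col="line")`. Pairing adjacent turns is just one common way to get chatbot training examples; keeping whole dialogues as multi-turn contexts is another option.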
What other sources have people found for conversations?