What datasets do you use for the model?

Question

Opened this issue 9 years ago · 0 comments

I trained the model on the OpenSubtitles2013 dataset, and I found that the bot's answers are not logical at all.
For example, when I said: Hello, what is your name?
It said:ActuallybuhcoulisdispensersdodgersarcsvaultergaiAmaravatiAmaravatiAmaravatiAmaravatiAmaravatigaiAmaravatiZammahZammahZammahZammahAmaravatiZammahZammahZammahZammahZammahZammahZammahZammahZammahZammahZammahZammahAmaravatiAmaravatiAmaravatiAmaravatiAmaravatiAmaravatiZammahZammahZammahAmaravatiZammahAmaravatiZammahAmaravatiZammahAmaravatiZammahAmaravati
: (
So I would like to know which datasets you used. If it is convenient, please tell me.
My vocab_size is 10K, and the input data has 80M sentences.