What datasets do you use for the model?
yrd241 commented
I used the OpenSubtitles2013 dataset to train the model, and I found that the answers the bot gives are not logical at all.
For example,
when I said: Hello, what is your name?
It said:ActuallybuhcoulisdispensersdodgersarcsvaultergaiAmaravatiAmaravatiAmaravatiAmaravatiAmaravatigaiAmaravatiZammahZammahZammahZammahAmaravatiZammahZammahZammahZammahZammahZammahZammahZammahZammahZammahZammahZammahAmaravatiAmaravatiAmaravatiAmaravatiAmaravatiAmaravatiZammahZammahZammahAmaravatiZammahAmaravatiZammahAmaravatiZammahAmaravatiZammahAmaravati
: (
So, I would like to know which datasets you used. If it is convenient, please tell me.
My vocab_size is 10K, and the input_data has 80M sentences.
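In case it helps with diagnosing this, here is a minimal sketch of how the coverage of a fixed-size vocabulary could be measured on a corpus. It is not taken from this repository: `build_vocab`, `coverage`, and the tiny `sample` list are hypothetical names used only for illustration, and it assumes plain whitespace tokenization, which may differ from the project's actual preprocessing.

```python
from collections import Counter


def build_vocab(sentences, vocab_size):
    """Keep the vocab_size most frequent whitespace-separated tokens.

    Hypothetical helper: assumes lowercasing and whitespace tokenization.
    """
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {tok for tok, _ in counts.most_common(vocab_size)}
    return vocab, counts


def coverage(counts, vocab):
    """Fraction of token occurrences that fall inside the vocabulary."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(c for tok, c in counts.items() if tok in vocab)
    return covered / total


# Toy stand-in for the real corpus (an assumption, not real data).
sample = ["hello what is your name", "my name is bob", "what is this"]
vocab, counts = build_vocab(sample, vocab_size=3)
print(f"coverage: {coverage(counts, vocab):.2%}")
```

On a real run, `sample` would be replaced by an iterator over the training sentences and `vocab_size` set to 10000. A low coverage value would mean many tokens are mapped to the unknown symbol, which could be one possible source of poor replies.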