Significant overfitting with default hyperparameters on cornell-movie-dialogs config
lk251 opened this issue · 2 comments
Running:
python main.py --config cornell-movie-dialogs --mode train
to completion (100,000 steps) results in a training loss of about 2.6 but a test loss of about 8.4.
Which hyperparameters did you use? The resulting chatbot doesn't work very well (the one in your README is a lot better).
Thank you!
Seq2seq models have a known gap between training and inference: during training the decoder is fed ground-truth tokens (teacher forcing), but at inference it must condition on its own predictions, so early mistakes compound. This is often called exposure bias.
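One common way to shrink that gap is scheduled sampling (Bengio et al., 2015): during training, sometimes feed the decoder its own previous prediction instead of the ground-truth token. Here is a minimal PyTorch sketch, assuming a hypothetical `decoder(input_emb, hidden) -> (logits, hidden)` step function and an `embed` lookup; neither name comes from this repo:

```python
import random

import torch

def decode_with_scheduled_sampling(decoder, embed, hidden, target,
                                   teacher_forcing_ratio=0.5):
    """At each step, feed the ground-truth token with probability
    `teacher_forcing_ratio`; otherwise feed the model's own argmax
    prediction, which narrows the train/inference mismatch.

    `decoder` and `embed` are hypothetical stand-ins for this repo's modules.
    """
    batch_size, max_len = target.size()
    input_tok = target[:, 0]  # <sos> tokens
    logits_per_step = []
    for t in range(1, max_len):
        logits, hidden = decoder(embed(input_tok), hidden)
        logits_per_step.append(logits)
        use_teacher = random.random() < teacher_forcing_ratio
        # Ground truth vs. the model's own previous prediction
        input_tok = target[:, t] if use_teacher else logits.argmax(dim=-1)
    return torch.stack(logits_per_step, dim=1), hidden
```

The usual schedule anneals `teacher_forcing_ratio` from 1.0 toward 0 over the course of training.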
You can find practical tips for training sequence-to-sequence models with attention in this blog post.
I don't remember the model's details because it has been a while since I worked on it.
My guess is that the result in the README is even more overfit than yours (it was trained for 200,000 steps).
Also, cornell-movie-dialogs is too small a corpus for training a conversation model; with that little data, some overfitting is inevitable.
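Given that, a cheap mitigation is to stop on validation loss instead of training to a fixed 100,000 or 200,000 steps. A hedged sketch of early stopping; the `train_one_epoch` and `evaluate` callables are hypothetical placeholders, not functions from this repo:

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate,
                              max_epochs=50, patience=3):
    """Train until validation loss stops improving for `patience` epochs,
    then restore the best checkpoint. `evaluate` should return the loss
    on held-out data (hypothetical helper, not part of this repo)."""
    best_val, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = evaluate(model)
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # validation loss has stalled
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # roll back to the best epoch
    return model, best_val
```

With a train loss of 2.6 against a test loss of 8.4, stopping at the validation minimum would likely have halted training well before 100,000 steps.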
Thanks for the advice!