malllabiisc/WordGCN

About the stopping criteria

Punchwes opened this issue · 1 comment

Hi @svjan5 ,

I am curious about the stopping criterion for training that you used in the paper. Is it the same as in the code, i.e. based on the average score over the various word similarity/analogy/categorisation tasks? I ask because I found that using the average score to save the best model introduces a lot of stochasticity: although the final average scores are similar across multiple runs, the scores on individual tasks can differ substantially. Also, do you think it is sound to use those intrinsic tasks for model selection during training, given that you then evaluate the model on the same tasks when comparing against other models?

Best,
Qiwei

Hi Qiwei,
If you go through our README, you can find the validation and test splits of the datasets for the different intrinsic tasks that we use for selecting our best model (link). We train the model until the validation performance stops improving with further training.
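
For concreteness, here is a minimal sketch of patience-based early stopping of the kind described above. The `validation_score` helper, the `patience` value, the checkpoint path, and the `model`/`task` interfaces are all illustrative assumptions for this sketch, not the repository's actual code:

```python
# Sketch: early stopping on an averaged validation score across intrinsic
# tasks. Function names, patience value, and checkpoint path are
# illustrative assumptions, not WordGCN's actual implementation.

def validation_score(model, val_tasks):
    """Average the model's score over the validation splits of the
    intrinsic tasks (word similarity / analogy / categorisation)."""
    return sum(task.evaluate(model) for task in val_tasks) / len(val_tasks)

def train_with_early_stopping(model, train_step, val_tasks,
                              max_epochs=100, patience=5):
    best_score, epochs_without_improvement = float("-inf"), 0
    for epoch in range(max_epochs):
        train_step(model)                      # one epoch of training
        score = validation_score(model, val_tasks)
        if score > best_score:                 # validation still improving
            best_score, epochs_without_improvement = score, 0
            model.save("best_model.ckpt")      # keep the best checkpoint
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                          # stop: no further improvement
    return best_score
```

Note that this selects the checkpoint by the *validation* splits only; the test splits of the same tasks remain held out for the final comparison.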

Let me know if you need any other clarification.