google-research/electra

Question about expected results

richarddwang opened this issue · 1 comment

Hi @clarkkev,

  1. How long did you train ELECTRA-Small OWT?
    In the expected results section of README.md, you mention: "OWT is the OpenWebText-trained model from above (it performs a bit worse than ELECTRA-Small due to being trained for less time and on a smaller dataset)". How many steps did you train for? And AFAIK OpenWebText should be larger than Wikibooks; does that mean you used only part of the data?

  2. Where do the scores in the expected results come from?
    You also mention: "The below scores show median performance over a large number of random seeds." Does that mean the listed scores come from models pretrained from scratch with different random seeds, each fine-tuned for 10 runs with random seeds, or from a single pretrained model fine-tuned for 10 runs with different random seeds?

  3. Did you use double_unordered when training the models for the expected results?

Below is Kevin's original reply to my email.

  1. It was trained for 1 million steps. I'm actually not sure how many epochs over the dataset that works out to, but I believe the (public) OWT dataset is only about 50% bigger than Wikibooks. [See the back-of-envelope epoch estimate below.]

  2. They are from the same pre-trained checkpoint with different random seeds for fine-tuning. The number of runs was at least 10, but much more (I think 100) for some tasks; I left the eval jobs running for a while and took the median of all the results. [See the median-over-seeds sketch below.]

  3. Yes. [See the illustrative double_unordered sketch below.]
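
To make answer 1 concrete, here is a back-of-envelope epoch estimate. It assumes ELECTRA-Small's pretraining settings from the paper (batch size 128, sequence length 128) and roughly 8B tokens for public OpenWebText; the corpus size in particular is a guess for illustration, not a figure from this thread.

```python
# Back-of-envelope epoch estimate for ELECTRA-Small on OpenWebText.
# ASSUMPTIONS: batch size 128 and sequence length 128 (ELECTRA-Small's
# pretraining settings per the paper); ~8B tokens in public OpenWebText
# (an illustrative guess, not a figure from this thread).
train_steps = 1_000_000   # confirmed above: 1M pretraining steps
batch_size = 128          # assumed ELECTRA-Small default
seq_length = 128          # assumed ELECTRA-Small default
corpus_tokens = 8e9       # ASSUMED public OWT size in tokens

tokens_seen = train_steps * batch_size * seq_length
print(f"tokens seen: {tokens_seen:.3g}")                    # ~1.64e10
print(f"approx epochs: {tokens_seen / corpus_tokens:.1f}")  # ~2.0 under these assumptions
```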
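Answer 2 describes a simple procedure: fine-tune the same pretrained checkpoint many times with different seeds and report the median. Below is a minimal sketch of that aggregation; `finetune_and_eval` is a hypothetical stand-in (here it just simulates run-to-run variance), not a function from this repo.

```python
import random
import statistics

def finetune_and_eval(seed: int) -> float:
    """HYPOTHETICAL stand-in: in the real setup this would fine-tune the
    same pretrained checkpoint with this seed and return the dev score.
    Here it just simulates run-to-run variance."""
    rng = random.Random(seed)
    return 80.0 + rng.gauss(0.0, 0.5)

# Same checkpoint, many seeds; report the median (10+ runs per task, per answer 2).
scores = [finetune_and_eval(seed) for seed in range(10)]
print(f"median over {len(scores)} runs: {statistics.median(scores):.2f}")
```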
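For answer 3: as I understand it, double_unordered doubles the pretraining data by writing each two-segment example out in both segment orders. The sketch below is a conceptual illustration of that idea only; `write_pairs` is a hypothetical name, and this is not the repo's actual implementation.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[List[int], List[int]]

def write_pairs(segment_pairs: Iterable[Pair], double_unordered: bool) -> List[Pair]:
    """Conceptual illustration (NOT the repo's code): emit each
    (segment_a, segment_b) example, and with double_unordered also emit
    the reversed (segment_b, segment_a) example, doubling the data."""
    out: List[Pair] = []
    for seg_a, seg_b in segment_pairs:
        out.append((seg_a, seg_b))
        if double_unordered:
            out.append((seg_b, seg_a))
    return out

print(write_pairs([([1, 2, 3], [4, 5])], double_unordered=True))
# -> [([1, 2, 3], [4, 5]), ([4, 5], [1, 2, 3])]
```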