Multiple training runs of the same model with different random seeds for weight initialisation
KarolisRam opened this issue · 1 comment
KarolisRam commented
Model internals can vary substantially when a model is retrained with the same parameters and procedures, changing only the random seed used for weight initialisation. This is due to underspecification, as shown (including for NLP) in https://arxiv.org/abs/2011.03395. Should at least one of the Pythia models have weights released for maybe 5-10 training runs that are identical except for the random seed?
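A minimal sketch of what such runs could look like, assuming a small GPT-NeoX-style config in Hugging Face `transformers` (the size and seed values here are placeholders, not the actual Pythia training setup):

```python
# Sketch: initialise the same architecture several times, changing only the
# random seed used for weight initialisation, so the runs differ only in
# their starting weights.
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

config = GPTNeoXConfig(
    hidden_size=512,
    num_hidden_layers=6,
    num_attention_heads=8,
    intermediate_size=2048,
    vocab_size=50304,
)

seeds = [0, 1, 2, 3, 4]  # e.g. the 5-10 runs suggested above
models = {}
for seed in seeds:
    torch.manual_seed(seed)               # only the init seed changes
    models[seed] = GPTNeoXForCausalLM(config)
    # ...then train each copy with identical data order, hyperparameters,
    # and hardware settings, and release each checkpoint separately.
```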
This could show how much variance there already is on some OOD tasks between these nearly identical Pythia models, compared to the variance between different models. The paper above shows that for BERT, the variance across random seeds of the same model can be as large as the variance across different models.
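As a rough illustration of that comparison (hypothetical checkpoint names like `pythia-160m-seed{N}` and a placeholder `evaluate()` function, not an existing API):

```python
# Sketch: compare seed-to-seed variance on an OOD task with the variance
# across unrelated models. evaluate() is a stub to be filled in with the
# actual eval harness.
import statistics

def evaluate(checkpoint: str, task: str) -> float:
    """Placeholder: return the OOD metric for `checkpoint` on `task`."""
    raise NotImplementedError

ood_task = "some-ood-task"
seed_runs = [f"pythia-160m-seed{i}" for i in range(5)]      # hypothetical names
other_models = ["pythia-160m", "gpt2", "opt-125m"]          # illustrative only

within_model = statistics.pstdev(evaluate(c, ood_task) for c in seed_runs)
across_models = statistics.pstdev(evaluate(c, ood_task) for c in other_models)
print(f"seed-to-seed std: {within_model:.3f}, model-to-model std: {across_models:.3f}")
```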
KarolisRam commented