Multiple training runs of the same model with different random seeds for weight initialisation
KarolisRam opened this issue · 1 comment
KarolisRam commented
Model internals can vary substantially when a model is retrained with the same parameters and procedures, changing only the random seed used for weight initialisation. This is due to underspecification, as shown (including for NLP) in https://arxiv.org/abs/2011.03395. Should at least one of the Pythia models have weights released for maybe 5-10 training runs that are identical except for the random seed?
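A minimal sketch of what such runs could look like, assuming a small GPT-NeoX-style config in Hugging Face `transformers` (the size and seed values here are placeholders, not the actual Pythia training setup):

```python
# Sketch: initialise the same architecture several times, changing only the
# random seed used for weight initialisation, so the runs differ only in
# their starting weights.
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

config = GPTNeoXConfig(
    hidden_size=512,
    num_hidden_layers=6,
    num_attention_heads=8,
    intermediate_size=2048,
    vocab_size=50304,
)

seeds = [0, 1, 2, 3, 4]  # e.g. the 5-10 runs suggested above
models = {}
for seed in seeds:
    torch.manual_seed(seed)               # only the init seed changes
    models[seed] = GPTNeoXForCausalLM(config)
    # ...then train each copy with identical data order, hyperparameters,
    # and hardware settings, and release each checkpoint separately.
```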
This could show how much variance there already is on some OOD tasks between these nearly identical Pythia models, compared to the variance between different models. The paper above shows that for BERT, the variance across random seeds of the same model can be as large as the variance across different models.
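As a rough illustration of that comparison (hypothetical checkpoint names like `pythia-160m-seed{N}` and a placeholder `evaluate()` function, not an existing API):

```python
# Sketch: compare seed-to-seed variance on an OOD task with the variance
# across unrelated models. evaluate() is a stub to be filled in with the
# actual eval harness.
import statistics

def evaluate(checkpoint: str, task: str) -> float:
    """Placeholder: return the OOD metric for `checkpoint` on `task`."""
    raise NotImplementedError

ood_task = "some-ood-task"
seed_runs = [f"pythia-160m-seed{i}" for i in range(5)]      # hypothetical names
other_models = ["pythia-160m", "gpt2", "opt-125m"]          # illustrative only

within_model = statistics.pstdev(evaluate(c, ood_task) for c in seed_runs)
across_models = statistics.pstdev(evaluate(c, ood_task) for c in other_models)
print(f"seed-to-seed std: {within_model:.3f}, model-to-model std: {across_models:.3f}")
```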
KarolisRam commented