arxyzan/data2vec-pytorch

EMA teacher model should not be deepcopied

sudhakaranjain opened this issue · 2 comments

According to the paper, the EMA teacher model is initialized randomly, with the same architecture as the student model. Creating the teacher by deepcopying the student should therefore be avoided, because a deepcopy copies the student's weights along with the architecture.
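To illustrate the difference, here is a minimal, framework-free sketch. `TinyEncoder` and `ema_update` are hypothetical stand-ins, not classes or functions from this repo, and the plain list of floats only mimics model parameters. In PyTorch, the analogous change would presumably be to construct a second model instance from the same config rather than calling `copy.deepcopy` on the student.

```python
import copy
import random


class TinyEncoder:
    """Hypothetical stand-in for the student/teacher network."""

    def __init__(self, size, rng):
        # Random initialization: each new instance draws its own weights.
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(size)]


def ema_update(teacher, student, decay):
    """Move teacher weights toward the student: t = decay * t + (1 - decay) * s."""
    teacher.weights = [
        decay * t + (1.0 - decay) * s
        for t, s in zip(teacher.weights, student.weights)
    ]


rng = random.Random(0)
student = TinyEncoder(4, rng)

# Deepcopy: same architecture AND the same weight values as the student.
copied_teacher = copy.deepcopy(student)

# Fresh construction: same architecture, independently drawn weights.
fresh_teacher = TinyEncoder(4, rng)

copied_matches = copied_teacher.weights == student.weights
fresh_matches = fresh_teacher.weights == student.weights
```

With this setup, `copied_matches` is true and `fresh_matches` is false: only the freshly constructed teacher starts from weights that differ from the student's. Either way, the teacher is afterwards updated only through `ema_update`, never by gradient descent.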

Sorry for the confusion. You are right!