EMA teacher model should not be deepcopied

Question


sudhakaranjain opened this issue 2 years ago · 2 comments

According to the paper, the EMA teacher model is initialized randomly, with the same architecture as the student model. Deepcopying the student model to create the teacher should therefore be avoided, because it copies the weight parameters as well.

Answer 1 · 2022-10-12T17:13:49.000Z

Hi @sudhakaranjain, I'm not sure I understand your issue, but according to the official implementation the EMA teacher must be a deepcopy of the student:
https://github.com/facebookresearch/fairseq/blob/16538a0bff1b9f32e89aa915f2e8b57193f33109/examples/data2vec/models/data2vec_text.py#L346
https://github.com/facebookresearch/fairseq/blob/16538a0bff1b9f32e89aa915f2e8b57193f33109/fairseq/modules/ema_module.py#L41
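
To illustrate the idea, here is a minimal framework-free sketch, not the fairseq code itself. `TinyModel`, `make_teacher` and `ema_update` are hypothetical names, and a dict of floats stands in for real parameter tensors. It assumes the usual EMA rule, teacher = decay * teacher + (1 - decay) * student.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a network: a dict of named float weights."""

    def __init__(self, weights):
        self.weights = dict(weights)


def make_teacher(student):
    # The teacher starts as an exact but independent copy of the student,
    # so it has the same architecture and the same initial weights.
    return copy.deepcopy(student)


def ema_update(teacher, student, decay):
    # Assumed EMA rule: teacher <- decay * teacher + (1 - decay) * student.
    for name, s in student.weights.items():
        t = teacher.weights[name]
        teacher.weights[name] = decay * t + (1.0 - decay) * s


student = TinyModel({"w": 1.0, "b": 0.0})
teacher = make_teacher(student)

# Simulate one optimizer step that changes only the student.
student.weights["w"] = 2.0
before = teacher.weights["w"]  # unchanged, because the copy is independent

ema_update(teacher, student, decay=0.5)
after = teacher.weights["w"]  # 0.5 * 1.0 + 0.5 * 2.0
```

Because the teacher is a deep copy, updating the student does not touch it until `ema_update` is called, and its running average only ever contains student weights.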

Answer 2 · 2022-10-12T17:33:51.000Z

Sorry for the confusion. You are right!