How to train DINO with MSE loss?
Backdrop9019 opened this issue · 0 comments
Backdrop9019 commented
Could you please share the specific settings you used when training DINO with MSE and how the results compare to those of BYOL?
In BYOL, the MSE is computed on the L2-normalized outputs. Was the same normalization applied when training DINO with MSE?
In my experience, training with MSE on outputs that are not L2-normalized leads to collapse.
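
To make the distinction concrete, here is a minimal sketch of the two variants I am comparing, in dependency-free Python. The function names (`l2_normalize`, `squared_error_normalized`, `squared_error_raw`) are my own and are not taken from the DINO or BYOL code. The normalized variant follows the BYOL formulation, where the squared error between unit vectors equals 2 - 2·cos. The scaling demo at the end is only my guess at one reason the raw variant may collapse; I have not verified that it is the actual cause.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def squared_error_normalized(p, z):
    """BYOL-style loss: squared error between L2-normalized vectors."""
    p_n, z_n = l2_normalize(p), l2_normalize(z)
    return sum((a - b) ** 2 for a, b in zip(p_n, z_n))


def squared_error_raw(p, z):
    """Squared error on the raw (unnormalized) outputs."""
    return sum((a - b) ** 2 for a, b in zip(p, z))


def cosine(p, z):
    """Cosine similarity, used only to check the 2 - 2*cos identity."""
    dot = sum(a * b for a, b in zip(p, z))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in z)))


# Hypothetical student and teacher outputs for two views of one image.
p = [3.0, 4.0]
z = [4.0, 3.0]

# Assumed failure mode: shrinking both outputs by the same factor lowers the
# raw loss without making the two views agree any better.
scale = 0.1
p_small = [scale * x for x in p]
z_small = [scale * x for x in z]

print("raw:       ", squared_error_raw(p, z), "->", squared_error_raw(p_small, z_small))
print("normalized:", squared_error_normalized(p, z), "->", squared_error_normalized(p_small, z_small))
```

In this toy case the raw loss drops just from rescaling, while the normalized loss stays the same because it depends only on the angle between the two outputs. That is why I am asking whether DINO's outputs were normalized before the MSE.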