facebookresearch/swav

About the loss function


In the paper, training uses the swapped-prediction loss L(z_t, z_s) = ℓ(z_t, q_s) + ℓ(z_s, q_t). If we train with only L = ℓ(z_t, q_t), that is, with one of the two augmented branches removed, do you think the model can still train normally?
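To make the two objectives concrete, here is a minimal NumPy sketch that computes both loss values on random data. It assumes the setup described in the SwAV paper: L2-normalized features z, a set of prototype vectors, codes q obtained from a few Sinkhorn-Knopp iterations on the feature-prototype scores, and ℓ(z, q) = -Σ_k q_k log p_k with p = softmax(z·c_k / τ). The function names (`sinkhorn`, `ell`, `swapped_loss`, `single_view_loss`) and the values of `eps`, `n_iters`, and `temperature` are hypothetical illustrative choices, not the facebookresearch/swav implementation. It only evaluates the losses and says nothing about training dynamics.

```python
import numpy as np


def l2_normalize(x):
    # Project each row onto the unit sphere.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sinkhorn(scores, eps=0.05, n_iters=3):
    # Soft, roughly equipartitioned codes from B x K feature-prototype scores.
    # Subtracting the global max is a constant factor that cancels after normalization.
    q = np.exp((scores - scores.max()) / eps).T  # K x B
    q /= q.sum()
    k, b = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)  # each prototype row gets 1/K of the mass
        q /= k
        q /= q.sum(axis=0, keepdims=True)  # each sample column gets 1/B of the mass
        q /= b
    return (q * b).T  # B x K; each row is a distribution over prototypes


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def ell(z, q, prototypes, temperature=0.1):
    # Cross-entropy between target codes q and the softmax prediction from features z.
    log_p = log_softmax(z @ prototypes.T / temperature, axis=1)
    return -(q * log_p).sum(axis=1).mean()


def swapped_loss(z_t, z_s, prototypes):
    # L(z_t, z_s) = l(z_t, q_s) + l(z_s, q_t). In SwAV the codes are treated as
    # fixed targets (no gradient); NumPy has no autograd, so that is implicit here.
    q_t = sinkhorn(z_t @ prototypes.T)
    q_s = sinkhorn(z_s @ prototypes.T)
    return ell(z_t, q_s, prototypes) + ell(z_s, q_t, prototypes)


def single_view_loss(z_t, prototypes):
    # L = l(z_t, q_t): the target code comes from the same view it is compared to.
    q_t = sinkhorn(z_t @ prototypes.T)
    return ell(z_t, q_t, prototypes)


rng = np.random.default_rng(0)
B, D, K = 8, 16, 4  # batch size, feature dim, number of prototypes (arbitrary)
prototypes = l2_normalize(rng.normal(size=(K, D)))
z_t = l2_normalize(rng.normal(size=(B, D)))
z_s = l2_normalize(z_t + 0.1 * rng.normal(size=(B, D)))  # stand-in for a second augmented view

print("swapped:", swapped_loss(z_t, z_s, prototypes))
print("single view:", single_view_loss(z_t, prototypes))
```

If the two views are identical (z_s = z_t), the swapped loss reduces to exactly twice the single-view loss, so the only difference between the objectives is whether the target code comes from a different augmentation of the same image.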

GSusan commented