About the loss function
Opened this issue · 1 comment
AndrewTalYeah commented
The paper trains with L(zt, zs) = l(zt, qs) + l(zs, qt). If I instead train with only L = l(zt, qt), that is, with one of the two data-augmentation branches removed, do you think training would still work normally?
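
To make the two variants concrete, here is a rough sketch of what I mean. It assumes l(z, q) is a cross-entropy between a code q and a softmax over the scores of z against a set of prototype vectors, which may not match the paper's exact definition. The names `code_prediction_loss`, `swapped_loss`, `single_branch_loss`, `prototypes`, and `temperature` are made up for illustration, and the codes q are simply passed in as given.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def code_prediction_loss(z, q, prototypes, temperature=0.1):
    """l(z, q): cross-entropy between code q and softmax(z . c_k / temperature).

    Assumption: this form of l is illustrative, not taken from the repo.
    """
    logits = [sum(a * b for a, b in zip(z, c)) / temperature for c in prototypes]
    return -sum(qk * lp for qk, lp in zip(q, log_softmax(logits)))


def swapped_loss(z_t, z_s, q_t, q_s, prototypes, temperature=0.1):
    """L(zt, zs) = l(zt, qs) + l(zs, qt): each view predicts the other view's code."""
    return (code_prediction_loss(z_t, q_s, prototypes, temperature)
            + code_prediction_loss(z_s, q_t, prototypes, temperature))


def single_branch_loss(z_t, q_t, prototypes, temperature=0.1):
    """L = l(zt, qt): one view predicts its own code, no second augmentation."""
    return code_prediction_loss(z_t, q_t, prototypes, temperature)
```

In the first form each feature is scored against the code of the other view; in the second, zt is scored against a code computed from the same view.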
GSusan commented
Hello! I have received your email and will reply as soon as possible.