lucidrains/byol-pytorch

Differences between BYOL and SimSiam

SilverUnicorn opened this issue · 2 comments

Thanks for your implementation of BYOL and SimSiam.
However, after reading the two papers, especially their implementation details, I found some other differences between the two methods: the MLP structure (BYOL uses the same structure for the projector and the predictor, SimSiam does not), weight decay (applied to different parts of the model), and the optimizer (LARS in BYOL, SGD in SimSiam).

@SilverUnicorn hmm, bar the optimizer (I'm not convinced it makes a huge difference), what are the differences between the two MLP structures that you see? Could you expand? Perhaps you missed the logic here? https://github.com/lucidrains/byol-pytorch/blob/master/byol_pytorch/byol_pytorch.py#L237

Yeah, I'm pretty sure there are no huge differences between the two structures. I will expand on that if possible.
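
For reference, here is a rough sketch of the MLP and weight decay differences mentioned above. It is not this repo's code. The function names (`byol_mlp`, `simsiam_projector`, `simsiam_predictor`, `split_weight_decay`) are made up for illustration, and the layer sizes are my reading of the two papers (BYOL: hidden 4096, output 256, same layout for projector and predictor; SimSiam: 3-layer projector at 2048 with BN on every layer, 2-layer bottleneck predictor with hidden 512 and no BN or ReLU on the output). Treat them as assumptions and check the papers before relying on them.

```python
import torch
from torch import nn


def byol_mlp(dim_in, dim_out=256, hidden=4096):
    # BYOL (as I read the paper): projector and predictor share this layout.
    return nn.Sequential(
        nn.Linear(dim_in, hidden),
        nn.BatchNorm1d(hidden),
        nn.ReLU(inplace=True),
        nn.Linear(hidden, dim_out),
    )


def simsiam_projector(dim_in, dim=2048):
    # SimSiam projector: 3 layers, BN after every layer, no ReLU on the output.
    return nn.Sequential(
        nn.Linear(dim_in, dim), nn.BatchNorm1d(dim), nn.ReLU(inplace=True),
        nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.ReLU(inplace=True),
        nn.Linear(dim, dim), nn.BatchNorm1d(dim),
    )


def simsiam_predictor(dim=2048, hidden=512):
    # SimSiam predictor: 2-layer bottleneck, output layer has no BN or ReLU.
    return nn.Sequential(
        nn.Linear(dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
        nn.Linear(hidden, dim),
    )


def split_weight_decay(model):
    # BYOL-style grouping: biases and BN parameters (all 1-D tensors)
    # are excluded from weight decay; everything else keeps the default.
    decay, no_decay = [], []
    for p in model.parameters():
        if p.requires_grad:
            (no_decay if p.ndim <= 1 else decay).append(p)
    return [{"params": decay}, {"params": no_decay, "weight_decay": 0.0}]


# Usage: plain SGD stands in for LARS here, since LARS is not in core PyTorch.
net = byol_mlp(2048)
opt = torch.optim.SGD(split_weight_decay(net), lr=0.2, momentum=0.9,
                      weight_decay=1.5e-6)
```

With SimSiam you would pass `model.parameters()` straight to `torch.optim.SGD` with a single weight decay value (the paper reports 1e-4), which is the "applied to different parts" point from the first post as far as I understand it.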