HobbitLong/PyContrast

Different LR between MoCo, CMC and SimCLR in linear evaluation stage

haohang96 opened this issue · 3 comments

Hi,

I was wondering why the magnitude of the learning rate in the linear evaluation stage varies so greatly. I think SimCLR is a similar method to MoCo and CMC, which all use a contrastive loss to train the unsupervised network. Yet the lr in the linear evaluation stage of MoCo and CMC is 30, while for SimCLR it is 0.1.

Referring to your previous answer in the CMC project, adding a non-parametric BN before the FC layer in the linear evaluation stage can bring the lr from 30 down to a normal magnitude. Does SimCLR use a similar strategy? I checked the official SimCLR code, but I could not find such an operation. A sketch of what I understand that trick to look like is below.
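For reference, here is a minimal PyTorch sketch of my reading of that CMC answer (not code from this repo; `feat_dim` and `num_classes` are placeholders): a parameter-free BN (`affine=False`) placed in front of the linear classifier, which normalizes the frozen features so an ordinary lr works instead of lr=30.

```python
import torch.nn as nn

class LinearClassifier(nn.Module):
    """Linear evaluation head: frozen features -> (optional parameter-free BN) -> fc."""

    def __init__(self, feat_dim=2048, num_classes=1000, use_bn=True):
        super().__init__()
        # affine=False: no learnable scale/shift, just feature normalization
        self.bn = nn.BatchNorm1d(feat_dim, affine=False) if use_bn else nn.Identity()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        # feats: detached features from the frozen, pre-trained encoder
        return self.fc(self.bn(feats))
```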

Thanks!

@haohang96, it seems to me that the learning parameters in PyTorch and TensorFlow typically tend to be different.

Another observation is that, even when using the same framework, e.g., PyTorch, the parameters on different datasets are also different.

I agree that different tools will lead to different learned parameters. But the lr difference between SimCLR and MoCo/CMC is quite big.

Could it be caused by the memory bank mechanism used in MoCo and CMC (although MoCo uses a queue, we can think of it as a simplified memory bank, as sketched below)?
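Just to make clear what I mean by "simplified memory bank", here is a rough sketch (not the actual MoCo code; sizes are placeholders): a fixed-size FIFO of past key features that serves as the pool of negatives and is refreshed batch by batch.

```python
import torch
import torch.nn.functional as F

feat_dim, queue_size = 128, 4096                 # placeholder sizes
queue = F.normalize(torch.randn(queue_size, feat_dim), dim=1)  # normally filled with encoded keys
ptr = 0

def dequeue_and_enqueue(keys):
    """Replace the oldest entries in the queue with the newest key features."""
    global ptr
    batch_size = keys.shape[0]
    # assumes queue_size is a multiple of batch_size, as in typical setups
    queue[ptr:ptr + batch_size] = keys
    ptr = (ptr + batch_size) % queue_size
```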

Another difference between SimCLR and MoCo/CMC is that SimCLR has BN in its projection head; could that cause the difference? (A rough sketch of the structural difference I mean follows.)
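Roughly the structural difference I have in mind, sketched in PyTorch (dimensions are placeholders, and the exact official heads may differ in detail):

```python
import torch.nn as nn

feat_dim, proj_dim = 2048, 128  # placeholder dimensions

# SimCLR-style projection head: MLP with BN after the linear layers,
# so the projected features are already normalized in scale
simclr_head = nn.Sequential(
    nn.Linear(feat_dim, feat_dim),
    nn.BatchNorm1d(feat_dim),
    nn.ReLU(inplace=True),
    nn.Linear(feat_dim, proj_dim),
)

# MoCo-v1 / CMC-style head: a single linear projection without BN,
# so the scale of the backbone features reaching the linear classifier can differ a lot
moco_head = nn.Linear(feat_dim, proj_dim)
```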

Both reasons might contribute more or less to the difference, but I cannot give you a conclusion before a firm study is conducted.