question about learning rate
homelifes opened this issue · 2 comments
Hi @leftthomas,
I have two questions:
1- May I know how many pre-training epochs you ran? Is it 1k? I understand that the number of linear evaluation epochs is 100, but what about the number of pre-training epochs?
2- I've seen that you used a constant learning rate of 1e-3 without any decay. I really appreciate that you want to keep everything simple and avoid warmup, LARS, etc., but shouldn't you at least decay the lr at some point? Have you tried this?
@homelifes For 1, please read the README carefully; the results section states it. For 2, Adam does not really need learning rate decay in practice. You can try it and see that there is basically no difference.
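For anyone who wants to run that comparison, here is a minimal sketch of the two schedules. It assumes cosine annealing as the decay rule, which is only one possible choice and is not something this repo uses. The function names `cosine_lr` and `constant_lr` are hypothetical; the 1e-3 base rate and the 500 pre-training epochs are the values mentioned in this thread.

```python
import math


def constant_lr(epoch, base_lr=1e-3):
    """Constant schedule: the same learning rate at every epoch."""
    return base_lr


def cosine_lr(epoch, total_epochs=500, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-indexed epoch.

    Assumption: cosine decay from base_lr down to min_lr over total_epochs.
    Other decay rules (step, exponential) would work for the comparison too.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print both schedules at a few epochs to see how far they diverge.
    for epoch in (0, 125, 250, 375, 499):
        print(epoch, constant_lr(epoch), cosine_lr(epoch))
```

With a PyTorch optimizer, the decayed value can be applied at the start of each epoch by assigning it to `group["lr"]` for every `group` in `optimizer.param_groups`, then comparing the final KNN or linear evaluation accuracy against the constant-rate run.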
Thanks for your reply. For 1, I see that you pre-trained for 500 epochs. I wasn't aware that your code uses KNN to evaluate the pre-training. Thanks!