Changing init learning rate
Kraut-Inferences opened this issue · 2 comments
Kraut-Inferences commented
Does modifying the initial learning rate hurt the algorithm in any way? I want to use exponential decay but don't know whether it would improve performance.
juntang-zhuang commented
From my experience with a ViT model on ImageNet, AdaBelief improves over Adam when both use the default cosine learning rate schedule. I think it should work with other models too.
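For comparison, here is a minimal sketch of the two schedules being discussed, written as plain functions of the step count (the function names and default values below are illustrative, not from AdaBelief itself; in PyTorch you would attach `torch.optim.lr_scheduler.ExponentialLR` or `CosineAnnealingLR` to the optimizer instead). Changing only the initial learning rate rescales the whole curve without changing the decay shape:

```python
import math

def exponential_lr(step, base_lr=1e-3, gamma=0.96):
    # Exponential decay: lr_t = lr_0 * gamma^t
    return base_lr * gamma ** step

def cosine_lr(step, total_steps, base_lr=1e-3, min_lr=0.0):
    # Cosine annealing from base_lr down to min_lr over total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))

# Both start at base_lr; cosine decays slowly at first, then faster,
# while exponential decays by a fixed factor every step.
for t in (0, 50, 100):
    print(t, exponential_lr(t), cosine_lr(t, total_steps=100))
```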
Kraut-Inferences commented
Thank you.