juntang-zhuang/Adabelief-Optimizer

Changing init learning rate

Kraut-Inferences opened this issue · 2 comments

Does modifying the initial learning rate hurt the algorithm in any way? I want to use exponential decay but don't know whether it would improve performance.

From my experience with a ViT model on ImageNet, AdaBelief improves over Adam when both use the default cosine learning rate schedule. I think it should work with other models as well.
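For reference, here is a minimal sketch (assuming the `adabelief_pytorch` package and a placeholder model, both of which are illustrative rather than taken from this repo's examples) showing that AdaBelief can be combined with any standard PyTorch learning rate scheduler, such as exponential decay or cosine annealing:

```python
import torch
from adabelief_pytorch import AdaBelief

# Placeholder model purely for illustration.
model = torch.nn.Linear(10, 2)

# Hyperparameter names/values here are assumptions; check the repo's
# recommended settings for your task.
optimizer = AdaBelief(model.parameters(), lr=1e-3, eps=1e-16,
                      betas=(0.9, 0.999), weight_decouple=True, rectify=False)

# Exponential decay: lr is multiplied by `gamma` once per scheduler.step().
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
# Alternatively, cosine annealing over T_max epochs:
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(5):
    # Dummy training step: forward, backward, optimizer update.
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch
```

Since the scheduler only rescales the optimizer's `lr`, changing the initial learning rate or the decay schedule does not alter the AdaBelief update rule itself; it just changes the step size over time.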

Thank you.