An optimizer that trains as fast as Adam and generalizes as well as SGD, for developing state-of-the-art deep learning models on a wide variety of popular tasks in fields such as CV and NLP.
Based on Luo et al. (2019). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. In Proc. of ICLR 2019.
AdaBound requires Python 3.6.0 or later. We currently provide a PyTorch version; a TensorFlow version of AdaBound is coming soon.
The preferred way to install AdaBound is via pip within a virtual environment. Just run
pip install adabound
in your Python environment and you are ready to go!
Since AdaBound is a single Python class of a little over 100 lines, an alternative is to download adabound.py directly and copy it into your project.
You can use AdaBound just like any other PyTorch optimizer.
optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)
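For context, here is a minimal training-loop sketch showing where the optimizer fits; the model, data, and loss below are placeholders for illustration, not part of the package.

```python
import torch
import torch.nn as nn
import adabound

# Toy model and data, purely for illustration.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)

for step in range(100):
    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)

    optimizer.zero_grad()                         # clear accumulated gradients
    loss = criterion(model(inputs), targets)
    loss.backward()                               # compute gradients
    optimizer.step()                              # apply the AdaBound update
```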
As described in the paper, AdaBound is an optimizer that behaves like Adam at the beginning of
training, and gradually transforms to SGD at the end.
The final_lr parameter is the learning rate of the SGD that AdaBound eventually transforms into.
In most cases you can use the default final_lr=0.1 without tuning it, since performance is robust to the exact value of final_lr.
See Appendix G of the paper for more details.
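Intuitively, AdaBound clips Adam's per-parameter step size between a lower and an upper bound that both converge to final_lr, so early training is Adam-like and late training is SGD-like. The sketch below illustrates bound functions of the shape described in the paper; the gamma value and the exact formulas here are simplified assumptions for illustration, not the package internals.

```python
# Sketch of dynamic step-size bounds (assumed form; gamma is a hypothetical
# convergence-speed constant chosen for illustration).
final_lr = 0.1
gamma = 1e-3

def bounds(step):
    # Lower and upper bounds on the step size at training step `step`.
    # They start far apart (Adam-like freedom) and converge to final_lr (SGD-like).
    lower = final_lr * (1 - 1 / (gamma * step + 1))
    upper = final_lr * (1 + 1 / (gamma * step))
    return lower, upper

print(bounds(1))       # very wide interval -> clipping barely constrains Adam
print(bounds(100000))  # both bounds close to final_lr -> behaves like SGD
```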
If you use AdaBound in your research, please cite Adaptive Gradient Methods with Dynamic Bound of Learning Rate.
@inproceedings{Luo2019AdaBound,
author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
booktitle = {Proceedings of the 7th International Conference on Learning Representations},
month = {May},
year = {2019},
address = {New Orleans, Louisiana}
}