AdamP Optimizer — Unofficial TensorFlow Implementation
"Slowing Down the Weight Norm Increase in Momentum-based Optimizers"
Implemented by Junho Kim
[Paper] [Project page] [Official Pytorch]
Validation
I have checked that the code runs, but I could not confirm that its performance matches that of the official code.
Usage
Usage is exactly the same as the tf.keras.optimizers library!
```python
from adamp_tf import AdamP
from sgdp_tf import SGDP

optimizer_adamp = AdamP(learning_rate=0.001, beta_1=0.9, beta_2=0.999, weight_decay=1e-2)
optimizer_sgdp = SGDP(learning_rate=0.1, weight_decay=1e-5, momentum=0.9, nesterov=True)
```
- Do not use with tf.nn.scale_regularization_loss. Use the weight_decay argument instead.
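For reference, below is a minimal end-to-end sketch, assuming the adamp_tf module above is importable and that AdamP behaves like a standard tf.keras optimizer (as the Usage note states). Weight decay is handled by the optimizer's weight_decay argument, so no regularization loss is added to the model; the toy model and dummy data are only illustrative.

```python
import tensorflow as tf
from adamp_tf import AdamP

# Toy model; weight decay is handled inside the optimizer,
# so no kernel_regularizer or tf.nn.scale_regularization_loss is used.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])

optimizer = AdamP(learning_rate=1e-3, beta_1=0.9, beta_2=0.999, weight_decay=1e-2)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Dummy data just to show the call; replace with a real dataset.
x = tf.random.normal((256, 32))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=32, epochs=1)
```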
Arguments
SGDP and AdamP share arguments with tf.keras.optimizers.SGD and tf.keras.optimizers.Adam. There are two additional hyperparameters; we recommend using the default values (a short example passing them explicitly follows the list below).
- delta: threshold that determines whether a set of parameters is scale-invariant or not (default: 0.1)
- wd_ratio: relative weight decay applied on scale-invariant parameters, compared to that applied on scale-variant parameters (default: 0.1)
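If you do want to set them yourself, both can be passed as keyword arguments at construction time. A minimal sketch, assuming the constructors accept the same keyword names as listed above:

```python
from adamp_tf import AdamP
from sgdp_tf import SGDP

# The values below are simply the defaults made explicit.
optimizer_adamp = AdamP(learning_rate=0.001, weight_decay=1e-2, delta=0.1, wd_ratio=0.1)
optimizer_sgdp = SGDP(learning_rate=0.1, momentum=0.9, weight_decay=1e-5, delta=0.1, wd_ratio=0.1)
```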
Both SGDP and AdamP support Nesterov momentum (see the example below).
- nesterov: enables Nesterov momentum (default: False)
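The SGDP example in the Usage section already enables it; for AdamP it is passed the same way. A minimal sketch, assuming the AdamP constructor accepts the keyword as well:

```python
from adamp_tf import AdamP

# Nesterov momentum is disabled by default; enable it explicitly.
optimizer_adamp = AdamP(learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                        weight_decay=1e-2, nesterov=True)
```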
How to cite
```
@article{heo2020adamp,
    title={Slowing Down the Weight Norm Increase in Momentum-based Optimizers},
    author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Uh, Youngjung and Ha, Jung-Woo},
    year={2020},
    journal={arXiv preprint arXiv:2006.08217},
}
```