Recommendation for the `eps_root` setting for differentiating through the Adam optimizer
itk22 opened this issue · 2 comments
itk22 commented
Dear Optax team,
I am working on implementing Model-Agnostic Meta-Learning (MAML) in my project, and I noticed that using Optax's default Adam optimizer for the inner loop produces NaN values in the meta-gradients. The documentation covers this well: it mentions that eps_root
should be set to a small constant to avoid dividing by zero when rescaling. Could you please recommend a good default value for eps_root
in a meta-learning scenario?
holounic commented
Hi Igor, thanks for reaching out! 1e-8 might be a suitable choice for this case.
itk22 commented
Thank you, I will keep using this value in my experiments then :)