Recommendation for the `eps_root` setting for differentiating through the Adam optimizer
itk22 opened this issue · 2 comments
itk22 commented
Dear Optax team,
I am working on implementing Model-Agnostic Meta-Learning (MAML) in my project, and I noticed that using Optax's default Adam optimizer for the inner loop produces NaN values in the meta-gradients. The documentation covers this well: it mentions that eps_root
should be set to a small constant to avoid dividing by zero when rescaling. Could you please recommend a good default value for eps_root
in a meta-learning scenario?
holounic commented
Hi Igor, thanks for reaching out! 1e-8 might be a suitable choice for this case.
itk22 commented
Thank you, I will keep using this value in my experiments then :)