RMSProp epsilon=1.0, why?

Question

TimDettmers opened this issue 4 years ago · 0 comments

Thank you so much for this codebase. It helps a lot to make NAS more reproducible.

I have a question regarding RMSProp. I do not see RMSProp often in computer vision, but I guess it is fine, since the differences between optimizers are not large. However, I see that you used epsilon=1.0, which I find odd: epsilon is the constant that usually just prevents division-by-zero errors, and you set it to a very high value. A value that high introduces a systematic bias in the variance estimate. Do you have any references to other public results that use this in conjunction with that high learning rate, or is there a particular reason for epsilon=1.0?
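To make the concern concrete, here is a minimal sketch of a scalar RMSProp update. It is not taken from this codebase. It assumes the common textbook form where epsilon is added outside the square root (some implementations add it inside instead), and the function name `rmsprop_step_size`, the decay `rho=0.9`, the learning rate and the gradient values are all made up for illustration.

```python
import math


def rmsprop_step_size(grads, lr=0.1, rho=0.9, eps=1e-8):
    """Feed a sequence of scalar gradients to RMSProp and return the last update.

    Assumed update rule (epsilon outside the square root):
        v    <- rho * v + (1 - rho) * g^2
        step <- lr * g / (sqrt(v) + eps)
    """
    v = 0.0      # running average of squared gradients
    step = 0.0   # most recent parameter update
    for g in grads:
        v = rho * v + (1.0 - rho) * g * g
        step = lr * g / (math.sqrt(v) + eps)
    return step


# Hypothetical small, constant gradients, typical of late training.
grads = [0.01] * 200

small_eps_step = rmsprop_step_size(grads, eps=1e-8)
large_eps_step = rmsprop_step_size(grads, eps=1.0)

print("eps=1e-8:", small_eps_step)
print("eps=1.0 :", large_eps_step)
print("ratio   :", small_eps_step / large_eps_step)
```

With gradients this small, sqrt(v) is around 0.01, so epsilon=1.0 dominates the denominator and the update is close to `lr * g`, i.e. roughly plain SGD, rather than the normalized step you get with a tiny epsilon.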