shindavid/AlphaZeroArcade

Optimizer experimentation


KataGo/AlphaGo use Stochastic Gradient Descent (SGD). But Adam is all the rage these days.

@Rediness tells me that Adam converges to a good solution much more quickly, although given enough training time, SGD eventually attains greater accuracy than Adam.

This suggests we might get the best of both worlds by starting with Adam and later switching to SGD. We should experiment with this. If the idea works, we should also figure out a good way to make the switching decision automatically (see the sketch below).
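
A minimal PyTorch sketch of the Adam-then-SGD handoff, not tied to this repo's training code. The model, dummy data, learning rates, and the `switch_epoch` threshold are all placeholder assumptions; in practice the switch point (or an automatic trigger such as a validation-loss plateau) would need to be tuned.

```python
import torch
import torch.nn as nn

# Placeholder model and data standing in for the real network and self-play samples.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
loss_fn = nn.CrossEntropyLoss()
inputs = torch.randn(256, 16)
targets = torch.randint(0, 4, (256,))

switch_epoch = 50  # hypothetical: epoch at which we hand off from Adam to SGD
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(100):
    if epoch == switch_epoch:
        # Swap in SGD with momentum for the remainder of training.
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```

One natural way to automate the decision would be to replace the fixed `switch_epoch` with a criterion on training progress, e.g. switch once the validation loss has not improved for some number of epochs; that criterion and its patience window are the knobs the experiment would need to explore.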