cortwave/cdiscount-kaggle

Nadam optimizer

Closed this issue · 1 comment

Since we get good results with a bigger batch size, adding Nesterov momentum (i.e. switching from Adam to Nadam) should help in a similar way.
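For reference, the only difference between Adam and Nadam is the Nesterov look-ahead in the momentum term. The sketch below is not code from this repo. It is a minimal scalar version assuming a constant β1, so it omits the momentum-decay schedule from Dozat's original Nadam formulation and from library implementations. The helpers `adam_step` and `minimize` and the toy objective f(θ) = (θ − 3)² are hypothetical, chosen only for illustration.

```python
import math


def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, nesterov=False):
    """One Adam update (or Nadam if nesterov=True) for a scalar parameter.

    Hypothetical helper; assumes a constant beta1 (no momentum-decay schedule).
    t is the 1-based step count. Returns (new_theta, new_m, new_v).
    """
    m = b1 * m + (1 - b1) * g          # first-moment EMA
    v = b2 * v + (1 - b2) * g * g      # second-moment EMA
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    if nesterov:
        # Nadam: look ahead by applying next step's momentum to m,
        # plus the bias-corrected contribution of the current gradient
        direction = b1 * m / (1 - b1 ** (t + 1)) + (1 - b1) * g / (1 - b1 ** t)
    else:
        # Adam: plain bias-corrected first moment
        direction = m / (1 - b1 ** t)
    theta = theta - lr * direction / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize(nesterov, steps=500, lr=0.1):
    """Minimize the toy objective f(theta) = (theta - 3)^2 starting from 0."""
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2.0 * (theta - 3.0)  # gradient of (theta - 3)^2
        theta, m, v = adam_step(theta, g, m, v, t, lr=lr, nesterov=nesterov)
    return theta


print(minimize(nesterov=False), minimize(nesterov=True))
```

On the very first step Adam moves θ by exactly `lr`, while this Nadam variant moves it by about 1.47 × `lr`, because the current gradient enters the update directly instead of only through the running average. Both end up near θ = 3 on this toy problem, which is consistent with Nadam behaving much like Adam in practice.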

Approximately the same results as with Adam.