JRC1995/DemonRangerOptimizer

What do you think about diffgrad?

hadaev8 opened this issue · 3 comments

This is a great repo, my respect.

Thanks.

I haven't read the diffGrad paper yet. The abstract looks interesting.

Also, what do you think about AdamW-style weight decay combined with gradient norming? Will I break anything by adding gradient norming?

It should be good.
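For anyone reading later, here is a rough sketch of why the two should not interfere. It is not code from this repo. It assumes "gradient norming" means rescaling the gradient to unit L2 norm, and it uses a plain SGD step instead of Adam to keep it short. The function names `normalize_gradient` and `sgd_step_decoupled_decay` are made up for illustration.

```python
import math

def normalize_gradient(grad, eps=1e-8):
    """Rescale a gradient vector to unit L2 norm (assumed meaning of "gradient norming")."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / (norm + eps) for g in grad]

def sgd_step_decoupled_decay(params, grad, lr=0.1, weight_decay=0.01):
    """One plain SGD step with AdamW-style (decoupled) weight decay.

    The decay shrinks the weights directly instead of being added to the
    gradient, so normalizing the gradient does not rescale the decay term.
    """
    grad = normalize_gradient(grad)
    return [p - lr * g - lr * weight_decay * p for p, g in zip(params, grad)]

params = [1.0, -2.0]
grad = [3.0, 4.0]  # L2 norm is 5, so the normalized gradient is about [0.6, 0.8]
new_params = sgd_step_decoupled_decay(params, grad)
print(new_params)
```

With L2-style decay (decay added to the gradient before normalizing), the decay term would get rescaled together with the gradient. Keeping it decoupled avoids that.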