kazewong opened this issue a year ago · 0 comments
It has been observed on multiple occasions that the loss can suddenly jump to a high value, essentially destroying whatever was learned before.
Implementing gradient clipping should alleviate this problem.
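A minimal sketch of what this could look like, written here with NumPy arrays for illustration (the function name and threshold are placeholders, not anything from this codebase; in a JAX training loop the equivalent would typically be `optax.clip_by_global_norm` chained in front of the optimizer):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale every gradient array by the same factor so that the
    joint L2 norm across all of them is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads

# A gradient with global norm 5 is rescaled down to norm 1,
# which caps the size of any single update step.
grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
clipped = clip_by_global_norm(grads, 1.0)
```

Clipping by the global norm (rather than per-element) preserves the direction of the gradient while bounding the step size, so a single pathological batch can no longer wipe out what the model has learned.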