IssamLaradji/sls

How to deal with very small gradients?

chanshing opened this issue · 1 comment

I ran into a situation where my model suddenly stopped training (the weights were no longer being updated) after a certain number of epochs. After digging a bit I realized it had to do with this line:

sls/sls/sls.py

Line 92 in e2522d5

if grad_norm >= 1e-8:

In most cases I think this check makes sense, but I am currently in a situation where my gradient norms are small for all batches, yet my validation loss is still very bad. When I rerun with a different seed this doesn't happen, which suggests I may have fallen into a very bad local minimum.

Would it be OK for me to remove the mentioned line? Or is there an important reason for this check to be in place?
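The stall described above can be illustrated with a toy sketch (the function and variable names below are illustrative, not the library's actual internals): once the gradient norm drops below the threshold, the update is skipped entirely, so the weights freeze even if the loss is still high.

```python
import math

def sgd_step_with_guard(w, grad, step_size, eps=1e-8):
    """Apply a plain SGD step only if the gradient norm clears the
    threshold; otherwise return the weights unchanged (training stalls)."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    if grad_norm >= eps:
        return [wi - step_size * gi for wi, gi in zip(w, grad)]
    return list(w)  # guard triggered: no update is applied

w = [1.0, -2.0]
# A healthy gradient moves the weights...
w1 = sgd_step_with_guard(w, [0.5, 0.5], step_size=0.1)
# ...but a vanishingly small one leaves them frozen in place.
w2 = sgd_step_with_guard(w, [1e-10, 1e-10], step_size=0.1)
```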

It should be okay; the 1e-8 threshold for grad_norm was chosen arbitrarily, so smaller values would probably work as well.

If the gradients are too small and you would like to bring the step size back up, you could set reset_option=2 in the list of hyperparameters. This option will reset the step size to the initial value in every iteration before doing the line-search, which might help push the model to a better solution faster.
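A rough sketch of what that reset buys you, using a generic Armijo backtracking line search (hypothetical helper names, not the package's actual code): with the reset, every iteration's search starts from the initial step size rather than from wherever the previous iteration left off, so the step size can grow back after a stretch of tiny steps.

```python
def armijo_line_search(loss, w, grad, step_size, c=0.1, beta=0.5, max_backtracks=50):
    """Backtracking line search: shrink step_size until the Armijo
    sufficient-decrease condition holds, then take the SGD step."""
    f0 = loss(w)
    grad_sq = sum(g * g for g in grad)
    for _ in range(max_backtracks):
        w_new = [wi - step_size * gi for wi, gi in zip(w, grad)]
        if loss(w_new) <= f0 - c * step_size * grad_sq:
            return w_new, step_size
        step_size *= beta
    return w, step_size  # search failed: keep the weights as they are

def train(loss, grad_fn, w, init_step_size=1.0, n_iters=20, reset_each_iter=False):
    step_size = init_step_size
    for _ in range(n_iters):
        if reset_each_iter:             # analogous to reset_option=2:
            step_size = init_step_size  # restart the search from the top
        w, step_size = armijo_line_search(loss, w, grad_fn(w), step_size)
    return w

# Toy quadratic f(w) = sum(w_i^2), minimized at the origin.
loss = lambda w: sum(x * x for x in w)
grad_fn = lambda w: [2 * x for x in w]
w_final = train(loss, grad_fn, [3.0, -4.0], reset_each_iter=True)
```

Without the reset, the step size can only ratchet downward across iterations; resetting it each time trades a few extra backtracking steps for the chance to take large steps again.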

Good question, thanks for sharing!