After a few steps the loss explodes or vanishes, and in later epochs it produces 'NaN' as the loss output.
cjt222 opened this issue · 17 comments
I ran into this problem when applying it to CRNN for OCR; adjusting the learning rate didn't help either.
Problem: After a few steps the loss explodes, and in later epochs it produces 'NaN' as the loss output.
It is suffering from the exploding gradients problem!
Solution: Gradient Clipping!
Try this and tell me if the problem recurs.
Gradient Clipping:
Apply: tf.clip_by_value(clipping_variable,1e-10,1.0)
logits=tf.clip_by_value(logits,1e-10,1.0)
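Note that tf.clip_by_value clips element-wise values, not gradients. A minimal NumPy sketch of why this prevents NaN (the probability values below are made up for illustration):

```python
import numpy as np

# Hypothetical per-class probabilities, one of which has underflowed to 0.
probs = np.array([0.7, 0.0, 0.3])

# Without clipping, log(0) yields -inf, which propagates as inf/NaN
# through the loss and its gradients.
with np.errstate(divide="ignore"):
    raw_log = np.log(probs)          # contains -inf

# Clipping to [1e-10, 1.0] keeps every value strictly positive,
# mirroring tf.clip_by_value(logits, 1e-10, 1.0) above.
clipped = np.clip(probs, 1e-10, 1.0)
safe_log = np.log(clipped)           # finite everywhere
print(np.isinf(raw_log).any())       # True
print(np.isfinite(safe_log).all())   # True
```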
Thanks for your help. The loss-explosion problem is solved after adding the clipping, but the loss still may not converge.
Try different learning rates and alpha/gamma values.
Try:
logits=tf.clip_by_value(logits,1e-7,1.0-1e-7)
or clipping may be required near the power function, in the line where the gamma term is computed.
What about p values? Are they still zero?
If they are still zero, then try a different function to calculate exp().
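The zeros are consistent with float32 underflow rather than a bug in any particular exp implementation. Assuming the focal-CTC relationship p = exp(-ctc_loss) (not shown in the thread), a quick NumPy check:

```python
import numpy as np

# Assumed focal-CTC relationship: p = exp(-ctc_loss).
ctc_loss = np.array([2.0, 50.0, 120.0], dtype=np.float32)
p = np.exp(-ctc_loss)

# float32 underflows to exactly 0 once ctc_loss exceeds roughly 103
# (the smallest subnormal is about 1.4e-45), so swapping tf.exp for
# tf.math.exp cannot help: both compute the same function.
print(p)  # last entry is exactly 0.0
```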
Most of them are zeros and some may be around 1e-37. I have tried tf.math.exp() instead of tf.exp; the result is the same and it does not converge.
The upper limit of clipping could be an issue.
Since the values in p are very small, the upper clipping limit might be causing the problem.
Try printing the values of p without clipping.
Then deduce which range of values would be good for clipping.
Once clipping puts the values of p in an appropriate range (not 0 or a very small number), it should converge.
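The suggested procedure (print p, deduce a range, then clip) can be sketched as follows; the p values here are hypothetical, and using the smallest nonzero observation as the floor is just one possible heuristic:

```python
import numpy as np

# Hypothetical batch of p values printed without clipping (assumed data).
p = np.array([0.0, 1e-37, 3e-4, 0.02, 0.4])

# Deduce a clipping range from the observed nonzero values: use the
# smallest nonzero p as the floor and 1.0 as the ceiling.
lower = p[p > 0].min()
clipped = np.clip(p, lower, 1.0)
print(clipped.min() > 0)  # True: no exact zeros remain
```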
In fact, the loss becomes NaN while the p values are still zeros even without clipping, so I cannot observe an appropriate range.
For now, I will train with the CTC loss alone until it converges and then fine-tune with the focal loss; I will post an update when I get results.
Also keep an eye on the ctc_loss output range.
Final Solution:
Clip ctc_loss() instead of the gradients.
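A minimal sketch of that final fix: clip the raw CTC loss before it enters the focal term, so that p = exp(-ctc_loss) never underflows to zero. The focal form -alpha * (1-p)**gamma * log(p), the parameter defaults, and the max_loss bound are all assumptions, since the thread does not show the exact implementation:

```python
import numpy as np

def focal_ctc_loss(ctc_loss, alpha=0.25, gamma=2.0, max_loss=20.0):
    """Focal-weighted CTC loss (assumed formulation) with the raw CTC
    loss clipped so p = exp(-ctc_loss) stays strictly positive."""
    clipped = np.clip(ctc_loss, 0.0, max_loss)  # clip the loss, not the gradients
    p = np.exp(-clipped)                        # p >= exp(-max_loss) > 0
    return -alpha * (1.0 - p) ** gamma * np.log(p)

raw = np.array([3.0, 500.0])     # second sample would underflow without clipping
loss = focal_ctc_loss(raw)
print(np.isfinite(loss).all())   # True: no NaN/inf after clipping
```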
I am closing this issue now!
@cjt222 Hi, does FocalCtcLoss improve your CRNN accuracy?