athena-team/athena

Accuracy drop abnormal

kingfener opened this issue · 3 comments

Hi,
I am training asr/timit on my PC. Training has not finished yet, but starting at "global_steps: 7231" (see the log below) the accuracy drops sharply, and CTCAccuracy falls to 0 and never recovers.

Is this normal, or is something wrong?

Thanks a lot.

Part of the log:

INFO:absl:global_steps: 7161 learning_rate: 7.3857e-04 loss: 24.4343 Accuracy: 0.8241 CTCAccuracy: 0.7706 sec/iter: 1.9227
INFO:absl:global_steps: 7171 learning_rate: 7.3806e-04 loss: 16.2272 Accuracy: 0.8480 CTCAccuracy: 0.7895 sec/iter: 1.7454
INFO:absl:global_steps: 7181 learning_rate: 7.3754e-04 loss: 41.2167 Accuracy: 0.8339 CTCAccuracy: 0.7677 sec/iter: 1.6163
INFO:absl:global_steps: 7191 learning_rate: 7.3703e-04 loss: 28.8470 Accuracy: 0.8332 CTCAccuracy: 0.7650 sec/iter: 1.7445
INFO:absl:global_steps: 7201 learning_rate: 7.3652e-04 loss: 25.4526 Accuracy: 0.7835 CTCAccuracy: 0.6967 sec/iter: 2.0726
INFO:absl:global_steps: 7211 learning_rate: 7.3601e-04 loss: 22.2399 Accuracy: 0.8127 CTCAccuracy: 0.7281 sec/iter: 1.7229
INFO:absl:global_steps: 7221 learning_rate: 7.3550e-04 loss: 63.7613 Accuracy: 0.7681 CTCAccuracy: 0.7023 sec/iter: 1.7663
INFO:absl:global_steps: 7231 learning_rate: 7.3499e-04 loss: 182.4591 Accuracy: -0.0079 CTCAccuracy: 0.0549 sec/iter: 2.1732
INFO:absl:global_steps: 7241 learning_rate: 7.3448e-04 loss: 189.0946 Accuracy: 0.1835 CTCAccuracy: 0.0313 sec/iter: 2.1511
INFO:absl:global_steps: 7251 learning_rate: 7.3397e-04 loss: 109.6702 Accuracy: 0.2176 CTCAccuracy: 0.0116 sec/iter: 1.8667
INFO:absl:global_steps: 7261 learning_rate: 7.3347e-04 loss: 126.3981 Accuracy: 0.2350 CTCAccuracy: 0.0041 sec/iter: 2.1367
INFO:absl:global_steps: 7271 learning_rate: 7.3296e-04 loss: 114.8526 Accuracy: 0.2516 CTCAccuracy: 0.0016 sec/iter: 1.8951
INFO:absl:global_steps: 7281 learning_rate: 7.3246e-04 loss: 146.5510 Accuracy: 0.2488 CTCAccuracy: 0.0013 sec/iter: 1.7001

INFO:absl:global_steps: 10561 learning_rate: 6.0817e-04 loss: 148.2634 Accuracy: 0.5695 CTCAccuracy: 0.0000 sec/iter: 2.1442
INFO:absl:global_steps: 10571 learning_rate: 6.0789e-04 loss: 70.1164 Accuracy: 0.6322 CTCAccuracy: 0.0000 sec/iter: 1.7343
INFO:absl:global_steps: 10581 learning_rate: 6.0760e-04 loss: 93.0921 Accuracy: 0.5675 CTCAccuracy: 0.0000 sec/iter: 1.9228
INFO:absl:global_steps: 10591 learning_rate: 6.0731e-04 loss: 111.8232 Accuracy: 0.5899 CTCAccuracy: 0.0000 sec/iter: 1.7242
INFO:absl:global_steps: 10601 learning_rate: 6.0703e-04 loss: 78.0932 Accuracy: 0.5999 CTCAccuracy: 0.0000 sec/iter: 1.6350
INFO:absl:global_steps: 10611 learning_rate: 6.0674e-04 loss: 85.0215 Accuracy: 0.5680 CTCAccuracy: 0.0000 sec/iter: 1.8293
INFO:absl:global_steps: 10621 learning_rate: 6.0645e-04 loss: 59.1028 Accuracy: 0.6001 CTCAccuracy: 0.0000 sec/iter: 1.6727


CTC training can be unstable, and it can fall into a plateau that is hard to recover from. This seldom happens. My best suggestion is to rerun the whole training, or to restart from a checkpoint saved while convergence was still normal.
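
To decide which checkpoint to restart from, it can help to locate the collapse in the training log first. The following is only a rough sketch, not part of Athena: the helper names `parse_log` and `last_healthy_step`, the window of 5 log lines, and the 50% drop threshold are all assumptions chosen for illustration. It parses log lines in the format pasted above and reports the last logged step before CTCAccuracy falls well below its recent average.

```python
import re

# Matches the absl log lines pasted in this issue (spaces or tabs between fields).
LINE_RE = re.compile(
    r"global_steps:\s*(\d+)\s+learning_rate:\s*(\S+)\s+loss:\s*(\S+)\s+"
    r"Accuracy:\s*(\S+)\s+CTCAccuracy:\s*(\S+)"
)


def parse_log(lines):
    """Yield (step, loss, ctc_accuracy) for every line that matches the format."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield int(m.group(1)), float(m.group(3)), float(m.group(5))


def last_healthy_step(records, window=5, drop_ratio=0.5):
    """Return the last step logged before CTCAccuracy drops below
    drop_ratio times the mean of the previous `window` values.

    Returns None if no such drop is found. Both `window` and `drop_ratio`
    are illustrative defaults (assumptions), not tuned values.
    """
    history = []  # (step, ctc_accuracy) pairs seen so far
    for step, _loss, ctc_acc in records:
        if len(history) >= window:
            recent = [acc for _, acc in history[-window:]]
            baseline = sum(recent) / window
            if baseline > 0 and ctc_acc < drop_ratio * baseline:
                return history[-1][0]
        history.append((step, ctc_acc))
    return None


# Lines copied from the log in this issue.
sample = [
    "INFO:absl:global_steps: 7181 learning_rate: 7.3754e-04 loss: 41.2167 Accuracy: 0.8339 CTCAccuracy: 0.7677 sec/iter: 1.6163",
    "INFO:absl:global_steps: 7191 learning_rate: 7.3703e-04 loss: 28.8470 Accuracy: 0.8332 CTCAccuracy: 0.7650 sec/iter: 1.7445",
    "INFO:absl:global_steps: 7201 learning_rate: 7.3652e-04 loss: 25.4526 Accuracy: 0.7835 CTCAccuracy: 0.6967 sec/iter: 2.0726",
    "INFO:absl:global_steps: 7211 learning_rate: 7.3601e-04 loss: 22.2399 Accuracy: 0.8127 CTCAccuracy: 0.7281 sec/iter: 1.7229",
    "INFO:absl:global_steps: 7221 learning_rate: 7.3550e-04 loss: 63.7613 Accuracy: 0.7681 CTCAccuracy: 0.7023 sec/iter: 1.7663",
    "INFO:absl:global_steps: 7231 learning_rate: 7.3499e-04 loss: 182.4591 Accuracy: -0.0079 CTCAccuracy: 0.0549 sec/iter: 2.1732",
]

print(last_healthy_step(parse_log(sample)))  # 7221
```

On the lines above this points at step 7221, so a checkpoint saved at or before that step would be a reasonable place to restart from. Which checkpoints actually exist depends on your own configuration.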

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale commented

This issue is closed. You can also re-open it if needed.