Warmup causes NaN
lyw615 opened this issue · 1 comment
lyw615 commented
I use RAdam in Mask R-CNN with a Keras implementation. After warmup completes, the loss becomes NaN. With SGD and no warmup, the loss value is normal. This only happens while training the heads layers. Usage is as follows (learning_rate is 0.001); any reply would be appreciated:
if warmup:
    optimizer_use = RAdam(learning_rate=1e-5, total_steps=all_steps,
                          warmup_proportion=0.05, min_lr=learning_rate)
else:
    optimizer_use = RAdam()
self.keras_model.compile(optimizer=optimizer_use,
                         loss=[None] * len(self.keras_model.outputs))
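For reference, a warmup schedule of this kind ramps the learning rate up during the warmup steps and then decays it toward min_lr for the remaining steps. Below is a minimal pure-Python sketch of such a schedule; the helper name and the linear ramp/decay shape are assumptions for illustration, not the library's exact implementation:

```python
def warmup_then_decay_lr(step, total_steps, warmup_proportion, peak_lr, min_lr):
    """Sketch of a warmup schedule: linear ramp from 0 up to peak_lr
    over the warmup steps, then linear decay toward min_lr."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Warmup phase: ramp linearly toward the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    # Post-warmup phase: move linearly from peak_lr toward min_lr.
    decay_frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr - (peak_lr - min_lr) * decay_frac
```

Note that with the values in the snippet above (learning_rate=1e-5, min_lr=0.001), min_lr is two orders of magnitude larger than the peak rate, so under a schedule like this the post-warmup phase would *increase* the effective learning rate rather than decay it, which may be worth checking as a cause of the NaN.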
stale commented
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.