awjuliani/DeepRL-Agents

A3C Doom: Basic scenario: How to select clipping?

Closed this issue · 5 comments

Why 40.0?

grads,self.grad_norms = tf.clip_by_global_norm(self.gradients,40.0)

Hi Ibrahim,

This is something that should be adjusted based on the performance you observe on your own task. 40 is what was used in the OpenAI starter agent, so I used that here, as it led to convergence for the example task.
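For readers unfamiliar with the call quoted above, here is a minimal pure-Python sketch of what global-norm clipping does. It is not the TensorFlow implementation: gradients are assumed to be plain lists of floats rather than tensors, and the function name simply mirrors `tf.clip_by_global_norm` for readability. Like the TensorFlow call, it returns both the clipped gradients and the unclipped global norm.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Jointly rescale a list of gradient vectors so that their combined
    L2 norm is at most clip_norm.

    Illustrative stand-in for tf.clip_by_global_norm; `grads` is assumed
    to be a list of lists of floats, not TensorFlow tensors.
    Returns (clipped_grads, global_norm), where global_norm is the norm
    BEFORE clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Scale factor is clip_norm / max(global_norm, clip_norm):
    # exactly 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm

# Example: the global norm is sqrt(30^2 + 40^2) = 50, which exceeds 40,
# so every gradient is multiplied by 40 / 50 = 0.8.
clipped, norm = clip_by_global_norm([[30.0, 40.0], [0.0]], 40.0)
```

Because every gradient is scaled by the same factor, the direction of the overall update is preserved; only its magnitude is capped. A clip value far above the norms you typically observe therefore has no effect, and one far below them shrinks almost every update.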

Would it be wise to use the Grad Norm plot from this project's TensorBoard to select a value close to the mean-max of the smoothed chart, in order to prevent overly large updates? Along those lines, is it worth considering an adaptive gradient norm clip derived from a moving average of the norm?

Thank you @DMTSource for the elaboration.

Could you please give numerical examples? (for better understanding)

If Grad Norm is around 25, should we set clipping = 25?
When Grad Norm decreases over time (say to 20), should we then set clipping = 20?

Yes, that is what I am thinking. Looking at your png here, my question above asks whether setting the clip value to somewhere between ~15 and 20 is appropriate. Or, in a dynamic sense, whether to recreate the smoothed values seen in TensorBoard (careful, they are smoothed values of smoothed values) and use n_std*std_of_norms_rolling + mean_of_norms_rolling to determine an upper bound that does not throw away information.
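The dynamic variant could be sketched roughly as follows. This is only an illustration of the formula above, not code from this repository: the class name `RollingClipThreshold`, the window size, the `n_std` default, and the fallback of 40.0 while the window is still filling are all assumptions made for the example.

```python
from collections import deque
import statistics

class RollingClipThreshold:
    """Hypothetical helper: tracks recent gradient norms and proposes a
    clip value of rolling_mean + n_std * rolling_std."""

    def __init__(self, window=100, n_std=2.0, initial_clip=40.0):
        self.norms = deque(maxlen=window)   # keeps only the last `window` norms
        self.n_std = n_std
        self.initial_clip = initial_clip    # used until two norms are recorded

    def update(self, grad_norm):
        """Record one UNCLIPPED global norm and return the new threshold."""
        self.norms.append(grad_norm)
        return self.value()

    def value(self):
        if len(self.norms) < 2:
            return self.initial_clip
        mean = statistics.fmean(self.norms)
        std = statistics.pstdev(self.norms)  # population std over the window
        return mean + self.n_std * std

# Usage with made-up norms and a deliberately tiny window:
threshold = RollingClipThreshold(window=3, n_std=2.0)
for observed_norm in [10.0, 20.0, 30.0, 40.0]:
    clip_value = threshold.update(observed_norm)
```

The norm fed to `update` should be the unclipped one (the second value returned by `tf.clip_by_global_norm`); feeding clipped norms would let the threshold only ever drift downward. Note also that in a TF1-style graph where the clip value is a Python constant, as in the line quoted at the top of this issue, the threshold would have to be supplied through a placeholder or variable to change during training.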

But back to my question above for @awjuliani: is visual inspection of the Grad Norm plot even a valid way to determine the grad norm clip value, or will the gradient updates change magnitude with each hyperparameter update?

I hope this will solve the health gathering scenario (so far I have failed to make it converge).

waiting for @awjuliani ...