Why is the performance different?
tessavdheiden opened this issue · 5 comments
Hi, when I ran this code, the moving-average reward stayed below -1000 in the continuous case (with 'UPDATE_GLOBAL_ITER' already set to 10). Do you know what the problem could be? The performance in the discrete case is very poor as well.
Hi,
Here is something else to try: add 'torch.nn.utils.clip_grad_norm_(lnet.parameters(), 20)' in utils.py.
It helped me reduce the performance differences.
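For reference, the call has to go between `loss.backward()` and the optimizer step, so the gradients are clipped before they are applied (in A3C, before the local gradients are copied to the global network). A minimal sketch; the tiny linear net and placeholder loss are just stand-ins, and the exact layout of utils.py in this repo may differ:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the repo's local net (lnet) and shared optimizer.
lnet = nn.Linear(4, 2)
opt = torch.optim.Adam(lnet.parameters(), lr=1e-4)

loss = lnet(torch.randn(8, 4)).pow(2).mean()  # placeholder loss

opt.zero_grad()
loss.backward()
# Clip the total gradient norm to 20 after backward() and before the step.
torch.nn.utils.clip_grad_norm_(lnet.parameters(), 20)
opt.step()
```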
Hi, I ran into a problem while training another A3C.
After some time, all the networks always output the same action.
I tried 'torch.nn.utils.clip_grad_norm_(lnet.parameters(), 20)', but it doesn't help.
It may be that during training the network tries many actions but never receives a reward.
Do you have any ideas about this problem?
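Not an answer given in this thread, but the usual remedy when a policy collapses to a single action before any reward arrives is the entropy bonus from the original A3C paper: subtract beta times the policy entropy from the loss so near-deterministic policies are penalized. A minimal sketch for a discrete policy head; the function name and the entropy_beta value are illustrative, not from this repo:

```python
import torch
import torch.nn.functional as F

def policy_loss_with_entropy(logits, actions, advantages, entropy_beta=0.01):
    # logits: (batch, n_actions) raw policy-head outputs
    # actions: (batch,) int64 actions that were taken
    # advantages: (batch,) advantage estimates
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    # Standard policy-gradient term.
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen * advantages.detach()).mean()

    # Entropy bonus: keeps the policy from collapsing to one action
    # while no reward signal has been seen yet.
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    return pg_loss - entropy_beta * entropy
```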