gd-zhang opened this issue 6 years ago · 1 comment
In the summaries, I can only find the reward for each individual environment. Am I supposed to average over all envs myself?
Yes, if you average over all environments, you'll get the average training performance.
However, TensorBoard does this automatically through its smoothing factor.
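Here is a minimal Python sketch of both options. It assumes a hypothetical layout where the per-environment rewards are available as one list per environment, all of the same length. `average_over_envs` and `ema_smooth` are illustrative names, not part of any library. `ema_smooth` is a plain exponential moving average in the spirit of TensorBoard's smoothing slider, and TensorBoard's exact formula (for example, any bias correction) may differ. Also assumed: all environments write to the same scalar tag. In that case their points are interleaved in one curve, so the moving average blends values from different environments.

```python
from statistics import fmean


def average_over_envs(rewards_per_env):
    """Average the logged reward across environments at each logging step.

    Hypothetical layout: one list per environment, all of the same length,
    where index i holds that environment's i-th logged reward.
    """
    if len({len(r) for r in rewards_per_env}) != 1:
        raise ValueError("each environment needs the same number of logged rewards")
    # zip(*...) groups the i-th reward of every environment together.
    return [fmean(step) for step in zip(*rewards_per_env)]


def ema_smooth(values, weight=0.6):
    """Exponential moving average over a single curve (no bias correction).

    weight acts as the smoothing factor: 0 means no smoothing,
    values close to 1 mean heavy smoothing.
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    if not values:
        return []
    smoothed = []
    last = values[0]  # start from the first point so the curve is not pulled toward 0
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


rewards = [
    [1.0, 2.0, 3.0],  # env 0
    [3.0, 4.0, 5.0],  # env 1
]
print(average_over_envs(rewards))  # [2.0, 3.0, 4.0]

# Interleave both environments into one series, as if they shared a tag,
# then smooth it: the result stays within the range of the two envs' rewards.
interleaved = [r for step in zip(*rewards) for r in step]
print(ema_smooth(interleaved, weight=0.5))  # [1.0, 2.0, 2.0, 3.0, 3.0, 4.0]
```

Explicit averaging gives an exact per-step mean. The smoothed interleaved curve only approximates it, and how closely depends on the smoothing weight.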