Achieving reported training performance
hfeniser opened this issue · 9 comments
We trained a few agents with the training code provided in the repo. If we don't change anything, the mean reward on 500 training levels in the Starpilot game is around 5.6. If we remove the `VecNormalize` line in the environment creation, then we achieve a mean reward of around 9.2. However, the reported mean reward in the paper is around 12 (Figure 4).

Did you use `VecNormalize` in the paper?
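For reference, the environment-creation code we are referring to looks roughly like this (a sketch following the repo's wrapper stack; exact arguments may differ across versions):

```python
from procgen import ProcgenEnv
from baselines.common.vec_env import VecExtractDictObs, VecMonitor, VecNormalize

# Vectorized Procgen env restricted to 500 training levels.
venv = ProcgenEnv(num_envs=64, env_name="starpilot",
                  num_levels=500, distribution_mode="hard")
venv = VecExtractDictObs(venv, "rgb")  # pull the image out of the dict obs
venv = VecMonitor(venv)                # record raw episode returns
venv = VecNormalize(venv, ob=False)    # the line in question: normalize rewards only
```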
What command are you using to run the experiments? #6 mentions using too small of a batch size, for instance.
Yes, that might be the reason. We were using the following command:

`python -m train_procgen.train --env_name starpilot --num_levels 500`

As we don't have 4 GPUs, we are not able to run `mpiexec -np 4 ...`. Therefore, we will try to simulate this by running 256 environments in parallel instead of 64.
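Concretely, the edit we have in mind is something like this (a sketch; we assume the env count is a hardcoded constant in `train_procgen/train.py` as in the public repo, which may differ by version):

```python
# train_procgen/train.py (sketch): scale up the per-worker env count so one
# worker collects as much data per rollout as 4 MPI workers with 64 envs each.
num_envs = 256  # was 64; 1 worker x 256 envs = 4 workers x 64 envs per step
```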
@hasanferit Did you change `timesteps_per_proc` from `50_000_000` to `200_000_000`?
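(If I read `train.py` correctly, that constant is a per-worker budget, so a single-worker run needs the full amount; roughly:)

```python
# train_procgen/train.py (sketch): 4 workers x 50M steps each = 200M total,
# so a single worker must run the whole 200M itself.
timesteps_per_proc = 200_000_000  # was 50_000_000
```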
@hasanferit ok, maybe it has something to do with MPI. FYI, I am able to reproduce the results with 2 workers using 128 envs.
Oh, good to hear you are able to reproduce. Currently, we are running 1 worker with 256 envs; hopefully that will work.
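Our reasoning for the equivalence, as a quick sanity check (a sketch; `nsteps=256` is assumed from the repo's defaults, and 128 envs is taken as per-worker):

```python
# Effective on-policy batch per PPO update = workers * envs_per_worker * nsteps.
nsteps = 256
for workers, envs in [(4, 64), (2, 128), (1, 256)]:
    print(f"{workers} worker(s) x {envs} envs -> {workers * envs * nsteps} steps/update")
# All three configurations collect the same 65536 steps per update.
```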
By the way, I still don't get the benefit of `VecNormalize`. Does it help achieve better training performance? If somebody creates the environment via Gym, then the results are not reproducible, as rewards would not be normalized in that case.
I think the point was better training performance, though the effect size seems small. If you create environments with gym, then you'd also have to reproduce the exact same training setup to get the same results, and that setup does include reward normalization.
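One detail worth noting, sketched below (wrapper order is assumed from the public repo; verify locally): `VecMonitor` sits below `VecNormalize`, so the logged episode rewards are the raw, unnormalized ones, and the reported numbers stay comparable across runs with and without normalization.

```python
# Wrapper order matters: VecMonitor records raw episode returns before
# VecNormalize rescales the rewards the learner sees, so the logged
# eprewmean is unnormalized even when training uses normalized rewards.
venv = VecMonitor(venv)              # logs raw returns via episode infos
venv = VecNormalize(venv, ob=False)  # learner-facing rewards get normalized
```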
We achieved a mean reward of 13.9 in the Starpilot game by running 1 worker with 256 envs and `VecNormalize`. Therefore, I am closing this issue.
We are able to achieve up to an 18.7 mean reward in the Starpilot game by introducing 8 initial states.

EDIT: We achieved 18.7 at 400M training timesteps with 8 initial states. In 200M timesteps, the best we achieved is 14.9 and the average is 14.16. Sorry for the inconvenience.