Achieving reported training performance
hfeniser opened this issue · 9 comments
We trained a few agents with the training code provided in the repo. If we don't change anything, the mean reward on 500 training levels in the Starpilot game is around 5.6. If we remove the `VecNormalize` line in the environment creation, then we achieve a mean reward of around 9.2. However, the reported mean reward in the paper is around 12 (Figure 4).

Did you use `VecNormalize` in the paper?
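For reference, the environment-creation code we are referring to looks roughly like this (a sketch following the repo's wrapper stack; exact arguments may differ across versions):

```python
from procgen import ProcgenEnv
from baselines.common.vec_env import VecExtractDictObs, VecMonitor, VecNormalize

# Vectorized Procgen env restricted to 500 training levels.
venv = ProcgenEnv(num_envs=64, env_name="starpilot",
                  num_levels=500, distribution_mode="hard")
venv = VecExtractDictObs(venv, "rgb")  # pull the image out of the dict obs
venv = VecMonitor(venv)                # record raw episode returns
venv = VecNormalize(venv, ob=False)    # the line in question: normalize rewards only
```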
What command are you using to run the experiments? #6 mentions using too small of a batch size, for instance.
Yes, that might be the reason. We were using the following command:

`python -m train_procgen.train --env_name starpilot --num_levels 500`

As we don't have 4 GPUs, we are not able to run `mpiexec -np 4 ...`. Therefore, we will try to simulate this by running 256 environments in parallel instead of 64.
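Concretely, the edit we have in mind is something like this (a sketch; we assume the env count is a hardcoded constant in `train_procgen/train.py` as in the public repo, which may differ by version):

```python
# train_procgen/train.py (sketch): scale up the per-worker env count so one
# worker collects as much data per rollout as 4 MPI workers with 64 envs each.
num_envs = 256  # was 64; 1 worker x 256 envs = 4 workers x 64 envs per step
```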
@hasanferit Did you change `timesteps_per_proc` from `50_000_000` to `200_000_000`?
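(If I read `train.py` correctly, that constant is a per-worker budget, so a single-worker run needs the full amount; roughly:)

```python
# train_procgen/train.py (sketch): 4 workers x 50M steps each = 200M total,
# so a single worker must run the whole 200M itself.
timesteps_per_proc = 200_000_000  # was 50_000_000
```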
@hasanferit ok, maybe it has something to do with MPI. FYI, I am able to reproduce the results with 2 workers using 128 envs.
Oh, good to hear you are able to reproduce. Currently, we are running 1 worker with 256 envs; hopefully that will work.
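Our reasoning for the equivalence, as a quick sanity check (a sketch; `nsteps=256` is assumed from the repo's defaults, and 128 envs is taken as per-worker):

```python
# Effective on-policy batch per PPO update = workers * envs_per_worker * nsteps.
nsteps = 256
for workers, envs in [(4, 64), (2, 128), (1, 256)]:
    print(f"{workers} worker(s) x {envs} envs -> {workers * envs * nsteps} steps/update")
# All three configurations collect the same 65536 steps per update.
```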
By the way, I still don't get the benefit of `VecNormalize`. Does it help achieve better training performance? If somebody creates the environment via Gym, then the results are not reproducible, as rewards would not be normalized in that case.
I think the point was better training performance, though the effect size seems small. If you create environments with gym, then you'd also have to reproduce the exact same training setup to get the same results, and that setup does include reward normalization.
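One detail worth noting, sketched below (wrapper order is assumed from the public repo; verify locally): `VecMonitor` sits below `VecNormalize`, so the logged episode rewards are the raw, unnormalized ones, and the reported numbers stay comparable across runs with and without normalization.

```python
# Wrapper order matters: VecMonitor records raw episode returns before
# VecNormalize rescales the rewards the learner sees, so the logged
# eprewmean is unnormalized even when training uses normalized rewards.
venv = VecMonitor(venv)              # logs raw returns via episode infos
venv = VecNormalize(venv, ob=False)  # learner-facing rewards get normalized
```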
We achieved a mean reward of 13.9 in the Starpilot game by running 1 worker with 256 envs and `VecNormalize`. Therefore, I am closing this issue.
We are able to achieve up to an 18.7 mean reward in the Starpilot game by introducing 8 initial states.

EDIT: We achieved 18.7 at 400M training timesteps with 8 initial states. In 200M timesteps, the best we achieved is 14.9 and the average is 14.16. Sorry for the inconvenience.