nicklashansen/policy-adaptation-during-deployment

Questions on implementation details

Closed this issue · 1 comment

Hi, thanks for your fantastic work and for making your code open source. I ran into a few technical questions while reproducing your results and would appreciate any advice.

  1. RAM usage is extremely high. We see this simply by running the off-the-shelf script scripts/train.sh: the cartpole-swingup experiment consumes up to 80 GB of RAM in the late stages of training (close to 0.5M iterations). Is this expected, or is something wrong with our computational environment?
  2. We only managed to finish training with a few seeds, and we indeed observe large variance and significant deviation from the values reported in the paper. Could you please specify which exact random seeds were used to obtain the reported results?

Many thanks,
Qi

Hi Qi,
Thank you for your comments! I am glad that you find the work useful.

It is correct that the replay buffer takes up ~80 GB of RAM by the end of training. Our implementation is based on https://github.com/denisyarats/pytorch_sac_ae, which stores frame-stacked observations naively: each raw frame is duplicated across every stack that contains it, for both obs and next_obs. You can reduce memory consumption considerably by storing each frame only once and passing around frame indices instead, as in the sketch below. I will soon be making an update to https://github.com/nicklashansen/dmcontrol-generalization-benchmark that includes this improvement, and I will happily update this repository as well.
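To give a sense of the numbers: a single 84x84x3 uint8 frame is ~21 KB, so 500k steps amounts to ~10.6 GB of raw frames, whereas storing fully stacked obs and next_obs pairs (frame stack of 3) multiplies that by six, ~63 GB, which is roughly consistent with the usage you observed once the rest of the buffer and allocator overhead are included. Here is a minimal sketch of the pointer-style buffer; this is not the exact implementation that will land in dmcontrol-generalization-benchmark, and the names, shapes, and episode-boundary handling are illustrative:

```python
import numpy as np


class FrameDedupReplayBuffer:
    """Stores each raw frame once (uint8) and rebuilds frame-stacked
    observations on the fly at sample time."""

    def __init__(self, capacity, frame_shape=(3, 84, 84), action_dim=1, frame_stack=3):
        self.capacity = capacity
        self.k = frame_stack
        # One raw frame per environment step, instead of two fully
        # stacked observations (obs and next_obs) per step.
        self.frames = np.empty((capacity, *frame_shape), dtype=np.uint8)
        self.actions = np.empty((capacity, action_dim), dtype=np.float32)
        self.rewards = np.empty(capacity, dtype=np.float32)
        self.dones = np.empty(capacity, dtype=np.float32)
        # Marks indices where a new episode begins, so a stack never
        # mixes frames from two different episodes.
        self.ep_start = np.zeros(capacity, dtype=bool)
        self.idx = 0
        self.full = False

    def add(self, frame, action, reward, done, first):
        # `first` must be True for the first frame of every episode.
        self.frames[self.idx] = frame
        self.actions[self.idx] = action
        self.rewards[self.idx] = reward
        self.dones[self.idx] = done
        self.ep_start[self.idx] = first
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def _stacked(self, i):
        # Gather the k frames ending at index i; repeat the episode's
        # first frame rather than reaching into the previous episode.
        frames, j = [], i
        for _ in range(self.k):
            frames.append(self.frames[j])
            if not self.ep_start[j]:
                j = (j - 1) % self.capacity
        return np.concatenate(frames[::-1], axis=0)  # (k * C, H, W)

    def sample(self, batch_size):
        hi = self.capacity if self.full else self.idx
        assert hi > 1, "buffer too small to sample"
        idxs = []
        while len(idxs) < batch_size:
            i = int(np.random.randint(hi - 1))
            nxt = i + 1
            # Skip transitions that straddle an episode boundary or sit
            # right behind the ring buffer's write head.
            if self.ep_start[nxt] or (self.full and nxt == self.idx):
                continue
            idxs.append(i)
        obs = np.stack([self._stacked(i) for i in idxs])
        next_obs = np.stack([self._stacked(i + 1) for i in idxs])
        return obs, self.actions[idxs], self.rewards[idxs], next_obs, self.dones[idxs]
```

With this layout the image memory for 500k cartpole-swingup steps drops from ~63 GB to ~10.6 GB, at the cost of reassembling stacks (a few cheap concatenations) on every sample.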

And to answer your second question: even though two seeds may yield similar returns in the training environment, their generalization can vary quite a bit depending on the learned weights, so benchmarking requires a large number of runs, both during training and evaluation; seeds 0-9 were used in our experiments. Hopefully this reliance on a large number of seeds (and thus computation) can be reduced as algorithms improve.
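In case it helps with reproduction: the reported numbers are simply a mean and standard deviation over the per-seed returns. A minimal aggregation sketch, assuming a hypothetical log layout (one eval.log per seed with one JSON record per line containing an episode_reward field; adjust the paths and field names to wherever your runs write their evaluation results):

```python
import json

import numpy as np

returns = []
for seed in range(10):  # seeds 0-9, as used in our experiments
    # Hypothetical path; point this at each run's evaluation log.
    path = f"logs/cartpole_swingup/seed_{seed}/eval.log"
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    # Use the final evaluation of each run.
    returns.append(records[-1]["episode_reward"])

returns = np.asarray(returns)
print(f"return: {returns.mean():.1f} +/- {returns.std():.1f} over {len(returns)} seeds")
```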

Let me know if you have any other questions!

Best,
Nicklas