google-deepmind/dqn_zoo

Memory issues when running with Docker

Closed this issue · 7 comments

Hello,

I tried to run QR-DQN on Atari with your default setup (1 million frames per iteration and 200 iterations). The memory usage increases linearly until the system kills the process.

I used run.sh, which launches a Docker image.

Would you please help with that?

Thanks

jqdm commented

Could you confirm the latest version of the code is being used, with all recent commits? Have you made any local modifications? Also just to check, how much memory is being used?
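For reporting memory usage, a rough sketch like the following could be called periodically from the training loop. It uses only the Python standard library and is not part of DQN Zoo; `peak_rss_mib` is a hypothetical helper name, and the unit handling assumes Linux (kibibytes) or macOS (bytes).

```python
import resource
import sys


def peak_rss_mib():
    """Returns the peak resident set size of this process in MiB.

    ru_maxrss is reported in kibibytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


# Hypothetical usage: log once per iteration and compare across iterations.
print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

Because this is the peak rather than the current value, a number that keeps climbing across iterations indicates growth, while a constant number indicates a plateau.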

We've not observed memory issues when running internally, though this is without Docker. Memory usage should plateau once the replay is full. Could be worth trying to run the agent without Docker to see if that still has memory issues. Dependencies are listed here.
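To illustrate why a plateau is expected, here is a minimal sketch of a fixed-capacity replay that overwrites its oldest entry once full. It is an illustration only, not the actual replay implementation in this repo; `RingReplay` and the capacity used are made up for the example.

```python
class RingReplay:
    """Fixed-capacity FIFO buffer: once full, each add overwrites the oldest item."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._storage = []
        self._next = 0  # Index that the next add will write to.

    def add(self, item):
        if len(self._storage) < self._capacity:
            # Still filling: storage grows by one item.
            self._storage.append(item)
        else:
            # Full: replace the oldest item, storage size stays constant.
            self._storage[self._next] = item
        self._next = (self._next + 1) % self._capacity

    def __len__(self):
        return len(self._storage)


replay = RingReplay(capacity=1000)
sizes = []
for step in range(5000):
    replay.add(step)
    if step % 1000 == 999:
        sizes.append(len(replay))

print(sizes)
```

The number of stored items stops growing once the capacity is reached, so memory that keeps rising linearly well after that point would be coming from somewhere other than the replay contents.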

Hi, I tried running on CPU without Docker and it runs smoothly. The memory growth only happens when I run through Docker.

I did not change anything; I am using the repo code as is.

jqdm commented

Thanks for the information, will investigate.

Sure. What environment is required to run on GPU? Should it be CUDA 11.1.1 and NVIDIA driver 455?

jqdm commented

Instructions for GPU support can be found on the JAX installation docs here, but CUDA 11.1 should work.

jqdm commented

I've not been able to reproduce the memory issue with Docker on my desktop; the memory usage plateaus as expected when I try. Was there anything special about your Docker setup, e.g. which version of Docker are you using? I assume you set --jax_platform_name=cpu.

jqdm commented

Closing as I was unable to reproduce this, and the issue seems specific to a particular Docker setup rather than to anything in the agent code itself. Feel free to reopen if you have any more information that would help with reproducing.