alex-petrenko/sample-factory

PBT with parallel simulations

Asad-Shahid opened this issue · 5 comments

Hi,
Does PBT work with the Isaac gym envs?

Thanks

Hi! Technically it should work, except it won't be possible to have multiple agents controlled by different policies in the same environment.

IsaacGym works in batched mode, which keeps everything on the GPU in big batches.

For PBT to work you'd need to set num_policies > 1 and num_workers = num_policies (each worker will run an env controlled by a corresponding policy).
Let me know if you have any luck with that!
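For reference, the setting described above could be sketched as a small helper that builds the command-line flags. This is only an illustration: `pbt_worker_flags` is a hypothetical helper, and the exact flag spellings (`--num_policies`, `--num_workers`, `--serial_mode`) are assumed to match your installed Sample Factory version, so check them against its `--help` output.

```python
from typing import List


def pbt_worker_flags(num_policies: int, serial_mode: bool = False) -> List[str]:
    """Build flags so that each worker runs one env driven by one policy.

    Hypothetical helper; flag names are assumptions about the installed version.
    """
    if num_policies < 2:
        raise ValueError("PBT needs num_policies > 1")
    return [
        f"--num_policies={num_policies}",
        # One worker per policy, as suggested in the reply above.
        f"--num_workers={num_policies}",
        # Parallel mode keeps each simulator instance in its own process.
        f"--serial_mode={serial_mode}",
    ]
```

The returned list can be appended to whatever training command you normally use.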

There is presumably a limit on how many IsaacGym processes you can run on a single machine, though on a multi-GPU machine you can probably run a lot of them. We also have a decentralised PBT implementation in the works that can run on multiple machines. It will be merged at some point, but there is no ETA yet; it's stuck in some corporate reviews.

Thanks for the suggestion. I tried creating multiple instances of Isaac Gym using multiple workers, but it throws an error:

[Error] [carb.gym.plugin] Function GymGetActorDofStates cannot be used with the GPU pipeline after the simulation starts. Please use the tensor API if possible. See docs/programming/tensors.html for more info.

Are you running in serial mode? In that case everything runs in the same process and you will have conflicts.
I'd imagine that in parallel mode (serial_mode=False) you should be able to run multiple workers.

If having multiple IsaacGym processes is not an option, I can also suggest writing a wrapper that starts an IG instance in a separate process and splits the vector of agents into sub-vectors, each controlled by a different policy. These sub-vector envs can run on separate workers, thus making Sample Factory happy.
In reality they won't do anything themselves; they will just forward requests to the full env running in a background process.
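The bookkeeping behind such a wrapper might look roughly like the sketch below. It is not Sample Factory or IsaacGym code: `BatchedEnvStub`, `ActionGatherer` and `SubVectorEnv` are hypothetical names, the toy env stands in for the real simulator, and everything runs in one process to keep the example short. In a real setup the gatherer and the full env would live in the background process, and each sub-vector env would talk to it over a pipe or queue.

```python
from typing import List, Optional, Sequence


class BatchedEnvStub:
    """Hypothetical stand-in for a batched simulator holding all agents."""

    def __init__(self, num_agents: int) -> None:
        self.num_agents = num_agents
        self.state = [0.0] * num_agents

    def step(self, actions: Sequence[float]) -> List[float]:
        assert len(actions) == self.num_agents
        # Toy dynamics: each agent's observation accumulates its own actions.
        self.state = [s + a for s, a in zip(self.state, actions)]
        return list(self.state)


class ActionGatherer:
    """Owns the full env and steps it once every sub-vector has sent actions."""

    def __init__(self, env: BatchedEnvStub, num_policies: int) -> None:
        if env.num_agents % num_policies != 0:
            raise ValueError("num_agents must be divisible by num_policies")
        self.env = env
        self.chunk = env.num_agents // num_policies
        self.pending: List[Optional[List[float]]] = [None] * num_policies
        self.last_obs = list(env.state)
        self.steps_done = 0

    def submit(self, policy_id: int, actions: Sequence[float]) -> None:
        assert len(actions) == self.chunk
        self.pending[policy_id] = list(actions)
        if all(part is not None for part in self.pending):
            # Concatenate the slices in policy order and step the full env once.
            full = [a for part in self.pending for a in part]
            self.last_obs = self.env.step(full)
            self.pending = [None] * len(self.pending)
            self.steps_done += 1

    def observations(self, policy_id: int) -> List[float]:
        lo = policy_id * self.chunk
        return self.last_obs[lo:lo + self.chunk]


class SubVectorEnv:
    """Thin per-policy view: it only forwards requests to the gatherer."""

    def __init__(self, gatherer: ActionGatherer, policy_id: int) -> None:
        self.gatherer = gatherer
        self.policy_id = policy_id
        self.num_agents = gatherer.chunk

    def step_async(self, actions: Sequence[float]) -> None:
        self.gatherer.submit(self.policy_id, actions)

    def step_wait(self) -> List[float]:
        # Returns this policy's slice of the most recent full-env observation.
        return self.gatherer.observations(self.policy_id)
```

Splitting the step into `step_async` and `step_wait` matters because the full env can only advance after every sub-vector has submitted its slice; a single blocking `step` per sub-env would deadlock if the sub-envs were stepped one after another.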

Hope it makes sense. It's a bit of work but definitely can be done.

I can only run a maximum of 2 IsaacGym processes on my single-GPU machine. Running with serial_mode=False returns errors. First, there was a shape mismatch here, then an issue with the device. I finally decided to control each robot in the IsaacGym env using a different policy, though it's extremely slow. Actually, the slowest part is updating the networks. I am looping over the agents to sample from buffers and do updates. Is there an efficient way of doing this?