google-deepmind/reverb

Use With Vectorized Environments

wbrenton opened this issue · 6 comments

What are the best practices for using Reverb with a vectorized environment? Any help is appreciated, thank you.

Hey,

I'm not sure what you are actually asking, but the type of environment should not have much impact on how you use Reverb.

@acassirer By vectorized environments I mean that you interact with a batch of environments every time you call .reset() or .step() on your environment API.

Here is a motivating example for why I think the question is worthwhile.

envs = make_envs(num_parallel_env=N, env_id="Breakout-v5")
obs = envs.reset()
print(obs.shape) # (N, 4, 86, 86)

# one trajectory writer for each env
trajectory_writers = [rb_client.trajectory_writer(num_keep_alive_refs=args.rollout_length) for _ in range(N)]
while True:
    next_obs, rewards, dones, infos = envs.step(actions)
    # next_obs.shape = (N, 4, 86, 86)
    # rewards.shape = (N,)  # scalar rewards

    # loop over every environment and write the experience to its respective writer
    for idx in range(N):
        trajectory_writer = trajectory_writers[idx]
        trajectory_writer.append({
            'obs': obs[idx],
            'actions': actions[idx],
            'rewards': rewards[idx],
            'dones': dones[idx]
        })
        if trajectory_writer.episode_steps >= 2:
            trajectory_writer.create_item(
                table='uniform_experience_replay',
                priority=1.,
                trajectory={
                    'obs': trajectory_writer.history['obs'][:-1],
                    'next_obs': trajectory_writer.history['obs'][-1:],
                    'actions': trajectory_writer.history['actions'][:-1],
                    'rewards': trajectory_writer.history['rewards'][:-1],
                    'dones': trajectory_writer.history['dones'][:-1],
                })

    # carry the latest observations into the next iteration
    obs = next_obs

Having to iterate over every environment is quite slow and defeats the purpose of using a vectorized environment. Surely there must be a better way; I'm just unable to find it in the codebase.

In case it's still not 100% clear, what I'm looking for is a way to write a batch of experiences from N environments to the table without having to maintain a writer for each of the N environments.
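For illustration, here is a minimal sketch of one possible workaround: append the whole batch as a single step, so that one writer sees arrays with a leading dimension of N, and split the time-major data back into per-environment trajectories after sampling. This is an assumption about how batched writing could be arranged, not an API that Reverb documents for this purpose. `FakeWriter`, `append_batched_step`, and `split_per_env` are hypothetical names; `FakeWriter` only records what is appended, so the sketch runs without a Reverb server.

```python
import numpy as np


class FakeWriter:
    """Hypothetical stand-in that only records what is appended.

    A real trajectory writer would be used in its place; it is not
    imported here so that the sketch runs without a Reverb server.
    """

    def __init__(self):
        self.steps = []

    def append(self, step):
        self.steps.append(step)


def append_batched_step(writer, obs, actions, rewards, dones):
    """Append one step that holds data from all N environments at once."""
    writer.append({
        'obs': np.asarray(obs),          # (N, ...)
        'actions': np.asarray(actions),  # (N, ...)
        'rewards': np.asarray(rewards),  # (N,)
        'dones': np.asarray(dones),      # (N,)
    })


def split_per_env(batch):
    """Turn time-major arrays (T, N, ...) into env-major arrays (N, T, ...)."""
    return {key: np.swapaxes(value, 0, 1) for key, value in batch.items()}


N, T = 4, 3  # number of environments, number of steps (illustrative values)
writer = FakeWriter()
for t in range(T):
    append_batched_step(
        writer,
        obs=np.full((N, 2), t, dtype=np.float32),
        actions=np.zeros((N,), dtype=np.int64),
        rewards=np.ones((N,), dtype=np.float32),
        dones=np.zeros((N,), dtype=bool),
    )

# Stack the T recorded steps into time-major arrays of shape (T, N, ...).
stacked = {key: np.stack([step[key] for step in writer.steps])
           for key in writer.steps[0]}
per_env = split_per_env(stacked)
```

The trade-off in this sketch is that every stored item mixes all N environments, and episodes end at different times in each one, so the learner would have to use the `dones` flags to mask or cut trajectories that cross an episode boundary.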

This is a very relevant and common use-case in modern DRL. I also came looking for an answer to this.

Ideally there would be a section in the documentation regarding batched writing of trajectories.

Also, I don't understand why this issue was closed. It's clearly not resolved.

This topic is also discussed in this other issue: #78

@thomasbbrunner glad you replied to this thread, I killed so much time trying to use reverb with vectorized envs. What framework are you using (PyTorch, JAX, etc.)?

I'm using a combination of PyTorch + NumPy. I'm currently facing problems, as between 50% and 90% of the time in my rollouts is spent on Reverb, with the remainder being spent on stepping the environment + metrics.
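As a rough sketch of how such a breakdown could be measured, the snippet below accumulates wall-clock time per phase of the rollout loop and reports each phase's share of the total. `env_step` and `replay_write` are hypothetical placeholders that only sleep; in a real rollout they would be the environment step and the replay writes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named phase.
timings = defaultdict(float)


@contextmanager
def timed(name):
    """Add the time spent inside the `with` block to `timings[name]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] += time.perf_counter() - start


def env_step():
    """Hypothetical placeholder for stepping the vectorized environment."""
    time.sleep(0.001)


def replay_write():
    """Hypothetical placeholder for writing the step to the replay buffer."""
    time.sleep(0.003)


for _ in range(20):
    with timed('env_step'):
        env_step()
    with timed('replay_write'):
        replay_write()

# Fraction of the measured rollout time spent in each phase.
total = sum(timings.values())
fractions = {name: spent / total for name, spent in timings.items()}
print(fractions)
```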

I tried using multithreading as described in #72 (comment); however, it did not lead to improvements (probably limited by the GIL). Multiprocessing is a pain in Python, so it's probably not an option (the data has to be picklable).

Not sure what to try next. Did you end up finding a solution for your use-case?