Use With Vectorized Environments

Question

wbrenton opened this issue 2 years ago · 6 comments

What are the best practices for using Reverb with a vectorized environment? Any help is appreciated, thank you.

Answer 1 · 2023-05-24T09:08:59.000Z

Hey,

I'm not really sure what you are asking, but the type of environment should not have much impact on how you use Reverb.

Answer 2 · 2023-05-24T13:59:29.000Z

@acassirer By vectorized environments I mean that you interact with a batch of environments every time you call .reset() or .step() on the environment API.

Here is a motivating example for why I think the question is worthwhile.

envs = make_envs(num_parallel_env=N, env_id="Breakout-v5")
obs = envs.reset()
print(obs.shape)  # (N, 4, 86, 86)

# one trajectory writer for each env
trajectory_writers = [rb_client.trajectory_writer(num_keep_alive_refs=args.rollout_length) for _ in range(N)]
while True:
    # actions: batch of N actions chosen by the policy (selection not shown)
    next_obs, rewards, dones, infos = envs.step(actions)
    # next_obs.shape = (N, 4, 86, 86)
    # rewards.shape = (N,)  # scalar rewards

    # loop over every environment and write the experience to its respective writer
    for idx in range(N):
        trajectory_writer = trajectory_writers[idx]
        trajectory_writer.append({
            'obs': obs[idx],
            'actions': actions[idx],
            'rewards': rewards[idx],
            'dones': dones[idx],
        })
        if trajectory_writer.episode_steps >= 2:
            trajectory_writer.create_item(
                table='uniform_experience_replay',
                priority=1.,
                trajectory={
                    'obs': trajectory_writer.history['obs'][:-1],
                    'next_obs': trajectory_writer.history['obs'][-1:],
                    'actions': trajectory_writer.history['actions'][:-1],
                    'rewards': trajectory_writer.history['rewards'][:-1],
                    'dones': trajectory_writer.history['dones'][:-1],
                })

    # carry the latest observations into the next step
    obs = next_obs

Having to iterate over every environment is quite slow and defeats the purpose of using a vectorized environment. Surely there must be a better way; I'm just unable to find it in the codebase.

In case it's still not 100% clear, what I'm looking for is a way to write a batch of experiences from N environments to the table without having to maintain a writer for each of the N environments.
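One possible workaround, sketched here under assumptions rather than as a confirmed Reverb feature, is to treat each vectorized step as a single record that keeps a leading environment axis, so that one writer appends one record per step and the learner flattens the environment axis after sampling. The NumPy-only sketch below shows just the packing and flattening; `pack_vectorized_step` and `flatten_sampled_records` are hypothetical helper names, the shapes are made up to match the example above, and no Reverb call is made.

```python
import numpy as np


def pack_vectorized_step(obs, actions, rewards, dones, next_obs):
    """Pack one step of N environments into a single record.

    Every array keeps its leading environment axis of size N, so the whole
    batch can be stored as one record instead of N separate ones.
    """
    record = {
        "obs": np.asarray(obs),
        "actions": np.asarray(actions),
        "rewards": np.asarray(rewards),
        "dones": np.asarray(dones),
        "next_obs": np.asarray(next_obs),
    }
    num_envs = record["obs"].shape[0]
    for name, value in record.items():
        if value.shape[0] != num_envs:
            raise ValueError(
                f"{name} has leading dim {value.shape[0]}, expected {num_envs}"
            )
    return record


def flatten_sampled_records(records):
    """Merge K sampled records of shape (N, ...) into arrays of shape (K * N, ...)."""
    return {
        key: np.concatenate([record[key] for record in records], axis=0)
        for key in records[0]
    }


# Usage with made-up shapes: N = 3 environments, K = 2 sampled records.
N = 3
step = pack_vectorized_step(
    obs=np.zeros((N, 4, 86, 86), dtype=np.uint8),
    actions=np.zeros((N,), dtype=np.int64),
    rewards=np.zeros((N,), dtype=np.float32),
    dones=np.zeros((N,), dtype=bool),
    next_obs=np.ones((N, 4, 86, 86), dtype=np.uint8),
)
batch = flatten_sampled_records([step, step])
print(batch["obs"].shape)  # (6, 4, 86, 86)
```

The trade-off is that all N transitions from one step are always sampled together, so the sampling distribution differs from the per-environment version, and multi-step trajectories or per-transition priorities would need extra handling.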

Answer 3 · 2024-04-18T09:59:16.000Z

This is a very relevant and common use-case in modern DRL. I also came looking for an answer to this.

Ideally there would be a section in the documentation regarding batched writing of trajectories.

Also, I don't understand why this issue was closed. It's clearly not resolved.

Answer 4 · 2024-04-18T10:01:24.000Z

This topic is also discussed in this other issue: #78

Answer 5 · 2024-04-18T15:42:04.000Z

@thomasbbrunner glad you replied to this thread. I killed so much time trying to use Reverb with vectorized envs. What framework are you using (PyTorch, JAX, etc.)?

Answer 6 · 2024-04-18T19:00:05.000Z

I'm using a combination of PyTorch + NumPy. I'm currently facing problems, as between 50% and 90% of the time in my rollouts is spent on Reverb, with the remainder spent on stepping the environment + metrics.

I tried using multithreading as described in #72 (comment); however, it did not lead to improvements (probably limited by the GIL). Multiprocessing is a pain in Python, so probably not an option (data has to be picklable).
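For anyone weighing the multiprocessing route anyway, here is a minimal, standard-library-only sketch of two preparatory steps: splitting the N environment indices into one contiguous shard per worker, and checking up front whether an object can be pickled. `shard_env_indices` and `is_picklable` are hypothetical helper names, and the sketch does not start any processes or touch Reverb.

```python
import pickle


def shard_env_indices(num_envs, num_workers):
    """Split range(num_envs) into num_workers contiguous, near-equal shards."""
    base, extra = divmod(num_envs, num_workers)
    shards, start = [], 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional environment each.
        size = base + (1 if worker < extra else 0)
        shards.append(list(range(start, start + size)))
        start += size
    return shards


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, which multiprocessing relies on."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


print(shard_env_indices(10, 3))              # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(is_picklable({"reward": 1.0}))         # True
print(is_picklable(lambda x: x))             # False
```

Each worker would then own the writers for its shard only, so no writer is shared across processes.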

Not sure what to try next. Did you end up finding a solution for your use-case?