thu-ml/tianshou

Gym3 vec environments and other nice features

drozzy opened this issue · 5 comments

Have you guys looked at Gym3 by OpenAI?
I think they have some really nice features, like better vectorized environment support, no more reset()/render() functions, and a separation of act and observe functionality (as opposed to both happening in step()).

Design choices here:
https://github.com/openai/gym3/blob/4c3824680eaf9dd04dce224ee3d4856429878226/docs/design.md

Maybe this would make some of tianshou's design choices easier in some places.

I'm not sure how popular gym3 is, but I've seen this separation of act/observe elsewhere: https://juliareinforcementlearning.org/docs/How_to_write_a_customized_environment/
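To make the act/observe split concrete, here is a rough sketch. It is not the gym3 API itself: `ToyVecEnv` and `episode_len` are hypothetical names, and the class is a stdlib-only stand-in that only mimics the shape described in the design doc, where `act()` advances the environments and returns nothing, `observe()` returns `(reward, observation, first)`, and episodes restart in place with no `reset()`.

```python
class ToyVecEnv:
    """Hypothetical toy env imitating a gym3-style act/observe split.

    Each of the `num` sub-environments is just a step counter.
    """

    def __init__(self, num, episode_len=3):
        self.num = num
        self.episode_len = episode_len
        self._steps = [0] * num      # steps taken in the current episode
        self._rew = [0.0] * num      # reward from the most recent action
        self._first = [True] * num   # True at the first step of an episode

    def observe(self):
        # Reading state has no side effects, so it is safe to call repeatedly.
        return list(self._rew), list(self._steps), list(self._first)

    def act(self, actions):
        # Advancing state is all act() does; it returns nothing.
        assert len(actions) == self.num
        for i, a in enumerate(actions):
            self._steps[i] += 1
            self._rew[i] = float(a)
            self._first[i] = False
            if self._steps[i] >= self.episode_len:
                # Episode over: restart in place instead of requiring reset().
                self._steps[i] = 0
                self._first[i] = True


env = ToyVecEnv(num=2)
history = []
for _ in range(4):
    env.act([1, 0])
    rew, obs, first = env.observe()
    history.append((rew, obs, first))
```

With this shape the caller never calls reset(); it only watches the `first` flags to see where a new episode begins.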

This is quite slow. I ran it on a DGX-A100 (256 cores) with PongNoFrameskip-v4 and it only got ~4000 FPS (even without env wrappers!)

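import time

import gym
import gym3
import numpy as np
import tqdm

numenv = 128  # number of parallel environments; change this
total = 500  # number of vectorized steps; actually I use 50000 for my own test
e = gym3.vectorize_gym(num=numenv, env_kwargs={"id": "PongNoFrameskip-v4"})
t0 = time.time()
for _ in tqdm.trange(total):
    e.act(np.random.randint(6, size=numenv))  # one random action per env
    e.observe()
# FPS counts frames across all environments, not vectorized steps
fps = total * numenv / (time.time() - t0)
print(fps)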

Even with procgen's optimized envs it only reaches ~40k FPS.

import time
import tqdm
from gym3 import types_np
from procgen import ProcgenGym3Env
env = ProcgenGym3Env(num=128, env_name="coinrun")
total = 10000
t0 = time.time()
for _ in tqdm.trange(total):
    env.act(types_np.sample(env.ac_space, bshape=(env.num,)))
    rew, obs, first = env.observe()
print(total * env.num / (time.time() - t0))
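Both measurements use the same formula: vectorized steps times number of environments, divided by wall-clock seconds. A minimal helper for that could look like the sketch below. `measure_fps` and `step_fn` are hypothetical names, and the no-op `step_fn` is only there so the snippet runs without gym3 or procgen installed.

```python
import time


def measure_fps(step_fn, num_envs, total_steps):
    """Return frames per second: total_steps * num_envs / elapsed seconds."""
    t0 = time.perf_counter()
    for _ in range(total_steps):
        step_fn()  # one vectorized step, e.g. env.act(...) then env.observe()
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - t0, 1e-9)
    return total_steps * num_envs / elapsed


calls = []
fps = measure_fps(lambda: calls.append(1), num_envs=128, total_steps=1000)
```

For a real environment, `step_fn` would wrap the act/observe pair from the loops above.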

These days I'm writing a high-performance vectorized environment interface that can easily achieve more than 500k FPS on an A100 machine (Atari games with the OpenAI env wrappers). It is compatible with tianshou's current VectorEnv API and with the dm_env API, and it even supports multi-player settings! It will be open-sourced soon.

Oh interesting. Thanks! Looking forward to it.

How does the dm_env API https://github.com/deepmind/dm_env/blob/master/docs/index.md compare to gym's API?
Do you find it better?

Both of them have pros and cons:

  • gym pros: popular
  • gym cons: needs to be reset manually; reset's signature doesn't support the multi-agent setting
  • dm_env pros: can auto-reset; supports the multi-agent setting naturally
  • dm_env cons: needs a namedtuple; all members must be pre-defined in observation_spec() and action_spec(); wasteful discount keyword
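The reset difference in the list above can be sketched roughly as follows. This uses neither library: `GymStyleEnv` and `DmStyleEnv` are hypothetical toy classes, and `TimeStep` / `StepType` are stdlib imitations of the dm_env types with the same field names (step_type, reward, discount, observation).

```python
from collections import namedtuple
from enum import IntEnum


class StepType(IntEnum):
    FIRST = 0
    MID = 1
    LAST = 2


# Stand-in for dm_env.TimeStep; note the discount field carried on every step.
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])


class GymStyleEnv:
    """Toy counter env with the classic gym reset()/step() shape."""

    def __init__(self, episode_len=2):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done, {}


class DmStyleEnv:
    """Toy counter env that restarts itself on the step after LAST."""

    def __init__(self, episode_len=2):
        self.episode_len = episode_len
        self.t = 0
        self._needs_reset = True

    def step(self, action):
        if self._needs_reset:
            # New episode: the action is ignored and a FIRST step is returned.
            self._needs_reset = False
            self.t = 0
            return TimeStep(StepType.FIRST, None, None, self.t)
        self.t += 1
        if self.t >= self.episode_len:
            self._needs_reset = True
            return TimeStep(StepType.LAST, 1.0, 0.0, self.t)
        return TimeStep(StepType.MID, 1.0, 1.0, self.t)


# gym style: the caller has to notice `done` and reset by hand.
gym_env = GymStyleEnv()
obs = gym_env.reset()
gym_dones = []
for _ in range(4):
    obs, rew, done, info = gym_env.step(0)
    gym_dones.append(done)
    if done:
        obs = gym_env.reset()

# dm_env style: the caller just keeps stepping.
dm = DmStyleEnv()
dm_types = [dm.step(0).step_type for _ in range(5)]
```

In the dm_env-style loop the episode boundary shows up as LAST followed by FIRST, so a vectorized wrapper does not need a separate reset path per sub-environment.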