Farama-Foundation/AutoROM

Is it possible to directly include the rom binary file in the source code repo?

Trinkle23897 opened this issue · 8 comments

I wonder if this is a license issue, but from the download page description I don't think downloading is restricted. Please correct me if I'm wrong, many thanks!

I never got an email for this issue and don't check GitHub often; I apologize for the delayed reply.

It is not legal for us to distribute ROMs, which is why Atari-Py (now ALE-Py) no longer ships them. However, it is legal for us to have a library that automatically installs them from a third party (even if that third party's hosting of them may or may not be legal), in the same way that torrent clients are legal. AutoROM is the officially recommended and easiest way to install ROMs given these legal developments, and both the ALE and Gym use it. It's presumably also the solution for Tianshou's CI.

Actually I'm working on another project and ran into this issue there, not in tianshou. I just went through openai/gym#2259 and saw you'd like to build the new gym vector_env API. But that approach is quite inefficient (thu-ml/tianshou#409).

Recently I've been working on a project, EnvPool, that can efficiently execute multiple environments concurrently. Here are the benchmark results on Nvidia Apollo (96 CPU cores) and DGX-A100 (256 CPU cores) for pure environment simulation:

[Screenshot: benchmark results on Nvidia Apollo (96 cores)]

[Screenshot: benchmark results on DGX-A100 (256 cores)]

As you can see, this is way faster than gym.vector_env (10x). BTW, EnvPool also supports multi-agent envs. We are going to open-source this project in Sept. or Oct., and I'm wondering if you'd have any interest in integrating new environments (including your awesome PettingZoo!) after we finish open-sourcing it. Thanks!

@Trinkle23897 I am certainly curious about it. After it is made open source, I will definitely take a look and see if it makes sense to add multi-agent support.

On the side, I have been a big fan of heavy use of vector environments, and I am surprised that they performed so poorly. Do you have a sense of why the sync EnvPool is more efficient than gym's AsyncVectorEnv?

So first of all, in my terminology gym's AsyncVectorEnv is actually a synchronous step, because it steps all of the environments simultaneously and waits for all of them, even though it uses separate sub-processes to execute env.step.

EnvPool-sync achieves a ~3x speedup over gym's vector_env mainly because of two things: direct C++ env integration with C++-level parallelization, and efficient data movement and batching via pybind11.
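On the batching point, the idea can be sketched even in pure Python: writing each sub-env's observation into a preallocated batch buffer (roughly what pybind11 lets EnvPool do at the C++ level) avoids a fresh allocation and stack on every step. This is a toy illustration, not EnvPool's actual code:

```python
import numpy as np

num_envs, obs_shape = 8, (84, 84)

# Preallocated batch buffer: observations are written in place each step,
# so no new array is allocated and no re-stacking happens per step.
obs_batch = np.zeros((num_envs,) + obs_shape, dtype=np.uint8)

def naive_batch(obs_list):
    # allocates a fresh (num_envs, 84, 84) array and copies every obs
    return np.stack(obs_list)

def inplace_batch(obs_list):
    # reuses the same buffer across steps; only the element copy remains
    for i, o in enumerate(obs_list):
        obs_batch[i] = o
    return obs_batch

obs_list = [np.full(obs_shape, i, dtype=np.uint8) for i in range(num_envs)]
a = naive_batch(obs_list)
b = inplace_batch(obs_list)
```

Both return the same batch, but the in-place version hands back the same buffer object every step instead of a new allocation.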

EnvPool-async's behavior is exactly the same as tianshou's VectorEnv, which implements async step; see https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#parallel-sampling or thu-ml/tianshou#103
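For reference, the async-step idea can be sketched with a thread pool: instead of waiting for every env, the collector acts on whichever subset finished first. This is a toy sketch of the scheduling pattern only (tianshou and EnvPool use their own machinery, and `env_step` here is a stand-in):

```python
import concurrent.futures as cf
import random
import time

def env_step(env_id):
    # stand-in for env.step(): step time varies per environment
    time.sleep(random.uniform(0.001, 0.01))
    return env_id

with cf.ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(env_step, i) for i in range(8)]
    # Sync step (gym.vector): wait for *all* envs before acting.
    # Async step: act as soon as *any* env is ready.
    done, pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    ready_ids = sorted(f.result() for f in done)
    # the policy would now compute actions only for ready_ids,
    # while the envs still in `pending` keep stepping in the background
```

With heterogeneous step times this keeps the fast envs from idling behind the slowest one, which is where the async gain comes from.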

> see if it makes sense to add multi-agent support.

I've already integrated the ViZDoom multi-agent CIG game by following Sample-Factory's approach (tons of env wrappers...).

@Trinkle23897 It's cool that you have implemented multi-agent support for ViZDoom.

As for EnvPool, I totally understand that asynchronous execution can be much more efficient than synchronous execution. But I suspect that the main performance gap between EnvPool-sync and gym's AsyncVectorEnv can be bridged in pure Python, by making effective use of low-level process primitives such as process shared memory, process events, locks, and zero-copy numpy operations. Gym's vector envs don't make perfect use of these low-level primitives, but their usage isn't terribly inefficient either. So I tried to reproduce these performance plots.

Here is my code to benchmark gym's envs:

import gym
import supersuit
import random
import time

num_frame_skips = 4
num_envs = 24

def benchmark(venv):
    # seeds each sub-env differently
    venv.seed(42)
    venv.reset()
    start = time.time()
    num_steps = 1000
    for i in range(num_steps):
        actions = [random.randrange(4) for i in range(venv.num_envs)]
        obs, rews, dones, infos = venv.step(actions)
    end = time.time()
    steps_per_sec = num_steps * venv.num_envs / (end - start)
    return steps_per_sec

def make_env():
    env = gym.make("PongNoFrameskip-v4")
    env = supersuit.frame_skip_v0(env, num_frame_skips)
    return env

if __name__ == "__main__":
    env = make_env()

    print(f"env_gym: {num_frame_skips*benchmark(gym.vector.SyncVectorEnv([make_env]*1, env.observation_space, env.action_space))}")
    print(f"vec_env_gym: {num_frame_skips*benchmark(gym.vector.AsyncVectorEnv([make_env]*num_envs, env.observation_space, env.action_space))}")

On my 12 core, 24 thread Threadripper 1920x machine, with gym==0.18.0 I am getting the following results:

env_gym: 4843.880482389659
vec_env_gym: 33358.061431865965

If the above benchmark is correct, it suggests that the vector env isn't that inefficient: it gets a ~6x speedup over the single-env version, and it outperforms the numbers from the 96-core machine. Any thoughts on this?

Not really.

[Screenshot: rerun of the benchmark]

40k ~ 50k fps in general from my trials