
Problem running scenarionet.sim in Google Colab and on Mac

!python -m scenarionet.sim -d /content/exp_converted/ --render 3D

Known pipe types:
(1 aux display modules not yet loaded.)
:ShowBase(warning): Unable to open 'onscreen' window.
Can this run in Colab?

The scenarionet.sim script needs a screen on your machine to render 3D scenarios. Colab machines have no video output device, so they cannot run this script. If your notebook runs locally on a machine with a screen, you can of course launch it.

But you can still visualize the scenarios in Colab :) A good workaround is to use the 2D pygame renderer, which doesn't need an X server or a screen. You can then save the frames to a GIF once you finish an episode and play that GIF.

An example is at: There is a section called "Real-world Scenario Environment Visualization". All you need to pay attention to is using frame = env.render(mode="top_down", **extra_args) to get a frame, and then using

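import pygame
from PIL import Image
from IPython.display import Image as IPyImage  # avoid shadowing PIL's Image

# frames: the frames collected with env.render(mode="top_down", ...) during the episode
# pygame.surfarray.array3d returns (width, height, 3); PIL expects (height, width, 3)
imgs = [pygame.surfarray.array3d(frame).swapaxes(0, 1) for frame in frames]
imgs = [Image.fromarray(img) for img in imgs]
imgs[0].save("demo.gif", save_all=True, append_images=imgs[1:], duration=50, loop=0)
print("\nOpen gif...")
IPyImage(open("demo.gif", "rb").read())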

to generate the GIF.
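The `frames` list used above has to be filled while the episode runs. A minimal sketch of that loop follows, assuming a gymnasium-style env (`reset()` returns `(obs, info)`, `step()` returns five values) and that `env.render(mode="top_down", ...)` returns one frame per call, as described above. The `collect_topdown_frames` helper is hypothetical, and the constant action `[0, 1]` is only a placeholder for your own policy.

```python
def collect_topdown_frames(env, max_steps=1000, action=(0.0, 1.0), **render_kwargs):
    """Roll out one episode and return the top-down frame rendered after every step.

    Hypothetical helper: assumes a gymnasium-style env and that
    env.render(mode="top_down", ...) returns a frame (e.g. a pygame Surface).
    """
    frames = []
    obs, info = env.reset()
    for _ in range(max_steps):
        # Placeholder action; replace with your policy's output.
        obs, reward, terminated, truncated, info = env.step(list(action))
        frames.append(env.render(mode="top_down", **render_kwargs))
        if terminated or truncated:
            break
    return frames
```

For example, `frames = collect_topdown_frames(env, film_size=(1200, 1200))`, after which the GIF snippet above turns `frames` into `demo.gif`.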

Thanks for your feedback. I should make this clearer and will add a Colab example to this repo soon.

When I followed the Colab example, it worked. Thank you very much!

New problem
In the ScenarioNet documentation, I only see visualization. But I want to take over a vehicle (e.g. the AV) and control it with my own algorithm. What can I do?

I think the ScenarioNet documentation doesn't show this process.

If you want to control the vehicle, just remove the "agent_policy" config from the dict. The agent policy then falls back to the default ExternalInputPolicy, which uses the input of env.step() to set the steering and throttle of the ego vehicle. The input is a two-dimensional vector [steering, throttle], and both values should be in the range [-1, 1]. For example, env.step([0, 1]) makes the car move straight forward at full throttle.
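As a sketch of this external-input path: the helper below builds a `[steering, throttle]` action clipped to [-1, 1] (the layout under which `env.step([0, 1])` drives forward), and a second helper drives straight for a few steps. Both helper names are hypothetical, and the env is assumed to be gymnasium-style with the ego vehicle on the default ExternalInputPolicy.

```python
def _clip(value):
    """Clamp a number into the valid action range [-1, 1]."""
    return max(-1.0, min(1.0, float(value)))


def make_action(steering, throttle):
    """Build a [steering, throttle] action with both entries clipped to [-1, 1]."""
    return [_clip(steering), _clip(throttle)]


def drive_forward(env, steps=100):
    """Drive straight at full throttle until the episode ends or `steps` is reached.

    Assumes a gymnasium-style env whose ego vehicle uses the default
    ExternalInputPolicy (i.e. no "agent_policy" in the config).
    """
    obs, info = env.reset()
    for _ in range(steps):
        obs, reward, terminated, truncated, info = env.step(make_action(0.0, 1.0))
        if terminated or truncated:
            break
    return obs
```

Out-of-range inputs are clamped rather than rejected, so `make_action(-3, 2)` yields `[-1.0, 1.0]`.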

This looks like simple control: just throttle and steering, without perception or decision-making. I hope to use a trained autonomous driving algorithm. Can ScenarioNet with ROS or OpenPilot, as in your examples, achieve this goal?

Well, it depends on how you build your autonomous driving system (ADS). Basically, an ADS is a mapping, or function, from image/lidar/IMU to throttle/steering. env.step() returns an observation containing the image/lidar/IMU data that serves as the ADS input. Your ADS then produces the steering and throttle action, which is fed into the next env.step().

The pseudo-code looks like this:

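my_ADS = ADS()
o, _ = env.reset()
for step in range(max_episode_len):
    action = my_ADS.compute_action(o)  # [steering, throttle]
    o, r, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break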

Therefore, the decision-making happens in my_ADS.compute_action(o). You can make it as complex as openpilot or as simple as an end-to-end RL policy. Even the complex openpilot controller still follows the procedure above: it takes images as input and outputs throttle/steering.
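To make the loop above concrete, here is a runnable sketch with a deliberately trivial, hypothetical ADS: it reads a lateral offset from a dict observation and steers proportionally back toward the lane center. The observation key `lateral_offset`, the gain, and the constant throttle are illustrative assumptions, not MetaDrive's observation format; a real ADS would consume the image/lidar/IMU data returned by env.step().

```python
class ProportionalADS:
    """Hypothetical toy ADS: steer against the lateral offset, hold a constant throttle."""

    def __init__(self, gain=0.5, throttle=0.3):
        self.gain = gain
        self.throttle = throttle

    def compute_action(self, obs):
        steering = -self.gain * obs["lateral_offset"]  # hypothetical observation key
        steering = max(-1.0, min(1.0, steering))       # keep steering in [-1, 1]
        return [steering, self.throttle]                # [steering, throttle]


def run_episode(env, ads, max_episode_len=1000):
    """Perception -> decision -> control loop; returns the episode's total reward."""
    total_reward = 0.0
    obs, _ = env.reset()
    for _ in range(max_episode_len):
        action = ads.compute_action(obs)  # all decision-making lives here
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```

Swapping `ProportionalADS` for an RL policy or a full stack only changes `compute_action`; the surrounding loop stays the same.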

Thanks for the answer. Is it possible to provide a simple end-to-end RL policy example in the documentation for easier understanding?

I cannot document too many details on training/designing at this time. Sorry about that.

But we do include an end-to-end driving policy in the simulator. The source code is at
The policy is a 3-layer MLP trained with a huge amount of data. It takes 240 pseudo-lidar points, IMU, and navigation info as input and outputs throttle and steering.

To try this policy, just run python -m metadrive.examples.drive_in_single_agent_env. In autopilot mode, the car is controlled by the end-to-end policy.
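For intuition about what such a policy computes, here is a NumPy sketch of a 3-layer MLP forward pass that maps an observation vector to `[steering, throttle]`, squashed into [-1, 1] with tanh. The input size (240 lidar points plus a hypothetical 35 IMU/navigation features), the hidden width of 256, and the random weights are assumptions for illustration; this is not the shipped MetaDrive expert or its trained weights.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Randomly initialise (W, b) pairs for a fully connected network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def mlp_policy(obs, params):
    """3-layer MLP with tanh on every layer, so the output lands in [-1, 1]."""
    x = obs
    for w, b in params:
        x = np.tanh(x @ w + b)
    return x  # [steering, throttle]


# 240 pseudo-lidar points + 35 hypothetical IMU/navigation features -> 2 actions
params = init_mlp([275, 256, 256, 2])
action = mlp_policy(np.zeros(275), params)
```

The final tanh is one simple way to respect the [-1, 1] action range; a trained policy learns the weights instead of sampling them.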

I get it.

In addition to Google Colab, my MacBook Air (Apple M1) has a similar problem.
What should I do? Thanks.

1. python -m scenarionet.sim -d /path/to/exp_converted --render 3D

:ShowBase(warning): Unable to open 'onscreen' window.
2. python -m scenarionet.sim -d /path/to/exp_converted --render advanced

[!!!] RenderPipeline Sorry, your GPU does not support compute shaders! Make sure you have the latest drivers. If you already have, your gpu might be too old, or you might be using the open source drivers on linux.

Hi Yuening,

Sorry about that. It is actually a known issue that Macs with M-series chips cannot launch the 3D rendering service. A workaround is to keep using the top-down renderer.



Currently, ppo_expert is only used as an example in drive_in_single_agent_env in MetaDrive. However, my goal is to apply this policy to Waymo scenarios converted by ScenarioNet and control the ego car (self-driving car) in each scenario. Due to my limited experience, I cannot accomplish this by myself. Can you give me some help?

ScenarioEnv is compatible with any reinforcement learning framework. I recommend setting the number of scenarios to 1 and using algorithms from stable-baselines3 to train your first policy on a single Waymo scene.
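A minimal sketch of that recommendation, assuming MetaDrive's `ScenarioEnv` (importable from `metadrive.envs`) accepts the `data_directory` and `num_scenarios` config keys and that stable-baselines3 is installed. The helper names, data path, and timestep budget are placeholders; the imports sit inside the training function so the config helper can be used without the simulator installed.

```python
def single_scene_config(data_directory, scenario_count=1):
    """Env config that restricts ScenarioEnv to the first `scenario_count` scenarios."""
    return {
        "data_directory": data_directory,  # folder produced by ScenarioNet conversion
        "num_scenarios": scenario_count,   # 1 -> fit a single Waymo scene first
    }


def train_single_scene(data_directory, total_timesteps=1_000_000):
    """Train PPO on one scenario (imports are local so the helper above stays lightweight)."""
    from metadrive.envs import ScenarioEnv
    from stable_baselines3 import PPO

    env = ScenarioEnv(single_scene_config(data_directory))
    model = PPO("MlpPolicy", env, verbose=1)  # SB3 wraps a single env automatically
    model.learn(total_timesteps=total_timesteps)
    env.close()
    return model
```

Usage: `model = train_single_scene("/path/to/exp_converted")`. Once a single scene works, raise `scenario_count` gradually.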

If you are familiar with Ray, you can build your training script based on this:

I tried the stable-baselines3 training demo from the MetaDrive documentation and ran into some trouble.

import gymnasium as gym
import matplotlib.pyplot as plt
import os

from functools import partial
from IPython.display import clear_output
from IPython.display import Image
from metadrive.envs import MetaDriveEnv
from metadrive.envs import ScenarioEnv
from metadrive.utils import generate_gif
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env.subproc_vec_env import SubprocVecEnv

num_scenarios = 50000

def waymo_env(need_monitor=False):
    env = ScenarioEnv(dict(
        num_scenarios=num_scenarios,
        # manual_control=False,
        # reactive_traffic=False,
        # use_render=False,
    ))
    if need_monitor:
        env = Monitor(env)
    return env

# the guard keeps SubprocVecEnv workers from re-running this block when they start
if __name__ == "__main__":
    # 8 subprocesses to roll out in parallel
    train_env = SubprocVecEnv([partial(waymo_env, True) for _ in range(8)])
    # train_env = waymo_env()

    model = PPO("MlpPolicy", train_env, verbose=1)
    model.learn(total_timesteps=25_000 if os.getenv('TEST_DOC') else 300_000)
    # model.learn(total_timesteps=300_000, progress_bar=True)
    train_env.close()

    print("Training is finished! Generate gif ...")

    # PPO.load is a class method that returns a new model:
    # model = PPO.load("/content/drive/MyDrive/Autonomous_Driving_Algorithm_Waymo")

    # evaluation on a single, separate env
    env = waymo_env()
    for seed in range(num_scenarios):
        total_reward = 0
        obs, _ = env.reset(seed=seed)
        for i in range(1000):
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            ret = env.render(mode="topdown",
                             film_size=(1200, 1200)
                             # screen_size=(600, 600),
                             # camera_position=(50, 50)
                             )
            if terminated or truncated:
                print("episode_reward", total_reward)
                break
    env.close()

    print("gif generation is finished ...")
  1. When I selected train_env=SubprocVecEnv([partial(waymo_env, True) for _ in range(8)])
  1. Did you use this training demo on the converted Waymo Open Motion Dataset? When I used a small number of scenarios as the training set, it worked. But the training/evaluation results are relatively poor: the target vehicle couldn't follow the reference trajectory. Could you give me some suggestions?

I have no idea about question 1. But question 2 seems to be a MetaDrive problem. Please set show_crosswalk and show_sidewalk to False to see if that fixes it. For the last problem, you have to increase the number of samples; 300_000 is not enough. Generally, 1 million is the minimum requirement. If you use PPO, the total number of steps should be increased to 10 million.
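To put these budgets in perspective: stable-baselines3's PPO collects `n_steps` transitions per environment (2048 by default) before each policy update, so with the 8 subprocess environments in the demo above, one rollout is 8 × 2048 = 16,384 steps. The sketch below counts how many rollout/update cycles a step budget buys, assuming the default `n_steps` and that training keeps collecting until the step counter reaches `total_timesteps`. The helper name is hypothetical.

```python
import math


def ppo_updates(total_timesteps, n_envs=8, n_steps=2048):
    """Number of rollout/update cycles PPO runs for a given step budget.

    Each cycle collects n_envs * n_steps transitions; collection continues until
    the step counter reaches total_timesteps, so the count is rounded up.
    """
    steps_per_rollout = n_envs * n_steps
    return math.ceil(total_timesteps / steps_per_rollout)


for budget in (300_000, 1_000_000, 10_000_000):
    print(f"{budget:>10,} timesteps -> {ppo_updates(budget)} PPO updates")
```

With these defaults, 300,000 steps buy only 19 updates, 1 million buy 62, and 10 million buy 611, which is why the small budget leaves the policy undertrained.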

Yes, I set show_crosswalk=False and show_sidewalk=False, and question 2 is solved.

I chose these training parameters:

num_scenarios = 70,000
total_timesteps = 10,000,000

The figure shows the training process. Can I consider the training to have achieved good results after 4 million timesteps? What do some of the key indicators in the log mean? For example, why can ep_len_mean and ep_rew_mean reach such large values?

Yeah, the reward is pretty high. You can visualize the scenario to see if it works well.
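On the log indicators: in stable-baselines3, `rollout/ep_rew_mean` and `rollout/ep_len_mean` are the mean total reward and mean length of recently finished episodes, taken over a sliding window of the last 100 episodes by default (episode info comes from the Monitor wrapper). Because they are per-episode sums rather than per-step values, longer episodes naturally produce larger numbers. The sketch below reproduces that bookkeeping; the function name is hypothetical.

```python
from collections import deque
from statistics import mean


def rolling_episode_stats(episodes, window=100):
    """Mimic ep_rew_mean / ep_len_mean: means over the last `window` finished episodes.

    `episodes` is an iterable of (episode_return, episode_length) tuples in the
    order the episodes finished.
    """
    buffer = deque(maxlen=window)  # older episodes fall out of the window
    for ep_return, ep_length in episodes:
        buffer.append((ep_return, ep_length))
    if not buffer:
        return float("nan"), float("nan")
    ep_rew_mean = mean(r for r, _ in buffer)
    ep_len_mean = mean(length for _, length in buffer)
    return ep_rew_mean, ep_len_mean
```

If episodes never hit a step limit, both values keep growing with episode length, so a high `ep_rew_mean` alone does not prove the car drives well.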

By the way, the bug in the second problem should be fixed already. Could you pull the latest MetaDrive and enable show_sidewalk and show_crosswalk to see if it still happens?

I think the default PPO algorithm design is not suitable for the Waymo dataset.

  1. The trained scenario sometimes doesn't show up completely.

  1. Each Waymo scenario lasts 20 seconds, but the episodes during training exceed 20 seconds, which may cause the reward to keep increasing.

  1. ......

In your paper, the algorithm is only applied to the nuPlan and PG datasets. So, can you adjust the reward function and termination conditions to fit the Waymo dataset?

The bug seems to be solved.

For the first problem, there is a key map_region_size in env_config, and assigning it a larger value such as 1024 may address this issue. Also, if you are using the top-down renderer, clipping caused by film_size may produce the same effect; a larger film size may help as well. Please refer to for more details.

For the second problem, you can set horizon=300 or similar to terminate the episode, so the environment steps and reward won't increase forever.

For the third problem, I believe the reward function and termination conditions can be generalized to the Waymo dataset. Possible reasons you cannot get a good result:

  1. The traffic cannot react to the ego car, which results in unreasonable collisions. Set reactive_traffic=True to enable reactive traffic.
  2. The algorithm parameters may not be appropriate. Please refer to the settings here
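The suggestions in this reply map to env config keys named in this thread (map_region_size, horizon, reactive_traffic, and show_sidewalk/show_crosswalk from the earlier rendering fix). A sketch of collecting them in one ScenarioEnv config, assuming your MetaDrive version accepts these keys; the helper name, the data path, and passing a larger film_size at render time are illustrative.

```python
def waymo_training_config(data_directory, num_scenarios, horizon=300):
    """Hypothetical helper: ScenarioEnv config combining the fixes suggested above."""
    return {
        "data_directory": data_directory,
        "num_scenarios": num_scenarios,
        "map_region_size": 1024,   # larger map region so the scenario is not cut off
        "horizon": horizon,        # end each episode after this many steps
        "reactive_traffic": True,  # traffic reacts to the ego car
        "show_sidewalk": True,     # set both to False if rendering breaks
        "show_crosswalk": True,    # on an older MetaDrive version
    }


# Usage (requires MetaDrive):
# from metadrive.envs import ScenarioEnv
# env = ScenarioEnv(waymo_training_config("/path/to/exp_converted", num_scenarios=1000))
# frame = env.render(mode="topdown", film_size=(2000, 2000))  # larger film to avoid clipping
```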

Thanks. I'll try later.

I also tried for convenience, but I ran out of memory. How can I reduce memory usage?

I have some problems with the PPO algorithm training:

  1. It seems strange that no results appear in the first 21,500 s, after which results are output every 250 s;
  2. An error is reported during the training process: ValueError: Summary file is not found at /content/drive/MyDrive/mdsn/scenarionet/dataset/waymo_test/dataset_summary.pkl!
wandb: Synced TEST_aa36c_00000:
== Status ==
Memory usage on this node: 2.6/51.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0.0/8 CPUs, 0/0 GPUs, 0.0/26.95 GiB heap, 0.0/9.28 GiB objects
Result logdir: /content/drive/MyDrive/mdsn/scenarionet/experiment/TEST
Number of trials: 1 (1 ERROR)
| Trial name                               | status   | loc   |   seed |   iter |   total time (s) |     ts |   reward |   success |   coverage |   out |   max_step |   length |   level |
| MultiWorkerPPO_GymEnvWrapper_aa36c_00000 | ERROR    |       |      0 |     14 |          24023.6 | 728000 | -1.42022 |   0.48913 |     0.1152 | 0.125 |    0.38587 |  292.761 |      13 |
Number of errored trials: 1
| Trial name                               |   # failures | error file                                                                                                                              |
| MultiWorkerPPO_GymEnvWrapper_aa36c_00000 |            1 | /content/drive/MyDrive/mdsn/scenarionet/experiment/TEST/MultiWorkerPPO_GymEnvWrapper_aa36c_00000_0_seed=0_2024-02-15_05-54-21/error.txt |

Sorry, I have no idea. It is something raised by Ray. How many workers are you using? Does this still persist if you only use one worker?

ERROR -- Log sync requires rsync to be installed.

Is this related to rsync not being available on Windows?

Not sure. You can search for related reports in Ray's GitHub issue list.