This repo is the final project of a reinforcement learning course: training a model for autonomous vehicle control in CARLA (version 0.9.13). The structure of the code may be confusing and some implementations are inelegant (such as retreive_data()); I may refactor the code in the near future.


  1. carla 0.9.13 release
  2. python 3.6
  3. dependencies in requirements.txt

set up env and test

Use precompiled Carla link

# download CARLA from the SUSTech mirror (or follow the official instructions)
tar -zxvf CARLA_0.9.13.tar.gz
./ -RenderOffScreen
# set up python environment
conda env create -f environment.yml
# test
python3 or

design details

Notes to help you modify the algorithm.

carla world settings:
Uses the default world. When resetting, the vehicles are destroyed and re-spawned instead of reloading the whole world with reload_world(). RGB camera frames are retrieved in synchronous mode, converted to tensors, and stored in the replay buffer.
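A minimal sketch of the frame-to-tensor step, assuming the camera delivers a carla.Image whose raw_data is a flat 8-bit BGRA buffer (CARLA's documented RGB camera format). The function name preprocess_frame and the crop_top/stride parameters are hypothetical, not the repo's actual values; the crop and stride are simple stand-ins for the resize and crop described later in this README.

```python
import numpy as np

def preprocess_frame(raw_data, height, width, crop_top=0, stride=2):
    """Convert a BGRA byte buffer into a CHW float32 array in [0, 1].

    Hypothetical helper: crop_top and stride are illustrative knobs.
    """
    # carla.Image.raw_data is height * width * 4 bytes in BGRA order
    bgra = np.frombuffer(raw_data, dtype=np.uint8).reshape((height, width, 4))
    rgb = bgra[:, :, 2::-1]            # drop alpha and reorder BGR -> RGB
    rgb = rgb[crop_top:, :, :]         # crop the top rows (e.g. sky)
    rgb = rgb[::stride, ::stride, :]   # cheap downsample by striding
    # HWC -> CHW, contiguous float32, scaled to [0, 1]
    chw = np.ascontiguousarray(rgb.transpose(2, 0, 1), dtype=np.float32) / 255.0
    return chw
```

In the camera callback this could be called as preprocess_frame(image.raw_data, image.height, image.width), and torch.from_numpy(...) turns the result into a tensor without copying.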

agent settings:
The agent car is always spawned at the first spawn_point with and sensor.other.collision attached.

reward and action:
actions (map2action()):

action index    action
0               go straight on (carla.VehicleControl(1, 0, 0))
1               turn left (carla.VehicleControl(1, -1, 0))
2               turn right (carla.VehicleControl(1, 1, 0))
3               brake (carla.VehicleControl(0, 0, 1))
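The discrete action table can be sketched as a lookup from index to the first three positional arguments of carla.VehicleControl (throttle, steer, brake). ACTIONS and map_action are hypothetical names for illustration; the repo's own function is map2action().

```python
# Hypothetical discrete action table mirroring the table above.
# Each entry is (throttle, steer, brake), the first three positional
# arguments of carla.VehicleControl.
ACTIONS = {
    0: (1.0, 0.0, 0.0),   # go straight on
    1: (1.0, -1.0, 0.0),  # turn left
    2: (1.0, 1.0, 0.0),   # turn right
    3: (0.0, 0.0, 1.0),   # brake
}

def map_action(index):
    """Return (throttle, steer, brake) for a discrete action index."""
    if index not in ACTIONS:
        raise ValueError(f"unknown action index: {index}")
    return ACTIONS[index]
```

With the CARLA client installed, the control would be applied as vehicle.apply_control(carla.VehicleControl(*map_action(i))).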

rewards (get_reward(), A2C):

reward    event
-200      the collision sensor receives an event
-100      take action 3 (brake)
2         take action 0 (go straight on)
1         take action 1 or 2 (turn left/right)

rewards (reward_sac(), SAC):

reward    event
-200      the collision sensor receives an event
1         any other step
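The two reward tables above can be sketched as plain functions. The names a2c_reward and sac_reward are hypothetical, and the rule that a collision overrides the per-action reward in the A2C case is an assumption, since the table does not say how the two combine.

```python
COLLISION_PENALTY = -200.0

def a2c_reward(action_index, collided):
    """Sketch of the A2C reward table.

    Assumption: a collision overrides the per-action reward.
    """
    if collided:
        return COLLISION_PENALTY
    if action_index == 3:   # brake
        return -100.0
    if action_index == 0:   # go straight on
        return 2.0
    return 1.0              # turn left / right (actions 1 and 2)

def sac_reward(collided):
    """Sketch of the SAC reward table: -200 on collision, 1 otherwise."""
    return COLLISION_PENALTY if collided else 1.0
```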

RL algorithms:
Currently A2C and SAC are implemented.


  • wrap the CARLA environment following the OpenAI Gym paradigm
    • env() initializes the world
    • step() returns info
    • reset() resets the world to its initial state
    • agent (actor)
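The Gym-style wrapper in the list above can be sketched as a class with reset() and step(). Here the CARLA client/world is hidden behind a hypothetical backend object (spawn_agent(), destroy_actors(), apply_action(), tick(), collided()); these method names are assumptions for illustration, not CARLA or repo APIs. step() returns the classic Gym tuple (obs, reward, done, info) and uses the SAC reward table.

```python
class CarlaEnvSketch:
    """Hypothetical Gym-style wrapper around a CARLA world.

    `backend` is an assumed adapter exposing spawn_agent(), destroy_actors(),
    apply_action(action), tick() -> observation and collided() -> bool.
    """

    def __init__(self, backend, max_steps=1000):
        self.backend = backend        # stands in for the CARLA client/world
        self.max_steps = max_steps    # illustrative episode length cap
        self.steps = 0

    def reset(self):
        # Destroy and re-spawn actors instead of reloading the world.
        self.backend.destroy_actors()
        self.backend.spawn_agent()
        self.steps = 0
        return self.backend.tick()    # first observation after reset

    def step(self, action):
        self.backend.apply_action(action)
        obs = self.backend.tick()     # advance one synchronous tick
        self.steps += 1
        collided = self.backend.collided()
        reward = -200.0 if collided else 1.0   # SAC reward table
        # a collision ends the episode; the step cap is an extra safeguard
        done = collided or self.steps >= self.max_steps
        return obs, reward, done, {"collision": collided}
```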

Need to fix the problem of resetting the environment; may use destroy() on all actors.

Solution: use a collision to signal the end of an episode.

A warning is raised when destroying sensors: call sensor.stop() first. Don't use reload_world(); it causes problems (high memory usage and eventually a core dump).
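The collision-based episode end and the stop-then-destroy order can be sketched as follows. CollisionMonitor and destroy_actors are hypothetical helpers; they only rely on the real CARLA calls sensor.listen(callback), sensor.stop() and actor.destroy(), and are written against duck-typed objects so they can be exercised without a simulator.

```python
class CollisionMonitor:
    """Callable for collision_sensor.listen(monitor); records collisions."""

    def __init__(self):
        self.events = []

    def __call__(self, event):
        # CARLA passes a carla.CollisionEvent; we only need to know one happened.
        self.events.append(event)

    @property
    def collided(self):
        return bool(self.events)

    def clear(self):
        # call on reset so the next episode starts collision-free
        self.events.clear()


def destroy_actors(sensors, actors):
    """Stop each sensor before destroying it, then destroy the other actors."""
    for sensor in sensors:
        sensor.stop()      # stop listening first to avoid the destroy warning
        sensor.destroy()
    for actor in actors:
        actor.destroy()
```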

  • sample trajectories
  • replay buffer
  • RL algorithm (actor-critic)
    • generate actions
    • pay attention to tensor/NumPy conversion and detach()
    • needs testing
  • add SAC algorithm
  • refactor code


To run the code on my machine with limited compute (one RTX 3060), I set it to sample one episode and then update (online A2C). I also resize and crop each frame as soon as it is received and store it in the replay buffer as a tensor to save memory. Because of the limited hardware, I only tested with short episodes, but the policy does improve.
The reward settings can be improved further. The settings above were chosen after comparing several alternatives. Braking frequently is bad driving behavior, and if braking is given a positive reward, the policy may learn to always brake no matter what it sees.
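The "sample one episode, then update" scheme can be sketched with NumPy as Monte Carlo returns plus actor and critic losses for a single finished episode. This is an assumption about how the update is computed (the repo may bootstrap with the critic instead), and the function names are hypothetical. In PyTorch the advantage would be detached before entering the actor loss, which is the "detach" pitfall listed in the TODO items.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one finished episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0                      # episode ended (collision), so G_T = 0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """Actor and critic losses for one episode (NumPy sketch, no autograd)."""
    returns = discounted_returns(rewards, gamma)
    advantages = returns - np.asarray(values, dtype=np.float64)
    # actor: -log pi(a_t | s_t) * A_t, with A_t treated as a constant
    actor_loss = -np.mean(np.asarray(log_probs, dtype=np.float64) * advantages)
    # critic: mean squared error between V(s_t) and the return G_t
    critic_loss = np.mean(advantages ** 2)
    return actor_loss, critic_loss
```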

For SAC, the action space is continuous (controlling steer in [-1, 1]) instead of discrete as in A2C. The action always has the form carla.VehicleControl(1, steer, 0), where steer is given by the policy.
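One common way to keep a continuous SAC action inside [-1, 1] is a tanh-squashed Gaussian with the change-of-variables correction on the log-probability. The sketch below assumes that approach; the README does not say how the repo's policy bounds steer, and sample_steer is a hypothetical name.

```python
import numpy as np

def sample_steer(mean, log_std, rng):
    """Sample steer from a tanh-squashed Gaussian and return (steer, log_prob)."""
    std = np.exp(log_std)
    eps = rng.standard_normal()
    u = mean + std * eps                      # pre-squash Gaussian sample
    steer = np.tanh(u)                        # squash into [-1, 1]
    # log N(u; mean, std), using (u - mean) / std == eps
    log_prob = -0.5 * eps ** 2 - log_std - 0.5 * np.log(2.0 * np.pi)
    # change of variables: subtract log |d tanh(u) / du| = log(1 - tanh(u)^2)
    log_prob -= np.log(1.0 - steer ** 2 + 1e-6)  # 1e-6 avoids log(0)
    return float(steer), float(log_prob)
```

The sampled value would then be used as carla.VehicleControl(1.0, steer, 0.0).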