
Application of reinforcement learning to the management of traffic light intersection

Traffic Control at Traffic Light Intersection with Reinforcement Learning

Tenderini Ruben - 879290

Falbo Andrea - 887525



The primary aim of this research is to evaluate various reinforcement learning algorithms for traffic control at a traffic light junction. Specifically, we aim to identify the best hyperparameters for each algorithm and to determine whether any of them improves traffic flow compared to traditional fixed-cycle traffic lights.


In our study, we employed the Big Intersection as the model for our traffic intersection analysis. We aimed to understand how different traffic management strategies perform under two traffic conditions: low and high traffic volumes.

Our approach included comparing a traditional traffic management system, the Fixed-Time Control, with four reinforcement learning algorithms:

  1. Q-Learning: This technique learns the optimal action to take in various traffic states through trial and error, improving over time based on the rewards received for actions taken.

  2. Deep Q-Learning: An advancement of Q-Learning, this method uses deep neural networks to deal with the complex, high-dimensional environments typical of traffic systems, enabling the algorithm to make more nuanced decisions.

  3. Sarsa: Similar to Q-Learning, Sarsa learns from the current state and action but also considers the next state and action, making it slightly more conservative but often more stable in its learning process.

  4. Sarsa Decay: A variant of Sarsa, this algorithm introduces a decaying epsilon-greedy strategy, gradually reducing the exploration of new actions over time to fine-tune its policy.

Note that Sarsa Decay, unlike the other algorithms, was not pre-implemented in the Sumo-RL repository.
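The difference between the algorithms listed above can be illustrated with a minimal tabular sketch. It is a simplification for illustration only: the function names are hypothetical, the multiplicative form of the epsilon decay is an assumption, and the SARSA agents in this project are built on linear-rl rather than on a lookup table.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def decay_epsilon(epsilon, decay, min_epsilon):
    """Assumed decay rule: shrink epsilon by a constant factor, with a floor."""
    return max(min_epsilon, epsilon * decay)
```

Q-Learning always bootstraps from the greedy value of the next state, while SARSA uses the value of the action the policy actually takes next, which is why SARSA tends to be more conservative while it is still exploring.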


The main tools and methodologies used to conduct this study are provided below:


SUMO, short for "Simulation of Urban MObility", is an open-source package for detailed simulation of urban mobility. Developed at the Institute of Transportation Systems of the German Aerospace Center (DLR), SUMO offers a flexible and highly portable platform.


Sumo-RL is a repository developed by Lucas N. Alegre, focused on integrating SUMO with reinforcement learning (RL) algorithms. Its goal is to extend the capabilities of SUMO, allowing users to explore and develop advanced traffic control strategies using RL approaches.

Stable Baselines 3

Stable Baselines 3 is a reinforcement learning (RL) library in Python, maintained by the DLR-RM group as the successor to Stable Baselines, which began as a fork of OpenAI Baselines. It provides a set of stable and reliable RL algorithm implementations, designed to be easily accessible and usable by developers. In this project it was also fundamental for the integration of Gymnasium environments.


2WSI-RL, an acronym for 2 Way Single Intersection for Reinforcement Learning, is a study on the application of reinforcement learning for the management of a traffic light intersection, specifically the 2 Way Single Intersection, hosted by Riccardo Chimisso.

Deep Q-Learning Agent for Traffic Signal Control

Deep Q-Learning Agent for Traffic Signal Control is a framework in which a Q-Learning-based reinforcement learning agent chooses the green phase of the intersection so as to maximize efficiency.


Matplotlib is a data visualization library for Python, designed to create static, animated, and interactive plots.


Python is a high-level programming language widely used in machine learning, data processing, and many scientific fields.


PyCharm is an integrated development environment specifically designed for the programming language Python.


GitHub is a hosting platform for software projects that uses the Git version control system. It provides a collaborative environment for software development, allowing developers to upload, share, and manage versions of their projects.


Here's the setup:

  1. Install Python: You can download and install Python from the official website: Python.org.

  2. Install SUMO: You can install SUMO following the instructions on the official website: SUMO Installation

  3. Set SUMO_HOME: Set the SUMO_HOME environment variable to the SUMO installation path (the default is /usr/share/sumo).

  4. Install an IDE: You can choose between PyCharm, VSCode, or any other IDE you prefer. You can download PyCharm from JetBrains website or VSCode from Visual Studio Code website.

  5. Install SUMO-RL: You can install SUMO-RL using pip:

    pip install sumo-rl
  6. Install Matplotlib: You can install Matplotlib using pip:

    pip install matplotlib
  7. Install Stable Baselines 3: You can install Stable Baselines 3 using pip:

    pip install stable-baselines3
  8. Install pandas: You can install Pandas using pip:

    pip install pandas 
  9. os: No installation needed; os is part of the Python standard library.
  10. pickle: No installation needed; pickle is part of the Python standard library.
  11. Install PyYAML: You can install PyYAML, which provides the yaml module, using pip:

    pip install PyYAML
  12. abc: No installation needed; abc is part of the Python standard library.
  13. Install other packages: Use pip to install additional Python packages required for your project.

    pip install *other_packages* 
  14. Install linear-rl: You can install the linear-rl repository by Lucas Alegre, needed for Sarsa, using pip:

    pip install git+https://github.com/LucasAlegre/linear-rl

If you're using PyCharm, after following these simple steps, everything should be ready to go!

The same should apply for VSCode!
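As a quick sanity check after the setup steps, a small script along the following lines can report what is still missing. This is only a sketch: check_setup is a hypothetical helper, not part of the project, and the list of import names is an assumption based on the packages installed above.

```python
import importlib.util
import os


def check_setup(required=("sumo_rl", "matplotlib", "stable_baselines3",
                          "pandas", "yaml"), env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    # SUMO_HOME must be set and must point to an existing directory.
    sumo_home = env.get("SUMO_HOME")
    if not sumo_home:
        problems.append("SUMO_HOME is not set")
    elif not os.path.isdir(sumo_home):
        problems.append(f"SUMO_HOME points to a missing directory: {sumo_home}")

    # find_spec returns None for a top-level module that cannot be imported.
    for name in required:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    return problems
```

Calling check_setup() with no arguments inspects the current process environment; printing the returned list shows which steps still need attention.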


  • big-intersection: This folder holds essential files for our traffic intersection model. It includes:

    • BI.net.xml: SUMO network file that defines the layout of the roads and the intersection.
    • BI_50_test.rou.xml: SUMO route file describing the low-traffic scenario.
    • BI_150_test.rou.xml: SUMO route file describing the high-traffic scenario.
  • configs: Here, you'll find configuration files for different reinforcement learning algorithms used during training sessions, all made in YAML format:

    • learn_low.yaml: Config used to train models on low traffic
    • learn_high.yaml: Config used to train models on high traffic
    • test_low_low.yaml: Config used to test on low traffic the models trained in low traffic
    • test_low_high.yaml: Config used to test on high traffic the models trained in low traffic
    • test_high_low.yaml: Config used to test on low traffic the models trained in high traffic
    • test_high_high.yaml: Config used to test on high traffic the models trained in high traffic
  • docs: A repository of documentation and research materials essential for understanding and extending the project:

    • relazione.pdf: A detailed report documenting the project's objectives, methodologies, results, and conclusions, presented in Italian.
    • report.pdf: A translated version of the report, catering to an English-speaking audience
  • output: This directory contains various files generated by project scripts:

    • csv: Holds CSV files, each corresponding to a specific reinforcement learning algorithm and phase, for analysis.
    • plots: Contains visualizations offering insights into algorithm performance during training and testing.
    • model: Stores trained models of the algorithms, representing their learned behaviors in the traffic management domain.
  • scripts: The heart of the project, this directory harbors all Python scripts necessary for execution:

    • agents: Includes Python files that define how different reinforcement learning algorithms behave. Agent configurations are taken from the .yaml files in the configs folder.
      • dqn_agent.py: Implementation of the Deep Q-Network (DQN) agent, adept at handling complex decision-making tasks through deep neural networks.
      • ql_agent.py: Implementation of the Q-Learning (QL) agent, leveraging tabular methods to learn optimal policies in dynamic environments.
      • fixed_cycle.py: Implementation of a fixed cycle strategy, providing a stable reference point for evaluating the performance of dynamic algorithms.
      • learning_agent.py: Abstract class containing the key methods and utilities inherited by all other reinforcement learning algorithms
      • sarsa_agent.py: Implementation of the State-Action-Reward-State-Action (SARSA) algorithm, facilitating temporal difference learning with on-policy updates.
      • sarsa_agent_decay.py: Extends SARSA with a decaying epsilon-greedy strategy that gradually reduces exploration during learning.
    • custom: Holds special wrapper files customized to work better with SUMO-RL integration.
      • custom_environment.py: A wrapper providing enhanced functionality and abstraction for interfacing with the SUMO environment. Created specially to better handle fixed cycle agents.
      • custom_true_online_sarsa.py: A specialized wrapper facilitating the implementation of SARSA with decay.
    • utils: Contains essential utility scripts that ensure the project runs without any hitches:
      • config_parser.py: A robust parser for configuration files, enabling seamless extraction and utilization of algorithmic parameters. It checks config files format using values specified in config_values.py.
      • config_values.py: A comprehensive collection of possible values and configurations.
      • plotter.py: An essential tool for data visualization, aiding in the analysis and interpretation of experimental results.
      • runner.py: A script orchestrating the execution of the project, managing training sessions, testing phases, and result generation with ease and efficiency.
  • main.py: The central execution file of the project.
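The naming of the files in configs follows a train/test pattern, which the sketch below spells out. The helper names are hypothetical and only illustrate how the six configuration files relate to each other; they are not taken from the project code.

```python
TRAFFIC_LEVELS = ("low", "high")


def learn_config(train_traffic):
    """Config used to train models on the given traffic level."""
    return f"configs/learn_{train_traffic}.yaml"


def evaluation_config(train_traffic, test_traffic):
    """Config used to test, on test_traffic, models trained on train_traffic."""
    return f"configs/test_{train_traffic}_{test_traffic}.yaml"


def experiment_plan():
    """Pair each training config with the test configs that reuse its models."""
    plan = []
    for train in TRAFFIC_LEVELS:
        tests = [evaluation_config(train, test) for test in TRAFFIC_LEVELS]
        plan.append((learn_config(train), tests))
    return plan
```

Each model is therefore evaluated both on the traffic level it was trained on and on the other one.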

Configuration format

All .yaml files in the configs directory must adhere to the following format.

  Output: 'path/to/output'  # directory in which to save the plots
  Width: 3840   # Width in pixel of final image. Optional field
  Height: 1080  # Height in pixel of final image. Optional field
  Metrics: ['system_total_stopped','system_total_waiting_time','system_mean_waiting_time','system_mean_speed'] # Metrics to be plotted

  Output_csv: 'path/to/csv'         # Directory in which csvs are saved
  Output_model: 'path/to/models'    # Directory in which models are saved
  Environment:                      # Section dedicated to the environment
    Traffic_type: type of traffic, possible values: 'low' or 'high'
    Gui: whether or not to render GUI, possible values: True, False
    Num_seconds: seconds to run the simulation for
    Min_green: minimum green phase duration
    Max_green: maximum green phase duration
    Yellow_time: yellow phase duration
    Delta_time: time elapsed during a step
  Instances:                        # Section where to insert agents
    Agent_1:                        # Agent config format is shown in the section below
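
To make the Environment section more concrete, the sketch below shows one way its fields could be translated into keyword arguments for sumo_rl.SumoEnvironment. The mapping is an assumption, not the project's actual code: environment_kwargs and ROUTE_FILES are hypothetical names, and the file paths are inferred from the big-intersection folder described earlier.

```python
# Assumed mapping from Traffic_type to the route files in big-intersection.
ROUTE_FILES = {
    "low": "big-intersection/BI_50_test.rou.xml",
    "high": "big-intersection/BI_150_test.rou.xml",
}


def environment_kwargs(env_cfg):
    """Translate the Environment section (a dict) into SumoEnvironment kwargs."""
    traffic = env_cfg["Traffic_type"]
    if traffic not in ROUTE_FILES:
        raise ValueError(f"Traffic_type must be 'low' or 'high', got {traffic!r}")
    return {
        "net_file": "big-intersection/BI.net.xml",
        "route_file": ROUTE_FILES[traffic],
        "use_gui": bool(env_cfg["Gui"]),
        "num_seconds": int(env_cfg["Num_seconds"]),
        "min_green": int(env_cfg["Min_green"]),
        "max_green": int(env_cfg["Max_green"]),
        "yellow_time": int(env_cfg["Yellow_time"]),
        "delta_time": int(env_cfg["Delta_time"]),
        "single_agent": True,  # one traffic light, so a single-agent setup
    }

# Possible usage, assuming sumo_rl is installed and cfg was loaded from YAML:
#   env = sumo_rl.SumoEnvironment(**environment_kwargs(cfg["Environment"]))
```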

Possible agents configurations:

  • Fixed agent configuration:
  Agent_type: 'FIXED'
  Runs: number of runs
  • Q-Learning agent configuration:
  Agent_type: 'QL'
  Runs: number of runs
  Alpha: alpha value
  Gamma: gamma value
  Init_epsilon: initial epsilon value
  Min_epsilon: minimum epsilon value
  Decay: decay value
  • DQN agent configuration:
  Agent_type: 'DQN'
  Runs: number of runs
  Alpha: alpha value
  Gamma: gamma value
  Init_epsilon: initial epsilon value
  Final_epsilon: final epsilon value
  Exp_fraction: exploration fraction value
  • SARSA agent configuration:
  Agent_type: 'SARSA'
  Runs: number of runs
  Alpha: alpha value
  Gamma: gamma value
  Epsilon: epsilon value
  FourierOrder: fourier order value
  Lambda: lambda value
  • SARSA with decay agent configuration:
  Agent_type: 'SARSA_decay'
  Runs: number of runs
  Alpha: alpha value
  Gamma: gamma value
  Epsilon: epsilon value
  FourierOrder: fourier order value
  Lambda: lambda value
  Decay: decay value

To load a trained model from file, specify the following field in the agent config. If it is specified, only Agent_type and Runs are mandatory in the agent config.

Model: 'path/to/saved/agent'
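The field rules above can be checked mechanically. The following sketch works on an agent config that has already been loaded into a dict (for example with yaml.safe_load); missing_fields and REQUIRED_FIELDS are hypothetical names and do not reflect the actual API of config_parser.py.

```python
# Fields each agent type needs, in addition to Agent_type and Runs.
REQUIRED_FIELDS = {
    "FIXED": (),
    "QL": ("Alpha", "Gamma", "Init_epsilon", "Min_epsilon", "Decay"),
    "DQN": ("Alpha", "Gamma", "Init_epsilon", "Final_epsilon", "Exp_fraction"),
    "SARSA": ("Alpha", "Gamma", "Epsilon", "FourierOrder", "Lambda"),
    "SARSA_decay": ("Alpha", "Gamma", "Epsilon", "FourierOrder", "Lambda",
                    "Decay"),
}


def missing_fields(agent_cfg):
    """Return the mandatory fields that are absent from one agent config."""
    agent_type = agent_cfg.get("Agent_type")
    if agent_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown Agent_type: {agent_type!r}")

    required = ["Runs"]
    # When a saved model is loaded, hyperparameters are no longer mandatory.
    if "Model" not in agent_cfg:
        required += REQUIRED_FIELDS[agent_type]
    return [field for field in required if field not in agent_cfg]
```

For example, a Q-Learning entry that only sets Agent_type and Runs is reported as incomplete, while the same entry with a Model path is accepted.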


All the theoretical background, study, and experiments conducted are documented in the docs folder in both English (report.pdf) and Italian (relazione.pdf).


