Deep Recurrent Q-Network with different exploration strategies for self-driving cars (using AirSim)

Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning


This repo contains an implementation of a Double Dueling Deep Recurrent Q-Network (DRQN) that can be enhanced with several exploration strategies, namely deterministic epsilon-greedy, adaptive epsilon-greedy (VDBE and BMC) [1], softmax, Max-Boltzmann exploration and VDBE-softmax, as well as with an error masking strategy [2], [4].
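The action-selection rules listed above can be sketched in plain Python. This is a minimal, hypothetical illustration and not the code in DRQN_classes.py: all function names and parameters (`epsilon`, `tau`, `sigma`, `delta`) are assumptions, VDBE is shown with a single global epsilon rather than a per-state one, and BMC is left out because it relies on the Bayesian machinery in bayesian.py.

```python
# Hypothetical sketch: names and parameters are illustrative, not the repo's API.
import math
import random


def greedy(q_values):
    """Index of the largest Q-value (ties broken by the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def boltzmann_probs(q_values, tau):
    """Softmax over Q-values with temperature tau (max subtracted for numerical stability)."""
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, tau, rng=random):
    """Softmax exploration: sample an action from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def max_boltzmann(q_values, epsilon, tau, rng=random):
    """Max-Boltzmann: greedy with probability 1 - epsilon, otherwise sample from the
    Boltzmann distribution instead of uniformly. Feeding it an epsilon adapted by
    vdbe_update gives a VDBE-softmax style rule."""
    if rng.random() < epsilon:
        return softmax_action(q_values, tau, rng)
    return greedy(q_values)


def vdbe_update(epsilon, td_error, sigma, delta, alpha=1.0):
    """Value-Difference Based Exploration: move epsilon towards a bounded
    function of the value change |alpha * td_error| (inverse sensitivity sigma)."""
    x = math.exp(-abs(alpha * td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)  # in [0, 1): large TD errors push epsilon up
    return delta * f + (1.0 - delta) * epsilon
```

For example, in the five-action environment one could call `vdbe_update(eps, td_error, sigma=1.0, delta=1 / 5)` after each learning step and pass the new `eps` to `max_boltzmann`. Setting `delta` to one over the number of actions is a common choice for VDBE; the values actually used in this repo may differ.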

Code Structure:

  • ./AirsimEnv/: folder containing the two environments (AirsimEnv.py and AirsimEnv_9actions.py); the former uses five steering angles and the latter nine. This folder also contains:
    • DRQN_classes.py: defines the agent, the experience replay, the exploration strategies, the neural network and the connection with AirSim NH
    • bayesian.py: support code for BMC epsilon-greedy
    • final_reward_points.csv: support data for the reward calculation (required by the environment scripts)
  • DRQN_airsim_training.py: main script for the training process; it contains the training loop and requires all the files listed above
  • DRQN_evaluation.py: evaluates the model on a training subset and a test subset, each defined by a different set of starting points
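The recurrent-specific parts of training (sampling contiguous traces from stored episodes, the dueling aggregation, the Double DQN target, and an error masking strategy in the style of [2], which discards the loss on the first half of each trace while the LSTM state warms up) can be sketched without any deep-learning framework. The sketch is hypothetical: `EpisodeReplay`, `dueling_q`, `double_q_target` and `masked_trace_loss` are illustrative names, not the classes in DRQN_classes.py, and the repo's TensorFlow implementation may differ in batching and hyperparameters.

```python
# Hypothetical sketch: names and defaults are illustrative, not the repo's API.
import random
from collections import deque


class EpisodeReplay:
    """Episodic replay buffer: stores whole episodes and samples contiguous
    traces so the recurrent layer sees observations in temporal order."""

    def __init__(self, capacity=1000):
        self.episodes = deque(maxlen=capacity)  # oldest episodes are dropped first

    def add(self, episode):
        # episode: list of (state, action, reward, next_state, done) tuples
        self.episodes.append(episode)

    def sample(self, batch_size, trace_length, rng=random):
        # Only episodes long enough to supply a full trace are eligible;
        # rng.sample raises ValueError if fewer than batch_size qualify.
        eligible = [ep for ep in self.episodes if len(ep) >= trace_length]
        traces = []
        for ep in rng.sample(eligible, batch_size):
            start = rng.randint(0, len(ep) - trace_length)
            traces.append(ep[start:start + trace_length])
        return traces


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[best]


def masked_trace_loss(td_errors, trace_length):
    """Error masking: zero the first half of the trace and average the
    squared TD error over the remaining steps."""
    half = trace_length // 2
    mask = [0.0] * half + [1.0] * (trace_length - half)
    total = sum(m * e * e for m, e in zip(mask, td_errors))
    return total / sum(mask)
```

In a TensorFlow implementation the same mask would typically be multiplied element-wise with the per-step loss tensor before averaging.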

Requirements:

  • Python 3.7.6
  • TensorFlow 2.5.0
  • Tornado 4.5.3
  • OpenCV
  • OpenAI Gym 0.18.3
  • AirSim 1.5.0

Hardware:

  • 2 NVIDIA Tesla M60 GPUs with 8 GB of memory

References:

[1] Gimelfarb, M., S. Sanner, and C.-G. Lee, 2020: ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning. CoRR.

[2] Juliani, A., 2016: Simple Reinforcement Learning with Tensorflow Part 6: Partial Observability and Deep Recurrent Q-Networks. URL: https://github.com/awjuliani/DeepRL-Agents

[3] Riboni, A., A. Candelieri, and M. Borrotti, 2021: Deep Autonomous Agents comparison for Self-Driving Cars. Proceedings of the 7th International Conference on Machine Learning, Optimization and Big Data (LOD).

[4] Microsoft: Welcome to AirSim. URL: https://microsoft.github.io/AirSim/

How to cite

Zangirolami, V. and M. Borrotti, 2024: Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning. Knowledge-Based Systems, 293.

Acknowledgements:

I acknowledge the Data Science Lab of the Department of Economics, Management and Statistics (DEMS) of the University of Milano-Bicocca for providing a virtual machine.

