Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning
This repository contains an implementation of a Double Dueling Deep Recurrent Q-Network that can be enhanced with several exploration strategies (deterministic epsilon-greedy, adaptive epsilon-greedy with VDBE and BMC [1], softmax, max-Boltzmann exploration, and VDBE-softmax) as well as an error-masking strategy [2], [4].
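For intuition, the sketch below illustrates two of the strategies named above: epsilon-greedy selection with a VDBE-style adaptive epsilon, and max-Boltzmann selection. It is a minimal illustration, not the repository's implementation (which lives in `DRQN_classes.py`), and the names and default values (`sigma`, `delta`, `tau`) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def max_boltzmann(q_values, epsilon, tau=1.0):
    """With probability epsilon sample from a softmax over the Q-values,
    otherwise act greedily (max-Boltzmann exploration)."""
    if rng.random() < epsilon:
        p = np.exp((q_values - np.max(q_values)) / tau)  # shift for numerical stability
        p /= p.sum()
        return int(rng.choice(len(q_values), p=p))
    return int(np.argmax(q_values))

def vdbe_update(epsilon, td_error, sigma=1.0, delta=0.2):
    """VDBE-style update: large TD errors push epsilon up (more exploration),
    small ones let it decay toward exploitation."""
    e = np.exp(-abs(td_error) / sigma)
    f = (1.0 - e) / (1.0 + e)
    return delta * f + (1.0 - delta) * epsilon
```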
Repository structure:

- `./AirsimEnv/`: folder where the two environments (`AirsimEnv.py` and `AirsimEnv_9actions.py`) are stored; the former includes five steering angles and the latter nine. This folder also contains:
  - `DRQN_classes.py`: defines the agent, the experience replay, the exploration strategies, the neural network, and the connection with AirSim NH
  - `bayesian.py`: support code for BMC epsilon-greedy
  - `final_reward_points.csv`: support file for reward calculation (required by the environment scripts)
- `DRQN_airsim_training.py`: the main script for the training process; it contains the training loop and requires all of the files above (see the usage sketch after this list)
- `DRQN_evaluation.py`: contains the training and test evaluation; each subset is defined with a different set of starting points to evaluate model performance
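As a rough usage sketch, the environments should follow the standard OpenAI Gym interface (given the Gym dependency listed below); the import path, class name, and constructor arguments here are assumptions, so check the scripts for the actual API:

```python
# Hypothetical interaction loop; assumes AirSim NH is already running and that
# AirsimEnv exposes the classic Gym 0.18 API (reset/step returning a 4-tuple).
from AirsimEnv.AirsimEnv import AirsimEnv  # assumed import path and class name

env = AirsimEnv()
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()  # stand-in for the trained DRQN policy
    obs, reward, done, info = env.step(action)
    episode_return += reward
env.close()
print(f"episode return: {episode_return:.2f}")
```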
Prerequisites:

- Python 3.7.6
- TensorFlow 2.5.0
- Tornado 4.5.3
- OpenCV 4.5.2.54
- OpenAI Gym 0.18.3
- AirSim 1.5.0
- 2 NVIDIA Tesla M60 GPUs with 8 GB
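As an optional sanity check, the pinned versions can be verified from Python (assuming the usual pip package names, e.g. `opencv-python` for OpenCV):

```python
# Quick version check for the pinned dependencies; import names are the
# conventional ones and may differ from your installation.
import tensorflow as tf
import tornado
import cv2
import gym

print(tf.__version__)   # expected: 2.5.0
print(tornado.version)  # expected: 4.5.3
print(cv2.__version__)  # expected: 4.5.2
print(gym.__version__)  # expected: 0.18.3
```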
References:

[1] Gimelfarb, M., Sanner, S., and Lee, C.-G., 2020: ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning. CoRR.

[2] Juliani, A., 2016: Simple Reinforcement Learning with Tensorflow Part 6: Partial Observability and Deep Recurrent Q-Networks. URL: https://github.com/awjuliani/DeepRL-Agents

[3] Riboni, A., Candelieri, A., and Borrotti, M., 2021: Deep Autonomous Agents Comparison for Self-Driving Cars. Proceedings of The 7th International Conference on Machine Learning, Optimization and Big Data (LOD).

[4] Welcome to AirSim. URL: https://microsoft.github.io/AirSim/
If you use this code, please cite:

Zangirolami, V. and Borrotti, M., 2024: Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning. Knowledge-Based Systems, 293.
I acknowledge the Data Science Lab of the Department of Economics, Management and Statistics (DEMS) of the University of Milan-Bicocca for providing a virtual machine.