Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning
This repository contains an implementation of a Double Dueling Deep Recurrent Q-Network that can be enhanced with several exploration strategies (deterministic epsilon-greedy, adaptive epsilon-greedy with VDBE and BMC [1], softmax, max-Boltzmann exploration, and VDBE-softmax) as well as an error-masking strategy [2], [4].
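For intuition, the sketch below illustrates two of the strategies named above: epsilon-greedy selection with a VDBE-style adaptive epsilon, and max-Boltzmann selection. It is a minimal illustration, not the repository's implementation (which lives in `DRQN_classes.py`), and the names and default values (`sigma`, `delta`, `tau`) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def max_boltzmann(q_values, epsilon, tau=1.0):
    """With probability epsilon sample from a softmax over the Q-values,
    otherwise act greedily (max-Boltzmann exploration)."""
    if rng.random() < epsilon:
        p = np.exp((q_values - np.max(q_values)) / tau)  # shift for numerical stability
        p /= p.sum()
        return int(rng.choice(len(q_values), p=p))
    return int(np.argmax(q_values))

def vdbe_update(epsilon, td_error, sigma=1.0, delta=0.2):
    """VDBE-style update: large TD errors push epsilon up (more exploration),
    small ones let it decay toward exploitation."""
    e = np.exp(-abs(td_error) / sigma)
    f = (1.0 - e) / (1.0 + e)
    return delta * f + (1.0 - delta) * epsilon
```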
Repository structure:

- `./AirsimEnv/`: folder where the two environments (`AirsimEnv.py` and `AirsimEnv_9actions.py`) are stored; the former includes five steering angles and the latter nine. This folder also contains:
  - `DRQN_classes.py`: defines the agent, the experience replay, the exploration strategies, the neural network, and the connection with AirSim NH
  - `bayesian.py`: support code for BMC epsilon-greedy
  - `final_reward_points.csv`: support file for reward calculation (required by the environment scripts)
- `DRQN_airsim_training.py`: the main script for the training process; it contains the training loop and requires all of the files above (see the usage sketch after this list)
- `DRQN_evaluation.py`: contains the training and test evaluation; each subset is defined with a different set of starting points to evaluate model performance
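As a rough usage sketch, the environments should follow the standard OpenAI Gym interface (given the Gym dependency listed below); the import path, class name, and constructor arguments here are assumptions, so check the scripts for the actual API:

```python
# Hypothetical interaction loop; assumes AirSim NH is already running and that
# AirsimEnv exposes the classic Gym 0.18 API (reset/step returning a 4-tuple).
from AirsimEnv.AirsimEnv import AirsimEnv  # assumed import path and class name

env = AirsimEnv()
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()  # stand-in for the trained DRQN policy
    obs, reward, done, info = env.step(action)
    episode_return += reward
env.close()
print(f"episode return: {episode_return:.2f}")
```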
Prerequisites:

- Python 3.7.6
- TensorFlow 2.5.0
- Tornado 4.5.3
- OpenCV 4.5.2.54
- OpenAI Gym 0.18.3
- AirSim 1.5.0
- 2 NVIDIA Tesla M60 GPUs with 8 GB
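As an optional sanity check, the pinned versions can be verified from Python (assuming the usual pip package names, e.g. `opencv-python` for OpenCV):

```python
# Quick version check for the pinned dependencies; import names are the
# conventional ones and may differ from your installation.
import tensorflow as tf
import tornado
import cv2
import gym

print(tf.__version__)   # expected: 2.5.0
print(tornado.version)  # expected: 4.5.3
print(cv2.__version__)  # expected: 4.5.2
print(gym.__version__)  # expected: 0.18.3
```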
References:

[1] Gimelfarb, M., Sanner, S., and Lee, C.-G., 2020: ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning. CoRR.

[2] Juliani, A., 2016: Simple Reinforcement Learning with Tensorflow Part 6: Partial Observability and Deep Recurrent Q-Networks. URL: https://github.com/awjuliani/DeepRL-Agents

[3] Riboni, A., Candelieri, A., and Borrotti, M., 2021: Deep Autonomous Agents Comparison for Self-Driving Cars. Proceedings of The 7th International Conference on Machine Learning, Optimization and Big Data (LOD).

[4] Welcome to AirSim. URL: https://microsoft.github.io/AirSim/
If you use this code, please cite:

Zangirolami, V. and Borrotti, M., 2024: Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning. Knowledge-Based Systems, 293.
I acknowledge the Data Science Lab of the Department of Economics, Management and Statistics (DEMS) of the University of Milan-Bicocca for providing a virtual machine.