This repository provides the source code for a tutorial on safe reinforcement learning. The key concepts of the tutorial are the following:
- (1) Understanding a simple DRL algorithm: Deep Deterministic Policy Gradient (DDPG)
- (2) Training this algorithm in several environments: two benchmarks and one environment altered for safe reinforcement learning
- (3) Creating a shield to enable safe training for the agent!
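To give a feel for concept (3), here is a minimal sketch of shielding: before an action reaches the environment, a one-step safety check either passes it through or substitutes a conservative fallback. The function names (`is_safe`, `fallback_action`, `shielded_action`), the linear one-step model, and the position bound are illustrative assumptions, not the tutorial's actual code.

```python
import numpy as np

def is_safe(state, action, dt=0.05):
    """Predict the next position with a crude linear one-step model (assumption)
    and check it stays within the safe region [-1, 1]."""
    pos, vel = state
    next_pos = pos + dt * (vel + action)
    return abs(next_pos) <= 1.0

def fallback_action(state):
    """A conservative action that steers back toward the origin."""
    pos, _ = state
    return -np.sign(pos)

def shielded_action(state, proposed):
    """Let the agent act freely unless the proposed action is predicted unsafe."""
    if is_safe(state, proposed):
        return proposed
    return fallback_action(state)

# Near the boundary, a large push is overridden by the fallback;
# far from it, the agent's action passes through unchanged.
print(shielded_action((0.9, 0.5), proposed=5.0))   # fallback kicks in
print(shielded_action((0.0, 0.0), proposed=0.2))   # action unchanged
```

The same wrapper pattern works around any policy: the agent proposes, the shield disposes.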
- Anaconda 3
- Open a terminal
- Clone the repository:
```shell
cd ~
git clone https://github.com/DanielLSM/safe-rl-tutorial
```
- Move into the repository:
```shell
cd safe-rl-tutorial
```
- Install the anaconda environment:
```shell
conda env create -f safe-rl.yml
```
- Load the anaconda environment:
```shell
conda activate safe-rl
```
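A quick way to confirm the setup worked (the environment name `safe-rl` comes from the yml above; the exact packages inside it are not assumed here):

```shell
# The new environment should appear in the list
conda env list
# After activating, the Python interpreter should resolve from the environment
conda activate safe-rl
python --version
```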
[1] Teodor Mihai Moldovan, Pieter Abbeel, Safe Exploration in Markov Decision Processes [ref]
[2] Javier García, Fernando Fernández, A Comprehensive Survey on Safe Reinforcement Learning [ref]
[3] Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu, Safe Reinforcement Learning via Shielding [ref]
[4] Rémi Munos, Thomas Stepleton, Anna Harutyunyan, Marc G. Bellemare, Safe and Efficient Off-Policy Reinforcement Learning [ref]
[5] Timothy P. Lillicrap et al., Continuous Control with Deep Reinforcement Learning [ref]
Two ideas: use the inverted pendulum on a cart with a trust region on actions; use lunar lander with limited thrust.
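The trust-region idea above can be sketched as follows: each new action is clipped to lie within a small radius of the previous one, so the policy cannot change its control abruptly. The function name and the `clip_radius` value are illustrative assumptions.

```python
import numpy as np

def trust_region_action(prev_action, proposed, clip_radius=0.1):
    """Clip the proposed action into the interval of radius clip_radius
    around the previous action."""
    return np.clip(proposed, prev_action - clip_radius, prev_action + clip_radius)

# Successive proposals are smoothed: large jumps get truncated at the radius.
prev = 0.0
for proposed in [0.5, -0.3, 0.05]:
    prev = trust_region_action(prev, proposed)
    print(prev)
```

The same clipping could be applied inside an environment wrapper for the lunar-lander thrust idea, bounding how much thrust the agent may apply per step.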