
safe-rl-tutorial

This repository provides the source code for a tutorial on safe reinforcement learning. The tutorial covers the following key concepts:

  • (1) Understanding a simple DRL algorithm: Deep Deterministic Policy Gradient (DDPG) [5]
  • (2) Training this algorithm in several environments: two standard benchmarks and one environment altered for safe reinforcement learning
  • (3) Creating a shield [3] to enable safe training of the agent (see the sketch after this list)
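As a rough illustration of concept (3), below is a minimal sketch of a shield in the spirit of [3]: an environment wrapper that replaces an unsafe proposed action with a safe fallback action before it reaches the environment. It assumes the classic Gym step API (4-tuple return) and hypothetical is_safe and safe_fallback callables supplied by the user; the notebooks in this repository may implement the shield differently.

import gym

class ShieldWrapper(gym.Wrapper):
    # Minimal shield sketch: if the proposed action is judged unsafe for the
    # current observation, replace it with a fallback action before stepping.
    def __init__(self, env, is_safe, safe_fallback):
        super().__init__(env)
        self.is_safe = is_safe              # assumed: is_safe(obs, action) -> bool
        self.safe_fallback = safe_fallback  # assumed: safe_fallback(obs) -> action
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        if not self.is_safe(self._last_obs, action):
            action = self.safe_fallback(self._last_obs)
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs
        return obs, reward, done, info

For example, ShieldWrapper(env, is_safe, safe_fallback) could wrap any of the benchmark environments, with is_safe encoding whichever state-action constraint the altered environment uses.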

Installation

Ubuntu 20.04/18.04 (tested)

Requirements:

  • Anaconda 3

Instructions

  1. Open a terminal
  2. Clone the repository
cd ~
git clone https://github.com/DanielLSM/safe-rl-tutorial
  3. Move into the cloned repository
cd safe-rl-tutorial
  4. Create the Anaconda environment
conda env create -f safe-rl.yml
  5. Activate the Anaconda environment
conda activate safe-rl
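
Optionally, run a quick sanity check inside the activated environment. The snippet below assumes gym is among the dependencies pinned in safe-rl.yml, which is not stated in this README:

# Run inside the activated safe-rl environment.
import gym  # assumption: gym is listed in safe-rl.yml
print("gym version:", gym.__version__)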

References

[1] Teodor Mihai Moldovan, Pieter Abbeel, Safe Exploration in Markov Decision Processes [ref]

[2] Javier García, Fernando Fernández, A Comprehensive Survey on Safe Reinforcement Learning [ref]

[3] Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu, Safe Reinforcement Learning via Shielding [ref]

[4] Rémi Munos, Thomas Stepleton, Anna Harutyunyan, Marc G. Bellemare, Safe and Efficient Off-Policy Reinforcement Learning [ref]

[5] Timothy P. Lillicrap et al., Continuous Control with Deep Reinforcement Learning [ref]

Two ideas: use an inverted cart pendulum with a trust region; use lunar landing with constrained thrust.
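
One possible reading of the second idea, as a sketch only: take Gym's LunarLanderContinuous-v2 and treat "constrained thrust" as a cap on the main-engine command. Both the environment id and this interpretation of the constraint are assumptions, not something stated above.

import gym
import numpy as np

class ThrustLimitWrapper(gym.ActionWrapper):
    # Hypothetical constraint for LunarLanderContinuous-v2: the first action
    # dimension drives the main engine; cap it at max_main_thrust.
    def __init__(self, env, max_main_thrust=0.5):
        super().__init__(env)
        self.max_main_thrust = max_main_thrust

    def action(self, action):
        action = np.asarray(action, dtype=np.float32).copy()
        action[0] = np.clip(action[0], -1.0, self.max_main_thrust)
        return action

env = ThrustLimitWrapper(gym.make("LunarLanderContinuous-v2"))  # assumes gym[box2d] is installed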