This project implements the Robust Adversarial Reinforcement Learning (RARL) agent, first introduced by Pinto et al. [1]. The code is based on Stable-Baselines3 (SB3) [2] and RL Baselines3 Zoo [3].
All code was developed and tested on Ubuntu 20.04 with Python 3.8.
To run the code, we recommend setting up a virtual environment:
```bash
python3 -m venv env              # Create virtual environment
source env/bin/activate         # Activate virtual environment
pip install -r requirements.txt  # Install dependencies
# Work for a while
deactivate                      # Deactivate virtual environment
```
Furthermore, MuJoCo needs to be installed. An installation guide can be found here.
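To verify the setup, a short smoke test can be run. The following is a minimal sketch, assuming an SB3 1.x installation with the classic `gym` API and the standard Gym MuJoCo environments (the choice of `HalfCheetah-v3` is illustrative, not prescribed by this repository):

```python
import gym

from stable_baselines3 import PPO

# Creating a MuJoCo environment fails early if MuJoCo is not set up correctly.
env = gym.make("HalfCheetah-v3")  # any Gym MuJoCo environment works here
obs = env.reset()

# One step with an untrained SB3 policy confirms the full stack is working.
model = PPO("MlpPolicy", env, verbose=0)
action, _states = model.predict(obs, deterministic=True)
obs, reward, done, info = env.step(action)
print("Smoke test passed, reward:", reward)
env.close()
```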
Similar to RL Baselines3 Zoo, the hyperparameters of all RL agents are defined in `hyperparameters/algo_name.yml`.
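These files follow the RL Baselines3 Zoo convention, with one top-level entry per environment id. A hypothetical sketch of an entry in `hyperparameters/rarl.yml` is shown below; the keys are common Zoo keys, and all values are illustrative rather than taken from this repository:

```yaml
HalfCheetah-v3:             # illustrative environment id
  policy: 'MlpPolicy'
  n_timesteps: !!float 1e6  # can be overridden on the command line
  n_envs: 1
  gamma: 0.99
  learning_rate: !!float 3e-4
```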
If the hyperparameters for a specific environment `env_id` are defined in the file, the agent can be trained using:
```bash
python scripts/train_adversary.py --algo rarl --env env_id
```
Following Algorithm 1 of [1], training alternates between optimizing the protagonist while the adversary is held fixed and optimizing the adversary while the protagonist is held fixed. The total number of alternation iterations N<sub>iter</sub>, as well as the number of iterations for the protagonist N<sub>μ</sub> and for the adversary N<sub>ν</sub>, can be specified using:
```bash
python scripts/train_adversary.py --algo rarl --env env_id --n-timesteps N_iter --N-mu N_mu --N-nu N_nu
```
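For instance, a short debugging run might look like the following; the environment id and iteration counts are purely illustrative and are not defaults of this repository:

```bash
python scripts/train_adversary.py --algo rarl --env HalfCheetah-v3 --n-timesteps 100 --N-mu 10 --N-nu 10
```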
A detailed explanation of all possible command-line flags can be found here.
Besides RARL, a variety of other RL agents can be trained. A list of available algorithms can be found in the table below:
| Name | Recurrent | Box | Discrete | MultiDiscrete | MultiBinary | Multi Processing |
| --- | --- | --- | --- | --- | --- | --- |
| A2C¹ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| DDPG¹ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| DQN¹ | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ |
| PPO¹ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| QR-DQN² | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ |
| SAC¹ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| TD3¹ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| TQC² | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| TRPO² | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| RARL³ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
¹ Implemented in the SB3 GitHub repository.
² Implemented in the SB3 Contrib GitHub repository.
³ Implemented in this repository.
To train one of the listed agents, run the following command:
```bash
python scripts/train_adversary.py --algo algo_name --env env_id
```
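Once training has finished, the saved model can be inspected with SB3's standard evaluation helper. This is a minimal sketch: the model path is a placeholder, since the output layout of `scripts/train_adversary.py` is not documented here, and the environment id is again illustrative:

```python
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# "path/to/model.zip" is a placeholder; use the path written by the training script.
model = PPO.load("path/to/model.zip")

env = gym.make("HalfCheetah-v3")  # evaluate on the environment used for training
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
env.close()
```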
[1] Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. "Robust Adversarial Reinforcement Learning." arXiv:1703.02702 [cs], Mar. 2017. URL: http://arxiv.org/abs/1703.02702.
[2] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. "Stable-Baselines3: Reliable Reinforcement Learning Implementations." Journal of Machine Learning Research 22.268 (2021), pp. 1–8. URL: http://jmlr.org/papers/v22/20-1364.html.
[3] Antonin Raffin. RL Baselines3 Zoo. https://github.com/DLR-RM/rl-baselines3-zoo, 2020.