This project implements the Robust Adversarial Reinforcement Learning (RARL) agent, first introduced by Pinto et al. [1]. The code is based on Stable-Baselines3 (SB3) [2] and RL Baselines3 Zoo [3].
All code was developed and tested on Ubuntu 20.04 with Python 3.8.
To run the code, we recommend setting up a virtual environment:
```bash
python3 -m venv env              # Create virtual environment
source env/bin/activate         # Activate virtual environment
pip install -r requirements.txt  # Install dependencies
# Work for a while
deactivate                      # Deactivate virtual environment
```
Furthermore, MuJoCo needs to be installed. An installation guide can be found here.
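To verify the setup, a short smoke test can be run. The following is a minimal sketch, assuming an SB3 1.x installation with the classic `gym` API and the standard Gym MuJoCo environments (the choice of `HalfCheetah-v3` is illustrative, not prescribed by this repository):

```python
import gym

from stable_baselines3 import PPO

# Creating a MuJoCo environment fails early if MuJoCo is not set up correctly.
env = gym.make("HalfCheetah-v3")  # any Gym MuJoCo environment works here
obs = env.reset()

# One step with an untrained SB3 policy confirms the full stack is working.
model = PPO("MlpPolicy", env, verbose=0)
action, _states = model.predict(obs, deterministic=True)
obs, reward, done, info = env.step(action)
print("Smoke test passed, reward:", reward)
env.close()
```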
Similar to RL Baselines3 Zoo, the hyperparameters of all RL agents are defined in `hyperparameters/algo_name.yml`.
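These files follow the RL Baselines3 Zoo convention, with one top-level entry per environment id. A hypothetical sketch of an entry in `hyperparameters/rarl.yml` is shown below; the keys are common Zoo keys, and all values are illustrative rather than taken from this repository:

```yaml
HalfCheetah-v3:             # illustrative environment id
  policy: 'MlpPolicy'
  n_timesteps: !!float 1e6  # can be overridden on the command line
  n_envs: 1
  gamma: 0.99
  learning_rate: !!float 3e-4
```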
If the hyperparameters for a specific environment `env_id` are defined in the file, the agent can be trained using:
```bash
python scripts/train_adversary.py --algo rarl --env env_id
```
Following Algorithm 1 of [1], training alternates between optimizing the protagonist while the adversary is held fixed and optimizing the adversary while the protagonist is held fixed. The total number of alternation iterations N<sub>iter</sub>, as well as the number of iterations for the protagonist N<sub>μ</sub> and for the adversary N<sub>ν</sub>, can be specified using:
```bash
python scripts/train_adversary.py --algo rarl --env env_id --n-timesteps N_iter --N-mu N_mu --N-nu N_nu
```
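For instance, a short debugging run might look like the following; the environment id and iteration counts are purely illustrative and are not defaults of this repository:

```bash
python scripts/train_adversary.py --algo rarl --env HalfCheetah-v3 --n-timesteps 100 --N-mu 10 --N-nu 10
```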
A detailed explanation of all possible command-line flags can be found here.
Besides RARL, a variety of other RL agents can be trained. A list of available algorithms can be found in the table below:
| Name | Recurrent | Box | Discrete | MultiDiscrete | MultiBinary | Multi Processing |
| --- | --- | --- | --- | --- | --- | --- |
| A2C¹ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| DDPG¹ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| DQN¹ | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ |
| PPO¹ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| QR-DQN² | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ |
| SAC¹ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| TD3¹ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| TQC² | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| TRPO² | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| RARL³ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
¹ Implemented in the SB3 GitHub repository.
² Implemented in the SB3 Contrib GitHub repository.
³ Implemented in this repository.
To train one of the listed agents, run the following command:
```bash
python scripts/train_adversary.py --algo algo_name --env env_id
```
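Once training has finished, the saved model can be inspected with SB3's standard evaluation helper. This is a minimal sketch: the model path is a placeholder, since the output layout of `scripts/train_adversary.py` is not documented here, and the environment id is again illustrative:

```python
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# "path/to/model.zip" is a placeholder; use the path written by the training script.
model = PPO.load("path/to/model.zip")

env = gym.make("HalfCheetah-v3")  # evaluate on the environment used for training
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
env.close()
```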
[1] Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. "Robust Adversarial Reinforcement Learning." arXiv:1703.02702 [cs], Mar. 2017. URL: http://arxiv.org/abs/1703.02702.
[2] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. "Stable-Baselines3: Reliable Reinforcement Learning Implementations." Journal of Machine Learning Research 22.268 (2021), pp. 1–8. URL: http://jmlr.org/papers/v22/20-1364.html.
[3] Antonin Raffin. RL Baselines3 Zoo. https://github.com/DLR-RM/rl-baselines3-zoo, 2020.