A2C training of Relational Deep Reinforcement Learning Architecture

Introduction

Torch implementation of the deep relational architecture from the paper "Relational Deep Reinforcement Learning" together with (synchronous) advantage-actor-critic training as discussed for example here.

The Box-World environment used in this script can be found at this repo.

Training is performed in a2c_fast.py. The implementation is based on this repo which turned out to be more clever and substantially faster than my own implementation a2c_dist.py. However this latter file contains routines to plot the gradients in the network and the computation graph.

The relational module and general architecture are both implemented as torch.nn.Module in attention_module.py. However, a2c_fast.py uses almost identical adaptations of these classes in helper/a2c_ppo_acktr/model.yml that comply with the training algorithm's Policy class.

An example YAML config file parsed from the arguments is configs/exmpl_config.yml. Training, the environment and network can be parameterized there. A copy of the loaded configuration file will be saved with checkpoints and logs for documentation.

A suitable environment can be created e.g. by conda env create -f environment.yml or pip install -r requirements.txt. Afterwards install and register the Box-World environment by cloning the repo and pip install -e gym-boxworld. Remember that after changing the code you need to re-register the environment before the changes become effective. You can find the details of state space, action space and reward structure there.

visualize_results.ipynb contains some plotting functionality.

Example Run

python a2c.py -c configs/exmpl_config.yml -s example_run

hejichao2020/DRRL

A2C training of Relational Deep Reinforcement Learning Architecture

Introduction

Example Run