Example A2C implementation with ReLAx
This repository contains an implementation of advantage actor critic (A2C) with ReLAx.
The A2C actor was trained on the LunarLander-v2 Gym environment for 4 million environment steps.
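For readers new to A2C, the quantity that drives the actor update is the advantage: how much better the observed return was than the critic's value estimate. The snippet below is a minimal sketch of that computation in plain Python, not ReLAx's API. The function names, the `gamma` default, and the choice of Monte Carlo returns are assumptions made for illustration; the actual implementation may use a different estimator (for example, n-step returns or GAE).

```python
def discounted_returns(rewards, dones, gamma=0.99, bootstrap_value=0.0):
    """Discounted reward-to-go for a sequence of consecutive transitions.

    `dones[t]` marks the last step of an episode, so no reward from
    later steps leaks backwards across an episode boundary.
    `bootstrap_value` is the critic's estimate for the state after the
    final transition (use 0.0 if the sequence ends with a terminal step).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # episode ended here: nothing to carry over
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, dones, values, gamma=0.99, bootstrap_value=0.0):
    """Advantage estimate: discounted return minus the critic's value."""
    returns = discounted_returns(rewards, dones, gamma, bootstrap_value)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical three-step episode with a critic that predicts 1.0 everywhere.
    print(advantages([1.0, 1.0, 1.0], [False, False, True], [1.0, 1.0, 1.0], gamma=0.5))
```

In an actor-critic update, the actor's loss typically weights each action's log-probability by its advantage, while the critic is regressed towards the returns.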
The graph of average return vs. training step is shown below (batch_size=40000):
Resulting Policy: