Example SAC implementation with ReLAx
This repository contains an implementation of Soft Actor-Critic (SAC) with ReLAx.
The SAC actor was trained on the Hopper-v2 MuJoCo Gym environment for 1M environment steps.
The graph of average return vs. environment step is shown below (logged every 10k steps):
The distribution of estimated Q-values vs data Q-values is shown below:
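As a rough illustration of where the data-side values in such a comparison can come from, the sketch below computes discounted Monte Carlo returns-to-go from the rewards of one finished episode. This is an assumption about what "data Q-values" means here, not code taken from this repository or from the ReLAx API: the function name `discounted_returns_to_go` and the discount factor are hypothetical, and the entropy bonus that SAC's soft Q-values include is omitted for simplicity.

```python
from typing import List, Sequence


def discounted_returns_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one finished episode.

    Hypothetical helper for illustration only; gamma=0.99 is an assumed default.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three steps with reward 1.0 each and gamma=0.5.
    print(discounted_returns_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Each returned value could then be compared against the critic's estimate for the corresponding state-action pair; a critic that systematically predicts more than the observed returns would point to overestimation.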
Resulting Policy: