relax_sac_example

Example SAC implementation with ReLAx

Primary language: Jupyter Notebook


This repository contains an implementation of soft actor-critic (SAC) built with ReLAx.
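For orientation, the core SAC update can be sketched without any framework. This is a minimal sketch assuming the common SAC variant with twin Q-networks, target networks and automatic temperature tuning, operating on precomputed network outputs; the function names are hypothetical and are not ReLAx's API. It shows the clipped double-Q soft Bellman target and the critic, actor and temperature losses.

```python
import math
from typing import List, Sequence


def soft_q_targets(rewards: Sequence[float], dones: Sequence[float],
                   q1_next: Sequence[float], q2_next: Sequence[float],
                   logp_next: Sequence[float], gamma: float = 0.99,
                   alpha: float = 0.2) -> List[float]:
    """Soft Bellman targets y = r + gamma * (1 - d) * (min(Q1', Q2') - alpha * log pi(a'|s')).

    q1_next / q2_next are target-network Q-values at (s', a') with a' ~ pi(.|s'),
    and logp_next is log pi(a'|s'). Terminal transitions (d = 1) do not bootstrap.
    """
    return [
        r + gamma * (1.0 - d) * (min(q1, q2) - alpha * lp)
        for r, d, q1, q2, lp in zip(rewards, dones, q1_next, q2_next, logp_next)
    ]


def critic_loss(q_pred: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between one critic's predictions and the fixed targets."""
    return sum((q - y) ** 2 for q, y in zip(q_pred, targets)) / len(targets)


def actor_loss(q1_new: Sequence[float], q2_new: Sequence[float],
               logp_new: Sequence[float], alpha: float) -> float:
    """Mean of alpha * log pi(a|s) - min(Q1, Q2)(s, a), with a freshly sampled from pi."""
    return sum(alpha * lp - min(a, b)
               for a, b, lp in zip(q1_new, q2_new, logp_new)) / len(logp_new)


def temperature_loss(log_alpha: float, logp_new: Sequence[float],
                     target_entropy: float) -> float:
    """Mean of -alpha * (log pi(a|s) + target_entropy).

    Minimizing this w.r.t. log_alpha raises alpha when the policy entropy
    (about -log pi) falls below target_entropy, and lowers it otherwise.
    """
    alpha = math.exp(log_alpha)
    return sum(-alpha * (lp + target_entropy) for lp in logp_new) / len(logp_new)
```

A widely used heuristic sets `target_entropy = -dim(A)`; Hopper has a 3-dimensional action space, which would give `-3.0`. Whether this repository uses that value is not stated here.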

The SAC agent was trained on the Hopper-v2 MuJoCo Gym environment for 1M environment steps.

The plot of average return versus environment step is shown below (logged every 10k steps):

[Figure (sac_training): average return vs. environment step]
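The README does not say whether the logged average return comes from training episodes or from separate evaluation rollouts. A minimal sketch of the periodic logging described above, assuming returns are averaged over training episodes completed since the previous log point; `ReturnLogger` is a hypothetical helper, not part of ReLAx:

```python
from typing import List, Optional, Tuple


class ReturnLogger:
    """Hypothetical helper: accumulates per-step rewards into episode returns and,
    every `log_every` environment steps, records the mean return of the episodes
    that finished since the previous log point."""

    def __init__(self, log_every: int = 10_000):
        self.log_every = log_every
        self.step = 0
        self.current_return = 0.0
        self.finished: List[float] = []
        self.history: List[Tuple[int, float]] = []  # (env_step, average_return)

    def record(self, reward: float, done: bool) -> Optional[float]:
        """Call once per environment step; returns the logged average, or None."""
        self.step += 1
        self.current_return += reward
        if done:
            self.finished.append(self.current_return)
            self.current_return = 0.0
        # Log only at multiples of log_every and only if some episode finished.
        if self.step % self.log_every == 0 and self.finished:
            avg = sum(self.finished) / len(self.finished)
            self.history.append((self.step, avg))
            self.finished.clear()
            return avg
        return None
```

With `log_every=10_000` and 1M steps, this yields at most 100 points on the curve; windows in which no episode terminates are skipped rather than logged as zero.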

The distribution of Q-values estimated by the critic versus Q-values computed from the collected data is shown below:

[Figure (sac_q_func): distribution of estimated vs. data Q-values]
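One common way to obtain "data" Q-values for a comparison like the one above is the discounted Monte Carlo return from each visited state-action pair to the end of its episode. The sketch below assumes that interpretation; the README does not define the term, and the helper names are hypothetical. Note that SAC's critic learns a soft Q-value that also includes discounted entropy bonuses, so a reward-only return is only an approximate reference, and episodes cut off by a time limit make returns near the cutoff biased low.

```python
from typing import List, Sequence


def discounted_returns_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over a single episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def q_estimation_bias(q_estimates: Sequence[float], q_data: Sequence[float]) -> float:
    """Mean of (estimated - data) Q-values; a positive value indicates overestimation."""
    if len(q_estimates) != len(q_data):
        raise ValueError("q_estimates and q_data must have the same length")
    return sum(e - d for e, d in zip(q_estimates, q_data)) / len(q_data)
```

For example, rewards `[1.0, 1.0, 1.0]` with `gamma=0.5` give returns-to-go `[1.75, 1.5, 1.0]`; plotting histograms of the critic's estimates and of these returns over the same transitions produces a distribution comparison of the kind shown in the figure.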

Resulting policy:

[Video (sac_run.mp4): rollout of the trained policy]