/relax_td3_example

Example TD3 implementation with ReLAx


This repository contains an implementation of Twin Delayed Deep Deterministic Policy Gradient (TD3) built with ReLAx.
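For readers unfamiliar with TD3, the two ideas that distinguish it from DDPG on the critic side are the clipped double-Q target and target policy smoothing. The sketch below illustrates both in plain NumPy. It does not use the ReLAx API, and the function names and default hyperparameters (`noise_std=0.2`, `noise_clip=0.5`) are assumptions chosen for illustration; the notebook in this repository may use different values.

```python
import numpy as np

def smoothed_target_action(target_action, rng, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target actor's action.

    All argument names and defaults are illustrative assumptions.
    """
    noise = rng.normal(0.0, noise_std, size=target_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    # Keep the perturbed action inside the valid action range.
    return np.clip(target_action + noise, -act_limit, act_limit)

def td3_target(rewards, dones, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    min_next_q = np.minimum(next_q1, next_q2)
    return rewards + gamma * (1.0 - dones) * min_next_q

# Tiny usage example with made-up numbers.
rng = np.random.default_rng(0)
next_actions = smoothed_target_action(np.array([0.9, -0.9]), rng)
targets = td3_target(
    rewards=np.array([1.0, 2.0]),
    dones=np.array([0.0, 1.0]),
    next_q1=np.array([10.0, 5.0]),
    next_q2=np.array([8.0, 7.0]),
    gamma=0.5,
)
```

The third TD3 ingredient, the delayed actor update, is a scheduling choice rather than a formula: the actor and target networks are updated only once every few critic updates (commonly every second one).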

The TD3 actor was trained on the Walker2d-v2 MuJoCo Gym environment for 1M environment steps.

The graph of average return vs environment step is shown below (logged every 10k steps):

td3_training

The distribution of estimated Q-values vs data Q-values is shown below:

td3_q_func
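If "data Q-values" here refers to discounted Monte Carlo returns computed from collected rollouts (an assumption; the notebook may compute them differently), they can be obtained with a backward pass over each episode's rewards. The helper name `discounted_returns_to_go` and the default `gamma` are hypothetical and not taken from ReLAx.

```python
import numpy as np

def discounted_returns_to_go(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t.

    Assumes `rewards` holds one complete episode, so no bootstrapping is needed.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

example_returns = discounted_returns_to_go([1.0, 1.0, 1.0], gamma=0.5)
```

Comparing these returns with the critic's predictions for the same state-action pairs gives a rough picture of over- or underestimation.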

Resulting Policy:

td3_run.mp4