SAC_TF2

A simple TensorFlow 2.0 implementation of Soft Actor-Critic (SAC), ported from OpenAI's Spinning Up.

The Spinning Up implementation of SAC is very compact and easy to extend for quick experiments on other ideas, so I ported it to TF 2.0 for some experiments I want to run. It doesn't reproduce Spinning Up's full logging printouts; instead it prints some progress information and logs to TensorBoard. There are two variants here:

SAC, which is a direct analog of OpenAI's implementation.
Modular, which is a slightly more modular implementation, e.g. train/test rollouts use a rollout function similar to TF-Agents' drivers.
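To give a feel for the rollout-function idea, here is a minimal sketch. It is not the code from this repo: `ToyEnv`, `rollout`, and the `on_transition` callback are hypothetical names made up for illustration, and the environment is assumed to follow the older Gym-style API where `step` returns `(obs, reward, done, info)`.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style env; each episode lasts `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done, {}  # obs, reward, done, info


def rollout(env, policy, max_ep_len, n_episodes=1, on_transition=None):
    """Run `n_episodes` episodes and return the list of episode returns.

    `on_transition` (optional) is called with (obs, action, reward, next_obs, done),
    e.g. to push the transition into a replay buffer during training.
    """
    returns = []
    for _ in range(n_episodes):
        obs, ep_ret, ep_len, done = env.reset(), 0.0, 0, False
        while not done and ep_len < max_ep_len:
            action = policy(obs)
            next_obs, reward, done, _ = env.step(action)
            if on_transition is not None:
                on_transition(obs, action, reward, next_obs, done)
            obs = next_obs
            ep_ret += reward
            ep_len += 1
        returns.append(ep_ret)
    return returns


# Training-style rollout: random actions, transitions stored in a list "buffer".
buffer = []
train_returns = rollout(ToyEnv(), lambda obs: random.uniform(-1, 1),
                        max_ep_len=100, on_transition=lambda *t: buffer.append(t))

# Test-style rollout: fixed action, nothing stored, episodes capped at 3 steps.
test_returns = rollout(ToyEnv(), lambda obs: 0.0, max_ep_len=3, n_episodes=2)
```

The point of the design is that the same loop serves both training and evaluation; only the policy and the optional callback change.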

With the same inputs and weight initializations, I've checked that up to 1000 gradient steps result in identical end weights to 3 decimal places. This is confirmed by effectively identical performance on CartPole and Reacher2D.
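A check of this kind could be sketched as follows. This is an illustration, not the script that was actually used: `max_weight_diff` and `weights_match` are hypothetical helpers, the weights are assumed to be lists of NumPy arrays (the shape of data `get_weights()` returns on a Keras model), and "identical to 3 decimal places" is interpreted here as a maximum absolute difference below `10**-3`.

```python
import numpy as np


def max_weight_diff(weights_a, weights_b):
    """Largest absolute element-wise difference across two lists of arrays."""
    if len(weights_a) != len(weights_b):
        raise ValueError("the two models have different numbers of weight tensors")
    return max(
        float(np.max(np.abs(np.asarray(a) - np.asarray(b))))
        for a, b in zip(weights_a, weights_b)
    )


def weights_match(weights_a, weights_b, decimals=3):
    """True if every weight differs by less than 10**-decimals (an assumed tolerance)."""
    return max_weight_diff(weights_a, weights_b) < 10 ** (-decimals)


# Stand-in data: a reference set of weights plus a near copy and a far copy.
rng = np.random.default_rng(0)
reference = [rng.normal(size=(4, 8)), rng.normal(size=(8,))]
near_copy = [w + 1e-5 for w in reference]
far_copy = [w + 1e-2 for w in reference]

near_ok = weights_match(reference, near_copy)
far_ok = weights_match(reference, far_copy)
```

In a real comparison the two lists would come from the two implementations after the same number of gradient steps on the same batches.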

A Colab version of the modular variant is also available: https://colab.research.google.com/drive/1QwIThAaK5F-DtV5o36XXP2-_rxWWsv8S

sholtodouglas/SAC_TF2

CartPole-v0 return vs. environment steps (takes ~50s)