I found the existing implementations of Soft Actor-Critic (SAC) for continuous action spaces somewhat complicated and hard to get started with.
So this is a clean and robust PyTorch implementation of SAC for continuous action spaces. Here are the results:
All the experiments were trained with the same hyperparameters, as recommended by Haarnoja et al.
The GIF below is a short recording of the performance on BipedalWalkerHardcore-v3:
gym==0.18.3
box2d==2.3.10
numpy==1.21.2
pytorch==1.8.1
tensorboard==2.5.0
Run 'python main.py', where the default environment is Pendulum-v0.
If you want to train on a different environment, pass its index, for example 'python main.py --EnvIdex 0'.
--EnvIdex can be set to 0 to 5, where
'--EnvIdex 0' for 'BipedalWalker-v3'
'--EnvIdex 1' for 'BipedalWalkerHardcore-v3'
'--EnvIdex 2' for 'LunarLanderContinuous-v2'
'--EnvIdex 3' for 'Pendulum-v0'
'--EnvIdex 4' for 'Humanoid-v2'
'--EnvIdex 5' for 'HalfCheetah-v2'
P.S. If you want to train on 'Humanoid-v2' or 'HalfCheetah-v2', you need to install MuJoCo first.
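The index-to-environment mapping listed above can be sketched as a simple lookup table. This is only an illustration: the names `ENV_NAMES` and `parse_env` are hypothetical and may not match what 'main.py' actually does; the default of 3 is taken from the statement that Pendulum-v0 is the default environment.

```python
import argparse

# Hypothetical lookup table mirroring the --EnvIdex list in this README.
ENV_NAMES = [
    "BipedalWalker-v3",          # 0
    "BipedalWalkerHardcore-v3",  # 1
    "LunarLanderContinuous-v2",  # 2
    "Pendulum-v0",               # 3
    "Humanoid-v2",               # 4 (needs MuJoCo)
    "HalfCheetah-v2",            # 5 (needs MuJoCo)
]


def parse_env(argv=None):
    """Return the environment name selected by --EnvIdex (assumed default: 3)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--EnvIdex",
        type=int,
        default=3,
        choices=range(len(ENV_NAMES)),  # reject indices outside 0..5
    )
    args = parser.parse_args(argv)
    return ENV_NAMES[args.EnvIdex]
```

For example, `parse_env(["--EnvIdex", "1"])` selects 'BipedalWalkerHardcore-v3', and an out-of-range index makes argparse exit with a usage error instead of failing later when the environment is created.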
To play with a trained model, run 'python main.py --EnvIdex 1 --write False --render True --Loadmodel True --ModelIdex 2800000', which will render 'BipedalWalkerHardcore-v3'.
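The command above passes booleans as the strings 'True' and 'False'. With argparse, `type=bool` would treat any non-empty string (including 'False') as true, so such flags are usually handled with a small converter. The sketch below shows one way this might be done; `str2bool`, `parse_eval_args`, and all default values are assumptions for illustration, not necessarily what 'main.py' contains.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to a real bool for argparse."""
    if isinstance(value, bool):
        return value
    text = value.lower()
    if text in ("true", "yes", "1"):
        return True
    if text in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_eval_args(argv=None):
    """Parse the flags used in the command above (defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--EnvIdex", type=int, default=3)
    parser.add_argument("--write", type=str2bool, default=True)
    parser.add_argument("--render", type=str2bool, default=False)
    parser.add_argument("--Loadmodel", type=str2bool, default=False)
    parser.add_argument("--ModelIdex", type=int, default=0)
    return parser.parse_args(argv)
```

Parsing the command shown above with this sketch yields `write=False`, `render=True`, `Loadmodel=True`, and `ModelIdex=2800000`.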
You can use TensorBoard to visualize the training curves. The training history is saved in '\runs'.
For more details on the hyperparameter settings, please check 'main.py'.
Soft Actor-Critic Algorithms and Applications