This SAC code is modified upon https://github.com/rail-berkeley/softlearning, where we drop the ray-based training style to a easy-reading run on a single process. Expert performances are run using this code.
SAC, 50 expert traj, Deterministic Policy in testing
Envs | Mean | Std |
---|---|---|
Pendulum | -147.5398 | 81.7622 |
InvertedPendulum | 1000.0000 | 0.0000 |
InvertedDoublePendulum | 9358.7842 | 0.3963 |
Ant | 5404.5532 | 1520.4961 |
Hopper | 3402.9494 | 446.4877 |
Humanoid | 6043.9907 | 726.1788 |
HalfCheetah | 13711.6445 | 111.4709 |
Walker2d | 5639.3267 | 29.9715 |
Swimmer | 139.2806 | 1.1204 |
AntSlim | 5418.8721 | 946.7947 |
HumanoidSlim | 5346.6181 | 712.2214 |
SwimmerSlim | 339.2811 | 0.7625 |
P.S.: *Slim envs are those envs that use a wrapper who remove some dimension of the observation.