/SAC-with-Auto-tuned-Temperature-for-Bipedal-Walking-v3

Reinforcement learning on Bipedal-Walking-v3 using a modified version of SAC. In the normal version, we reached a score of 300 at episode 117, and training converges around episode 200. In the hardcore version, we reached a reward of 300 at episode 1296.
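For readers unfamiliar with the "auto-tuned temperature" in the title, the sketch below shows one common way the SAC entropy temperature is adjusted. It is not taken from this repository's notebooks. The function name `temperature_step`, the log-space parameterization, the learning rate, and the target entropy heuristic of minus the action dimension are all assumptions chosen for illustration.

```python
import math


def temperature_step(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on the temperature objective.

    Assumed objective (log-space variant used by many SAC implementations):
        J(log_alpha) = mean(-log_alpha * (log_pi + target_entropy))
    `log_probs` holds log pi(a|s) for a batch of sampled actions.
    """
    mean_log_prob = sum(log_probs) / len(log_probs)
    # dJ/d(log_alpha) = -(E[log_pi] + target_entropy)
    grad = -(mean_log_prob + target_entropy)
    return log_alpha - lr * grad


# The bipedal walker environment has 4 continuous actions (joint torques).
# Setting the target entropy to -action_dim is a common heuristic, assumed here.
action_dim = 4
target_entropy = -float(action_dim)

log_alpha = 0.0  # alpha starts at exp(0) = 1.0
# Hypothetical batch of log-probabilities: the policy entropy estimate is
# -mean(log_probs) = 2.0, above the target of -4.0, so alpha should shrink.
batch_log_probs = [-2.0, -2.5, -1.5, -2.0]
log_alpha = temperature_step(log_alpha, batch_log_probs, target_entropy)
alpha = math.exp(log_alpha)
```

When the policy is more random than the target entropy requires, the step lowers `alpha`, which reduces the entropy bonus in the actor and critic losses. When the policy becomes too deterministic, the same update raises `alpha` again.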

Primary language: Jupyter Notebook. License: MIT.
