When testing on some continuous action space environments, such as AntBulletEnv-v0, Stable-Baselines3 outperforms the current PPO.
Here is a figure of the result on Humanoid.