For continuous action space environments, performance is far behind Stable-Baselines3

Question

Opened this issue 2 years ago · 0 comments

When testing on some continuous action space environments, such as AntBulletEnv-v0, Stable-Baselines3 outperforms the current PPO implementation.

Here is a figure of the result on Humanoid.