yyzpiero/EVO-PopulationBasedTraining

For continuous action space environments, performance is far behind that of Stable-Baselines3


When testing on continuous action space environments such as AntBulletEnv-v0, Stable-Baselines3 outperforms the current PPO implementation in this repository.
image

Here is a figure of the result on Humanoid.
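
For anyone who wants to reproduce the Stable-Baselines3 side of this comparison, the following is a minimal sketch, not the exact script behind the figures. It assumes a Stable-Baselines3 release that still uses `gym`, with `pybullet` installed so that `AntBulletEnv-v0` is registered. The function name `train_sb3_baseline`, the step budget, the seed, and the use of library-default PPO hyperparameters are all assumptions; the issue does not state which settings were used.

```python
def train_sb3_baseline(env_id="AntBulletEnv-v0", total_timesteps=1_000_000,
                       seed=0, n_eval_episodes=10):
    """Train Stable-Baselines3 PPO with default hyperparameters and evaluate it.

    Hypothetical helper: the step budget, seed, and episode count are
    placeholders, not values taken from the issue.
    """
    # Local imports so this file can be loaded without the RL stack installed.
    import pybullet_envs  # noqa: F401  (importing registers the *BulletEnv-v0 ids with gym)
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # Passing the id as a string lets SB3 create and wrap the environment itself.
    model = PPO("MlpPolicy", env_id, seed=seed, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Mean and standard deviation of the episodic return over a few episodes.
    mean_reward, std_reward = evaluate_policy(
        model, model.get_env(), n_eval_episodes=n_eval_episodes
    )
    return mean_reward, std_reward


# Example call (long-running, so it is left commented out):
# mean_reward, std_reward = train_sb3_baseline("AntBulletEnv-v0")
```

Running both implementations with the same environment id, seed, and step budget should make the gap easier to compare directly.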