opendilab/PPOxFamily

Dual clipping parameter

jviquerat opened this issue · 2 comments

Hi,

I might be wrong, but I believe the dual clipping parameter is intended to be > 1:

[Screenshot 2024-04-26 at 18 27 03]

policy_loss = ppo_dual_clip(logp_new, logp_old, adv, 0.2, 0.2)

This is indeed a bug on our side: the dual_clip value here should be greater than 1.0. The mistake was introduced when migrating the code from DI-engine (link), and we are sorry about that. We will fix it soon.
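For readers wondering why the value has to exceed 1.0, here is a minimal sketch of the dual-clip objective in plain Python. It is not the repository's `ppo_dual_clip` implementation: the function name `dual_clip_ppo_loss_sketch`, its argument order, and the default `dual_clip=3.0` are assumptions made for illustration only. The idea is that for negative advantages the surrogate is additionally bounded from below by `dual_clip * adv`. If `dual_clip` were below 1.0 (such as 0.2), that bound would already exceed the surrogate at a probability ratio of 1, so it would be active even for on-policy samples.

```python
import math


def dual_clip_ppo_loss_sketch(logp_new, logp_old, adv, clip_ratio=0.2, dual_clip=3.0):
    """Hypothetical per-sample dual-clip PPO loss (illustrative sketch only).

    logp_new, logp_old, adv are equal-length sequences of floats.
    Returns the mean loss, i.e. the negated mean surrogate objective.
    """
    if dual_clip <= 1.0:
        # The lower bound only acts as a safeguard for large ratios when c > 1.
        raise ValueError("dual_clip must be greater than 1.0")

    total = 0.0
    for lp_new, lp_old, a in zip(logp_new, logp_old, adv):
        ratio = math.exp(lp_new - lp_old)
        # Standard PPO clipping of the probability ratio.
        clipped_ratio = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
        surrogate = min(ratio * a, clipped_ratio * a)
        if a < 0:
            # Dual clip: bound the objective from below for negative advantages.
            surrogate = max(surrogate, dual_clip * a)
        total += surrogate
    return -total / len(adv)


if __name__ == "__main__":
    # A sample whose ratio is 10 with a negative advantage: the standard clipped
    # objective would be unbounded in the ratio, the dual clip caps it.
    print(dual_clip_ppo_loss_sketch([math.log(10.0)], [0.0], [-1.0]))
```

With a positive advantage the sketch reduces to the usual clipped PPO objective, so only samples with negative advantage and a large ratio are affected by `dual_clip`.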

We have fixed this problem in #96.