emasquil/ppo

Do the policy network like in cleanrl

Closed this issue · 1 comment

This is not done...

We ended up doing something different: we don't treat the std as a learnable parameter, but predict it from the observation with a full network.
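For anyone reading later, here is a rough sketch of the difference between the two options. It is not the code from this repo: the class names, layer sizes, the pure-Python MLP helpers, and the log-std clipping range are all assumptions made up for illustration. The first class keeps a state-independent log-std as a free parameter (the CleanRL-style approach the issue title refers to); the second predicts the log-std from the observation with its own network.

```python
import math
import random

LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0  # assumed clipping range, not from the repo


def init_mlp(rng, sizes, scale=0.5):
    """Random weights, zero biases; one (weights, bias) pair per layer."""
    return [
        ([[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def mlp(x, layers):
    """Forward pass with tanh on every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


class ParameterStdPolicy:
    """Hypothetical: the std is a free parameter, independent of the observation."""

    def __init__(self, rng, obs_dim, act_dim, hidden=8):
        self.mean_net = init_mlp(rng, [obs_dim, hidden, act_dim])
        self.log_std = [0.0] * act_dim  # one learnable value per action dimension

    def __call__(self, obs):
        return mlp(obs, self.mean_net), [math.exp(v) for v in self.log_std]


class NetworkStdPolicy:
    """Hypothetical: the std is predicted from the observation by its own network."""

    def __init__(self, rng, obs_dim, act_dim, hidden=8):
        self.mean_net = init_mlp(rng, [obs_dim, hidden, act_dim])
        self.log_std_net = init_mlp(rng, [obs_dim, hidden, act_dim])

    def __call__(self, obs):
        # Clip the predicted log-std, then exponentiate so the std stays positive.
        log_std = [min(max(v, LOG_STD_MIN), LOG_STD_MAX)
                   for v in mlp(obs, self.log_std_net)]
        return mlp(obs, self.mean_net), [math.exp(v) for v in log_std]


rng = random.Random(0)
obs_a, obs_b = [0.1, -0.2, 0.3], [1.0, 0.5, -1.5]

param_policy = ParameterStdPolicy(rng, obs_dim=3, act_dim=2)
net_policy = NetworkStdPolicy(rng, obs_dim=3, act_dim=2)

_, param_std_a = param_policy(obs_a)
_, param_std_b = param_policy(obs_b)
_, net_std_a = net_policy(obs_a)
_, net_std_b = net_policy(obs_b)
```

With the parameter version the std is the same for every observation and only changes through gradient updates; with the network version it can vary per state, at the cost of extra parameters and usually some clipping to keep it in a sane range.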