Do the policy network like in cleanrl
Closed this issue · 1 comment
theovincent commented
This is not done...
emasquil commented
We ended up doing something different: we don't treat the std as a parameter, but predict it with a full network.
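For anyone landing here later, a minimal NumPy sketch of the two options being contrasted. This is not the repository's code: the layer sizes, the `tanh` activations, the log-std clipping bounds, and every name below (`policy_param_std`, `policy_predicted_std`, `init_linear`, ...) are assumptions made up for illustration. "Full network" is read here as a separate small MLP for the std, which may not match the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, HIDDEN = 3, 2, 16  # illustrative sizes, not from the repo


def init_linear(n_in, n_out):
    """Small random weights and zero bias for one dense layer."""
    return {"w": rng.normal(0.0, 0.1, (n_in, n_out)), "b": np.zeros(n_out)}


def mlp(params, x):
    """Two-layer MLP: tanh hidden layer followed by a linear output."""
    h = np.tanh(x @ params["hidden"]["w"] + params["hidden"]["b"])
    return h @ params["out"]["w"] + params["out"]["b"]


def init_mlp():
    return {"hidden": init_linear(OBS_DIM, HIDDEN), "out": init_linear(HIDDEN, ACT_DIM)}


# Option A: the std is a free, state-independent parameter
# (one log-std value per action dimension, shared by all observations).
params_a = {"mean_net": init_mlp(), "log_std": np.zeros(ACT_DIM)}


def policy_param_std(params, obs):
    mean = mlp(params["mean_net"], obs)
    std = np.broadcast_to(np.exp(params["log_std"]), mean.shape)
    return mean, std


# Option B: the std is predicted from the observation by its own network.
params_b = {"mean_net": init_mlp(), "log_std_net": init_mlp()}
LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0  # assumed bounds, for numerical stability


def policy_predicted_std(params, obs):
    mean = mlp(params["mean_net"], obs)
    log_std = np.clip(mlp(params["log_std_net"], obs), LOG_STD_MIN, LOG_STD_MAX)
    return mean, np.exp(log_std)


obs = rng.normal(size=(4, OBS_DIM))  # batch of 4 fake observations
mean_a, std_a = policy_param_std(params_a, obs)
mean_b, std_b = policy_predicted_std(params_b, obs)
```

With option A, `std_a` is the same row for every observation in the batch; with option B, `std_b` changes with the observation, so the policy can be more or less exploratory depending on the state.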