PPO stochastic policy

Question

Opened this issue 6 years ago · 0 comments

For a continuous control task with multiple available actions (say, six), what does the last layer of the actor output when PPO uses a stochastic policy?
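For context, one common setup for continuous actions is a diagonal Gaussian policy: the actor's last layer outputs one mean per action dimension (six values here), and the standard deviation comes either from a separate learned, state-independent `log_std` vector or from a second output head. Whether this repository does the same is not confirmed. The NumPy sketch below only illustrates the shapes involved; `HIDDEN_DIM`, `W_mu`, `b_mu`, `log_std`, and the function names are hypothetical and not taken from any particular codebase.

```python
import numpy as np

ACTION_DIM = 6   # from the question: six continuous action components
HIDDEN_DIM = 64  # hypothetical width of the last hidden layer

rng = np.random.default_rng(0)

# Hypothetical final-layer parameters of the actor.
W_mu = rng.normal(scale=0.01, size=(HIDDEN_DIM, ACTION_DIM))
b_mu = np.zeros(ACTION_DIM)
# Assumption: a state-independent log standard deviation, one per action dimension.
log_std = np.zeros(ACTION_DIM)


def actor_head(hidden):
    """Map hidden features (batch, HIDDEN_DIM) to Gaussian mean and std."""
    mean = hidden @ W_mu + b_mu                  # shape (batch, ACTION_DIM)
    std = np.exp(log_std) * np.ones_like(mean)   # broadcast to (batch, ACTION_DIM)
    return mean, std


def sample_action(mean, std):
    """Draw one action per batch row from N(mean, diag(std^2))."""
    return mean + std * rng.standard_normal(mean.shape)


def log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    var = std ** 2
    per_dim = -0.5 * ((action - mean) ** 2 / var + np.log(2 * np.pi * var))
    return per_dim.sum(axis=-1)                  # shape (batch,)


hidden = rng.normal(size=(4, HIDDEN_DIM))        # stand-in for hidden activations
mean, std = actor_head(hidden)
action = sample_action(mean, std)
logp = log_prob(action, mean, std)
print(mean.shape, action.shape, logp.shape)
```

Under this assumption the last layer has six outputs (the means), and the summed log-probability is the quantity that would enter the PPO probability ratio. Implementations that bound actions, for example with a `tanh` squashing, need an extra correction term that this sketch omits.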