zcaicaros/L2D

Confused about PPO update

I'm a bit confused about the PPO update process. At line 110:
[Screenshot of line 110, dated 2024-06-06]
The rewards within a single episode are normalized by subtracting the mean and dividing by the variance. Why should the rewards be scaled at all? I found that, even though they are normalized, some genuinely bad rewards get rescaled and important information is lost.
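For context, this is the kind of per-episode normalization I mean (a minimal sketch of the common pattern, not the repo's actual code; the function name, the `eps` term, and the sample numbers are my own, and I use the standard deviation here, which is what most PPO implementations divide by):

```python
import numpy as np

def normalize_rewards(rewards, eps=1e-8):
    # Per-episode reward normalization as commonly done in PPO code:
    # subtract the episode mean, divide by the standard deviation.
    # eps guards against division by zero for constant-reward episodes.
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# A single very bad reward keeps its relative rank, but its absolute
# magnitude is compressed to the episode's own scale:
episode = [1.0, 1.0, 1.0, -100.0]
print(normalize_rewards(episode))
```

This is what I mean by "information is lost": after normalization every episode has mean 0 and standard deviation 1, so the -100 above is no longer distinguishable in magnitude from a mildly bad reward in some other episode.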