Reward modification in PPO
Ynjxsjmh opened this issue · 2 comments
Ynjxsjmh commented
DeepRL-TensorFlow2/PPO/PPO_Discrete.py
Lines 151 to 154 in 876266d
DeepRL-TensorFlow2/PPO/PPO_Continuous.py
Lines 167 to 170 in 876266d
In PPO_Discrete, each reward is multiplied by 0.01, and in PPO_Continuous the reward is also modified. I don't understand why these modifications are made. What do they do?
boogalooYison commented
same question
huojitiaotiao commented
Multiplying by 0.01 is probably meant to shrink the reward so that it stays between 0 and 1 (my guess).
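To make the guess above concrete, here is a minimal sketch of what a constant 0.01 scale does to discounted returns. It is not taken from the repository: the `discounted_returns` helper, the 200-step episode with a reward of +1 per step, and `gamma=0.99` are all assumptions chosen for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each step can reuse the next step's return.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical episode (assumption): +1 reward per step for 200 steps.
raw_rewards = [1.0] * 200
# Same scaling as described in the question: each reward multiplied by 0.01.
scaled_rewards = [r * 0.01 for r in raw_rewards]

raw_returns = discounted_returns(raw_rewards)
scaled_returns = discounted_returns(scaled_rewards)

print("largest raw return:   ", max(raw_returns))
print("largest scaled return:", max(scaled_returns))
```

Under these assumptions the scaled returns are exactly 0.01 times the raw ones, so the critic's regression targets drop from tens to below 1 while the ranking of good and bad trajectories is unchanged. Whether that is the author's actual reason is not confirmed in this thread.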