Reward modification in PPO

Lines 151 to 154 in 876266d

    
    state_batch.append(state)
    action_batch.append(action)
    reward_batch.append(reward * 0.01)
    old_policy_batch.append(probs)

DeepRL-TensorFlow2/PPO/PPO_Continuous.py

Lines 167 to 170 in 876266d

    
    state_batch.append(state)
    action_batch.append(action)
    reward_batch.append((reward+8)/8)
    old_policy_batch.append(log_old_policy)

In PPO_Discrete each reward is multiplied by 0.01, and in PPO_Continuous the reward is also modified, as (reward+8)/8. I don't understand why these modifications are made. What do they do?

Same question.

Multiplying by 0.01 is probably meant to shrink the reward so that it stays between 0 and 1 (this is my guess).
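To make that guess concrete, here is a minimal sketch of what the two transformations do numerically. The function names are hypothetical, and the reward ranges are assumptions: the thread does not say which environments the scripts train on. The sketch assumes a CartPole-style reward of 1.0 per step for the discrete case and a Pendulum-style reward in roughly [-16.27, 0] for the continuous case.

```python
def scale_discrete(reward: float) -> float:
    """Scaling from the PPO_Discrete snippet: shrink the reward 100x."""
    return reward * 0.01


def scale_continuous(reward: float) -> float:
    """Scaling from the PPO_Continuous snippet: shift by 8, then divide by 8."""
    return (reward + 8) / 8


# Assumption: a CartPole-style environment that pays 1.0 per step.
print("discrete:", scale_discrete(1.0))

# Assumption: a Pendulum-style environment with rewards in about [-16.27, 0].
for raw in (-16.27, -8.0, 0.0):
    print("continuous:", raw, "->", scale_continuous(raw))
```

Under those assumptions, the discrete rewards become small positive numbers and the continuous rewards land in roughly [-1, 1]. Both are fixed affine rescalings, so they mainly change the magnitude of the returns the critic has to fit, which tends to keep the value loss and gradients in a more comfortable range.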
