Question about the new_cal_re() function in fullpace_env.py

Question

Opened this issue 2 years ago · 0 comments

I know that this function is used to calculate the extrinsic reward. However, during the PPO update of the network, the advantage function only includes the intrinsic reward (advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]). How, then, can the extrinsic reward influence the policy network, and what does this function actually do?
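
To make the question concrete, here is a minimal sketch of what the quoted advantage line computes. It assumes the returns are discounted sums of whatever rewards were stored in the rollout buffer, bootstrapped from the last value prediction. The helper names `compute_returns` and `compute_advantages` and the sample numbers are hypothetical and are not taken from the repository.

```python
def compute_returns(rewards, value_preds, gamma=0.99):
    """Discounted returns, bootstrapped from the final value prediction.

    rewards has T entries; value_preds has T + 1 entries.
    (Assumed layout, not confirmed against the repository.)
    """
    returns = [0.0] * (len(rewards) + 1)
    returns[-1] = value_preds[-1]
    for t in reversed(range(len(rewards))):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


def compute_advantages(returns, value_preds):
    # Mirrors: advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]
    return [r - v for r, v in zip(returns[:-1], value_preds[:-1])]


rewards = [1.0, 0.0, 2.0]            # hypothetical per-step rewards
value_preds = [0.5, 0.5, 0.5, 0.0]   # hypothetical critic outputs (T + 1)
returns = compute_returns(rewards, value_preds, gamma=1.0)
advantages = compute_advantages(returns, value_preds)
print(advantages)
```

Under these assumptions, only the reward signal written into the rewards buffer reaches the advantages, so the question comes down to whether the output of new_cal_re() is ever stored there.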