Question about the purpose of new_cal_re() in fullpace_env.py
Opened this issue · 0 comments
WAYKEN-TSE commented
I know that this function is used to calculate the extrinsic reward. But when PPO updates the network, the advantage function only includes the intrinsic reward (advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]). How, then, can the extrinsic reward influence the policy network, and what does this function actually do?
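
To make the question concrete, here is a minimal sketch of how I understand the advantage computation. It is not the repository's code: the helper names compute_returns and compute_advantages, the discount gamma, and the toy numbers are all assumptions for illustration. In this sketch the returns depend only on the rewards stored in the rollout, so an extrinsic reward could reach the advantage only if it were added to those stored rewards.

```python
def compute_returns(rewards, next_value, gamma=0.99):
    """Discounted returns bootstrapped from next_value (hypothetical helper).

    The result has len(rewards) + 1 entries; the last one is next_value.
    """
    returns = [0.0] * (len(rewards) + 1)
    returns[-1] = next_value
    # Walk backwards so each return can reuse the one after it.
    for t in reversed(range(len(rewards))):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


def compute_advantages(returns, value_preds):
    """Mirrors advantages = returns[:-1] - value_preds[:-1] elementwise."""
    return [r - v for r, v in zip(returns[:-1], value_preds[:-1])]


# Toy rollout: 3 steps, made-up numbers.
intrinsic = [1.0, 0.0, 2.0]
extrinsic = [0.0, 4.0, 0.0]      # assumed output of something like new_cal_re()
value_preds = [1.0, 1.0, 1.0, 0.0]

# Case 1: only the intrinsic reward is stored in the rollout.
adv_intrinsic_only = compute_advantages(
    compute_returns(intrinsic, next_value=0.0, gamma=0.5), value_preds
)

# Case 2: the extrinsic reward is added to the stored reward before
# the returns are computed (an assumption, not confirmed by the code).
combined = [i + e for i, e in zip(intrinsic, extrinsic)]
adv_combined = compute_advantages(
    compute_returns(combined, next_value=0.0, gamma=0.5), value_preds
)

print(adv_intrinsic_only)
print(adv_combined)
```

If the code follows case 1, I do not see where the extrinsic reward enters the policy update. If it follows case 2, I cannot find where the two rewards are combined.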