chauncygu/Multi-Agent-Constrained-Policy-Optimisation

Repeated calculation

BenfenYU opened this issue · 1 comment

It seems that the same variable, r_coef, is computed twice here:

First time:

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b

Second time (repeated):

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b
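A minimal sketch of why the second assignment looks redundant, assuming neither reward_loss_grad nor b_step_dir is modified between the two lines (the issue does not show the code in between). Plain Python lists stand in for the tensors, the values are made up, and inner_product is a hypothetical helper that mirrors the elementwise product followed by a sum. Going by the inline comment, b_step_dir is taken to hold H^{-1} b.

```python
def inner_product(u, v):
    """Return sum_i u[i] * v[i], the list analogue of (u * v).sum()."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical stand-ins for the flattened tensors in the issue.
reward_loss_grad = [0.5, -1.0, 2.0]   # g
b_step_dir = [1.0, 3.0, 0.25]         # assumed to be H^{-1} b

# First computation of r_coef = g^T H^{-1} b.
r_coef = inner_product(reward_loss_grad, b_step_dir)

# The repeated line: same inputs, same expression.
r_coef_again = inner_product(reward_loss_grad, b_step_dir)

# With unchanged inputs both results are identical, so the second
# assignment could be removed without changing behavior.
assert r_coef == r_coef_again
```

If the inputs really are unchanged between the two lines, deleting the second one saves a reduction over the full parameter vector and leaves the result the same.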

Hi Benfen, thanks for the comment. We will clean up the code.