Khrylx/PyTorch-RL

about the kl


def get_kl(self, x):
    action_prob1 = self.forward(x)        # pi_theta: current policy, tracks gradients
    action_prob0 = action_prob1.detach()  # pi_theta_old: same values, gradient flow blocked
    kl = action_prob0 * (torch.log(action_prob0) - torch.log(action_prob1))
    return kl.sum(1, keepdim=True)        # KL(pi_old || pi_theta), summed over actions

Shouldn't the KL be computed between two different policies? Here action_prob1 == action_prob0? Thank you

In the TRPO paper, the Hessian of the KL is computed at \theta = \theta_{old}. So the two action probabilities have the same values, but action_prob0, which represents \pi_{old}, is detached so that no gradient flows through it.
(image: the corresponding equation from the TRPO paper)
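To see this concretely, here is a minimal self-contained sketch (the toy `policy` net, input sizes, and variable names are made up for illustration): at \theta = \theta_{old} the KL value and its gradient both vanish, but the Hessian-vector product that TRPO's conjugate gradient relies on is nonzero, precisely because the detach routes all second-order gradients through action_prob1.

import torch
import torch.nn as nn

# Hypothetical toy categorical policy, just for demonstration.
policy = nn.Sequential(nn.Linear(4, 3), nn.Softmax(dim=1))
x = torch.randn(8, 4)

def get_kl(x):
    action_prob1 = policy(x)              # pi_theta: tracks gradients
    action_prob0 = action_prob1.detach()  # pi_theta_old: same values, no gradients
    kl = action_prob0 * (torch.log(action_prob0) - torch.log(action_prob1))
    return kl.sum(1, keepdim=True)

kl = get_kl(x).mean()
print(kl.item())  # ~0: both distributions coincide at theta = theta_old

params = list(policy.parameters())
grads = torch.autograd.grad(kl, params, create_graph=True)
flat_grad = torch.cat([g.view(-1) for g in grads])
print(flat_grad.norm().item())  # ~0: the KL is minimized at theta = theta_old

# Hessian-vector product (the Fisher-vector product TRPO needs):
# nonzero even though the KL value and its gradient are both zero.
v = torch.randn_like(flat_grad)
hvp = torch.autograd.grad(flat_grad @ v, params)
print(torch.cat([h.view(-1) for h in hvp]).norm().item())  # > 0

If action_prob0 were not detached, the expression would be identically zero as a function of \theta, and every derivative, including the Hessian, would vanish.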

Thank you very much!
