Khrylx/PyTorch-RL

about the kl


def get_kl(self, x):
    action_prob1 = self.forward(x)        # pi_theta: current policy, tracks gradients
    action_prob0 = action_prob1.detach()  # pi_theta_old: same values, gradient flow blocked
    kl = action_prob0 * (torch.log(action_prob0) - torch.log(action_prob1))
    return kl.sum(1, keepdim=True)        # KL(pi_old || pi_theta), summed over actions

Shouldn't the KL be computed between two different policies? Here action_prob1 == action_prob0? Thank you

In the TRPO paper, the Hessian of the KL is computed at \theta = \theta_{old}. So the two action probabilities have the same values, but action_prob0, which represents \pi_{old}, is detached so that no gradient flows through it.
(image: the corresponding equation from the TRPO paper)
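To see this concretely, here is a minimal self-contained sketch (the toy `policy` net, input sizes, and variable names are made up for illustration): at \theta = \theta_{old} the KL value and its gradient both vanish, but the Hessian-vector product that TRPO's conjugate gradient relies on is nonzero, precisely because the detach routes all second-order gradients through action_prob1.

import torch
import torch.nn as nn

# Hypothetical toy categorical policy, just for demonstration.
policy = nn.Sequential(nn.Linear(4, 3), nn.Softmax(dim=1))
x = torch.randn(8, 4)

def get_kl(x):
    action_prob1 = policy(x)              # pi_theta: tracks gradients
    action_prob0 = action_prob1.detach()  # pi_theta_old: same values, no gradients
    kl = action_prob0 * (torch.log(action_prob0) - torch.log(action_prob1))
    return kl.sum(1, keepdim=True)

kl = get_kl(x).mean()
print(kl.item())  # ~0: both distributions coincide at theta = theta_old

params = list(policy.parameters())
grads = torch.autograd.grad(kl, params, create_graph=True)
flat_grad = torch.cat([g.view(-1) for g in grads])
print(flat_grad.norm().item())  # ~0: the KL is minimized at theta = theta_old

# Hessian-vector product (the Fisher-vector product TRPO needs):
# nonzero even though the KL value and its gradient are both zero.
v = torch.randn_like(flat_grad)
hvp = torch.autograd.grad(flat_grad @ v, params)
print(torch.cat([h.view(-1) for h in hvp]).norm().item())  # > 0

If action_prob0 were not detached, the expression would be identically zero as a function of \theta, and every derivative, including the Hessian, would vanish.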

Thank you very much!
