About the KL divergence
Closed this issue · 3 comments
yangyiqin-tsinghua commented
def get_kl(self, x):
    action_prob1 = self.forward(x)
    action_prob0 = action_prob1.detach()
    kl = action_prob0 * (torch.log(action_prob0) - torch.log(action_prob1))
    return kl.sum(1, keepdim=True)
Shouldn't the KL divergence be computed between two different policies? Here action_prob0 is just a detached copy of action_prob1, so the two distributions are numerically identical and the KL is always zero. Thank you!
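(For context, a minimal sketch of why this pattern appears in TRPO-style implementations, assuming that is the motivation here: with one side detached, the KL *value* at the current parameters is indeed exactly zero, but its *second derivatives* with respect to the parameters are not. They equal the Fisher information, which is what the conjugate-gradient step needs for Hessian-vector products. The toy policy below is hypothetical, not code from this repo.)

```python
import torch

# Hypothetical toy policy: raw logits -> softmax action probabilities.
torch.manual_seed(0)
logits = torch.randn(1, 3, requires_grad=True)

p1 = torch.softmax(logits, dim=1)  # differentiable probabilities
p0 = p1.detach()                   # same values, treated as a constant
kl = (p0 * (torch.log(p0) - torch.log(p1))).sum()

# The KL value itself is exactly zero at the current parameters...
print(float(kl))  # 0.0

# ...but the graph is intact, so second-order derivatives survive.
# The Hessian of this KL w.r.t. the parameters is the Fisher matrix,
# and Fisher-vector products (used by TRPO's conjugate gradient) are
# generally nonzero for a direction v that actually changes the policy.
grad = torch.autograd.grad(kl, logits, create_graph=True)[0]
v = torch.tensor([[1.0, 0.0, 0.0]])
fvp = torch.autograd.grad((grad * v).sum(), logits)[0]
print(fvp.abs().sum().item() > 0)  # True
```

So the zero KL value is expected; only its curvature is used.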
Khrylx commented
yangyiqin-tsinghua commented
Thank you very much!