Kaixhin/ACER

KL Divergence

random-user-x opened this issue · 3 comments

ACER/train.py

Line 71 in 5b7ca5d

kl = F.kl_div(distribution.log(), ref_distribution, size_average=False)

Shouldn't the code be `F.kl_div(distribution, ref_distribution, size_average=False)`? Why is there a log of the distribution?

It is a bit strange, but `F.kl_div` takes log-probabilities for the input and probabilities for the target.
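
For reference, a minimal sketch of that convention (toy tensors of my own, not the repo's; `reduction='sum'` is the newer spelling of `size_average=False`):

```python
import torch
import torch.nn.functional as F

# Two toy categorical distributions (probabilities, not logits).
p = torch.tensor([0.4, 0.6])  # plays the role of the target
q = torch.tensor([0.3, 0.7])  # passed to kl_div in log space

# F.kl_div(input, target) expects log-probabilities for `input` and plain
# probabilities for `target`, and computes KL(target || input distribution),
# i.e. sum(target * (log(target) - input)).
kl_builtin = F.kl_div(q.log(), p, reduction='sum')

# Manual KL(p || q) for comparison.
kl_manual = (p * (p.log() - q.log())).sum()

print(kl_builtin.item(), kl_manual.item())  # both ≈ 0.0226
```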

I see. It still seems a bit strange to me, though. By the way, could you clear up one more doubt of mine?

The paper https://arxiv.org/pdf/1611.01224.pdf specifies the KL divergence as KL(shared parameters, actual parameters). If you check the Wikipedia definition https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition, P should be the distribution under the shared parameters and Q the distribution under the actual parameters. Is the code following the same convention?
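
To make the question concrete, here is a small sketch (toy tensors, not the repo's variables) of how the P and Q in Wikipedia's definition map onto `F.kl_div`'s arguments:

```python
import torch
import torch.nn.functional as F

# Wikipedia's definition: D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)).
P = torch.tensor([0.4, 0.6])
Q = torch.tensor([0.3, 0.7])

# In F.kl_div terms, P is the `target` and Q is the distribution whose log
# goes in as `input`; which of the repo's tensors plays which role is
# exactly the question above.
kl_P_Q = F.kl_div(Q.log(), P, reduction='sum')
kl_by_definition = (P * (P / Q).log()).sum()
print(kl_P_Q.item(), kl_by_definition.item())  # both ≈ 0.0226
```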

Ah well spotted - the code is wrong. I've fixed this in 11eb611 - thanks for spotting!