KL Divergence
random-user-x opened this issue · 3 comments
random-user-x commented
Line 71 in 5b7ca5d
Shouldn't the code be `F.kl_div(distribution, ref_distribution, size_average=False)`? Why is there a log of the distribution?
Kaixhin commented
It is a bit strange, but F.kl_div takes log probabilities for the input and probabilities for the target.
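A minimal sketch of this convention, using hypothetical tensors (not from the repo; `reduction='sum'` is the modern spelling of `size_average=False`):

```python
import torch
import torch.nn.functional as F

# Two hypothetical categorical distributions over 4 outcomes.
p = torch.tensor([0.1, 0.2, 0.3, 0.4])       # "input" distribution
q = torch.tensor([0.25, 0.25, 0.25, 0.25])   # "target" distribution

# F.kl_div expects *log*-probabilities for the input and plain
# probabilities for the target, hence the .log() on the input.
kl_torch = F.kl_div(p.log(), q, reduction='sum')

# Manual KL(q || p) = sum q * (log q - log p) for comparison.
kl_manual = (q * (q.log() - p.log())).sum()

print(torch.allclose(kl_torch, kl_manual))
```

Passing raw probabilities as the input would silently compute the wrong quantity, which is why the call in the repo logs the distribution first.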
random-user-x commented
I see. It still seems a bit strange to me, though. By the way, could you clear up one more doubt of mine?
The paper https://arxiv.org/pdf/1611.01224.pdf specifies KL divergence(shared parameters, actual parameters). Now, per the definition on the wiki page https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition , P should be the shared parameters and Q should be the actual parameters. Is the code following the same convention?
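For reference, a hedged sketch of how the wiki's KL(P || Q) maps onto `F.kl_div`'s argument order (hypothetical tensors, not the repo's distributions): the *target* argument plays the role of P and the *input* (log-probabilities) plays Q, so the arguments appear in the reverse of the mathematical notation.

```python
import torch
import torch.nn.functional as F

# Hypothetical distributions standing in for P (shared parameters)
# and Q (actual parameters) in the wiki's KL(P || Q) notation.
P = torch.tensor([0.7, 0.2, 0.1])
Q = torch.tensor([0.5, 0.3, 0.2])

# F.kl_div(input, target) computes KL(target || input), so to get
# KL(P || Q) we pass log Q as the input and P as the target.
kl_pq = F.kl_div(Q.log(), P, reduction='sum')

# Manual KL(P || Q) = sum P * log(P / Q) for comparison.
kl_manual = (P * (P / Q).log()).sum()

print(torch.allclose(kl_pq, kl_manual))
```

Checking which distribution the code passes as the target (versus the log-input) is therefore the way to verify whether it matches the paper's ordering.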