openai/glow

About log likelihood of data point (logpX)

Meng-Wei opened this issue · 2 comments

Hi, there! I am trying to calculate the log-likelihood of data points.

To do so, I directly modified the provided loss function:
[Screenshot: modified loss function computing logpX]

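Roughly, my modification amounts to converting the bits-per-dimension objective that the loss reports back into log p(x) in nats. Here is a minimal sketch of that conversion (the names are my own, not the repo's):

```python
import numpy as np

# Minimal sketch (not the exact Glow code): recover log p(x) in nats from the
# bits-per-dimension value reported by the model's loss for one image.
def logpx_from_bits_per_dim(bits_per_dim, image_shape=(32, 32, 3)):
    """Convert bits/dim to log p(x) in nats for a single image."""
    n_dims = np.prod(image_shape)  # 3072 for CIFAR-10
    # bits/dim = -log2 p(x) / n_dims  =>  log p(x) = -bits/dim * n_dims * ln(2)
    return -bits_per_dim * n_dims * np.log(2.0)

# Example: a CIFAR-10 model at ~3.35 bits/dim corresponds to
# log p(x) ≈ -3.35 * 3072 * ln 2 ≈ -7133 nats.
print(logpx_from_bits_per_dim(3.35))
```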
Then, I tested it on the cifar-10 dataset, and got the following:
[Plot: histogram of log-likelihoods on the CIFAR-10 test set]

This result is similar to the one reported in the paper "Do Deep Generative Models Know What They Don't Know?" (https://arxiv.org/abs/1810.09136), so I assume the "logpX" function is correct.

Problem:
The question is, when I sample from the model trained on CIFAR-10, the sampled images seem to have an overall higher log-likelihood than the real images (the std values below are in units of 0.1, i.e. 9 means 0.9; sorry for the ambiguity):
[Histograms comparing log-likelihoods of sampled vs. real CIFAR-10 images at several std values]

I am not sure why this happens. Is this a bug, or is it expected behavior? Thank you in advance!

Moreover, is "logpX" - "logpZ" = "logpDet"?
And should "logpDet" be the same (or almost the same) for different datapoints?
Thank you

I am not an author of Glow, but my understanding is that you are sampling with temperature, i.e. using the density p(x)^(1/T^2) instead of p(x). The std you mention is presumably this T parameter. The effect of sampling with temperature is that higher-likelihood samples are favored. I think you will get the desired histogram with std=1.
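For intuition, here is a minimal sketch, assuming the sampler draws z ~ N(0, T^2·I) and decodes it with the inverse flow (as Glow-style samplers do). Lower T concentrates z near the mode of the unit-variance prior the model was trained against, which already pushes log p(z), and hence log p(x) = log p(z) + logdet, upward:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prior(z):
    """log N(z; 0, I) -- the prior the flow was trained against."""
    return -0.5 * (z.shape[-1] * np.log(2 * np.pi) + np.sum(z ** 2, axis=-1))

dim = 3072  # CIFAR-10 dimensionality (32 * 32 * 3)
for T in (0.7, 0.9, 1.0):
    z = T * rng.standard_normal((1000, dim))  # z ~ N(0, T^2 I)
    print(f"T={T}: mean log p(z) under the unit prior = {log_prior(z).mean():.1f}")
# T < 1 gives noticeably higher log p(z), shifting the whole histogram of
# sampled log-likelihoods above that of real images.
```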

As for the second question, the formula is correct, but logpDet should not be the same for different datapoints. From an exercise with flows on 2D data, I know it varied by a large margin across the dataset I used. I don't know whether the same holds for high-dimensional datasets, but my guess is that it does.
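Here is a tiny self-contained check using a toy elementwise flow z = tanh(x) on 2D data (not Glow itself), illustrating both points: logpX = logpZ + logpDet holds exactly, and logpDet differs from datapoint to datapoint:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_standard_normal(z):
    """log N(z; 0, I)."""
    return -0.5 * (z.shape[-1] * np.log(2 * np.pi) + np.sum(z ** 2, axis=-1))

x = rng.normal(scale=1.5, size=(5, 2))                 # five 2-D "datapoints"
z = np.tanh(x)                                         # forward pass of the toy flow
logdet = np.sum(np.log1p(-np.tanh(x) ** 2), axis=-1)   # log|det dz/dx| (elementwise Jacobian)
logpz = log_standard_normal(z)
logpx = logpz + logdet                                 # change-of-variables formula

for i in range(len(x)):
    print(f"logpZ={logpz[i]:.2f}, logpDet={logdet[i]:.2f}, logpX={logpx[i]:.2f}")
# The logpDet column differs from point to point, so logpX - logpZ is not constant.
```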