Incorrect Imagenet dataset

Question

Opened this issue 2 years ago · 1 comments

I downloaded the Imagenet dataset linked in this repo, and I think it (50000 test images with labels, box downsampling) doesn't match the official Imagenet 32x32/64x64 versions used for NLL benchmarks (see https://github.com/openai/vdvae/blob/main/setup_imagenet.sh; 49999 test images with no labels, downloadable from https://academictorrents.com/details/96816a530ee002254d29bf7a61c0c158d3dedc3b). The difference in the downsampling method used during pre-processing makes the NLLs not comparable.
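To illustrate why the downsampling method matters, here is a minimal sketch comparing box downsampling with plain pixel decimation on a synthetic image. The decimation filter is only a stand-in chosen for illustration: this thread does not say which method the benchmark version used. The function names and the random test image are assumptions as well.

```python
import numpy as np

def box_downsample(img, factor):
    """Average each non-overlapping factor x factor block (box filter)."""
    h, w, c = img.shape
    assert h % factor == 0 and w % factor == 0
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def decimate(img, factor):
    """Keep every factor-th pixel. Illustrative alternative only,
    not necessarily the method behind the benchmark dataset."""
    return img[::factor, ::factor].astype(np.float64)

# Synthetic stand-in for a full-resolution image (assumption).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3)).astype(np.float64)

box = box_downsample(image, 8)   # 256x256 -> 32x32
dec = decimate(image, 8)         # 256x256 -> 32x32

# Same output shape, different pixel statistics.
mean_abs_diff = float(np.abs(box - dec).mean())
print(box.shape, dec.shape, mean_abs_diff)
print("std (box):", box.std(), "std (decimate):", dec.std())
```

Both outputs are 32x32, but the box filter averages away high-frequency detail, so the pixel distribution a likelihood model has to fit is different. That is enough to shift NLL values between the two preprocessing pipelines.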

Answer 1 · 2022-06-08T17:51:49.000Z

Hello @prafullasd

Thank you for your interest in this work, and thanks for reaching out about this issue!

You bring up a very good point. We had been skeptical of our Imagenet results ever since the NLL values came out very different from the VDVAE baseline. The confusing part is that the Imagenet version used in NLL benchmarks used to be hosted on this website (as the VDVAE download script you pointed to shows), and it seems that since that website was updated, the downsampled Imagenet it hosts is produced with a different downsampling method (this is the part we missed). To add to the confusion, some prior work also seems to use the incorrect Imagenet version.

For completeness and future reference:

Original Imagenet 32x32 used in NLL benchmarks: https://academictorrents.com/details/bf62f5051ef878b9c357e6221e879629a9b4b172
Original Imagenet 64x64 used in NLL benchmarks: https://academictorrents.com/details/96816a530ee002254d29bf7a61c0c158d3dedc3b
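
As a quick sanity check after downloading, the two variants can be told apart using the properties mentioned in this thread: 49999 unlabeled test images for the benchmark version versus 50000 labeled test images for the box-downsampled one. The sketch below is a heuristic only; the function name and return strings are hypothetical and not part of this repo.

```python
def guess_imagenet_variant(num_test_images, has_labels):
    """Heuristic guess of which downsampled Imagenet variant was loaded.

    Based only on the counts quoted in this thread; hypothetical helper.
    """
    if num_test_images == 49999 and not has_labels:
        return "benchmark"        # version used for NLL benchmarks
    if num_test_images == 50000 and has_labels:
        return "box-downsampled"  # labeled, box-downsampled version
    return "unknown"

# Example usage with the counts from the issue description.
print(guess_imagenet_variant(49999, has_labels=False))
print(guess_imagenet_variant(50000, has_labels=True))
```

A match on count and labels is not proof that the preprocessing is identical, so comparing a few pixel values against a known-good copy is still advisable.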

The reported NLL results on Imagenet will probably change once we use these two versions of the dataset. We will update the metrics as we redo the experiments (this will take some time). We expect our reported NLL to move closer to the NLL reported by VDVAE, which would make sense and be consistent with our findings on FFHQ 5-bits.

While noting this mistake is very important for research correctness, it doesn't much affect the main contribution of the paper: "Efficient-VDVAE keeps comparable or better NLL performance with less memory and faster training". Nevertheless, precise reporting of the results is very much desirable.

Thank you very much for pointing this out and helping improve the quality of our work!
I will keep the issue open until we make our updates.
If you find any other problems, please let us know, we appreciate this a lot!

Rayhane.