Question about FID score
LSH1231234 opened this issue
Hi! Thank you for the awesome work and for providing the code.
I have a somewhat simple question about measuring the FID score on FFHQ 256x256.
Which of the two processes below do you recommend for getting a good FID score on FFHQ 256x256?
- Generate 1024x1024 images, resize them to 256x256, and compute FID on the resized 256x256 fake images.
- Generate images directly at 256x256 resolution and compute FID on those 256x256 fake images.
Thank you!
Hi,
Thanks for your interest in our work.
Process 1 can probably give you slightly better results, but process 2 is the more proper way to do it.
Downsampling images produces artifacts, so the FFHQ 256x256 images carry a particular kind of artifact (they are downsampled from FFHQ 1024x1024). If you use the same downsampling algorithm, downsampling your generated 1024x1024 images reproduces the same type of artifact automatically, whereas generating directly at 256x256 requires the model to learn to simulate that artifact. From this perspective, process 1 should be marginally better.

On the other hand, generating at 1024x1024 requires a much larger model, which makes the comparison less fair. Another point to consider is the training batch size: larger batches generally give better results, and the 1024x1024 model consumes significantly more memory. It's hard to say which factor has more impact here, but I think it's better to just go with the 256x256 model.
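For reference, matching process 1 would look something like this (a minimal sketch; the directory names and PNG format are illustrative, and since FFHQ images are square a plain resize stands in for the resize-and-crop in the data prep). The key point is reusing the same LANCZOS filter that produced FFHQ 256x256 from the 1024x1024 originals:

```python
from pathlib import Path

from PIL import Image

src, dst = Path("fake_1024"), Path("fake_256")  # illustrative paths
dst.mkdir(exist_ok=True)

for path in sorted(src.glob("*.png")):
    img = Image.open(path).convert("RGB")
    # Same LANCZOS downsampling used to prepare FFHQ 256x256
    img = img.resize((256, 256), Image.LANCZOS)
    img.save(dst / path.name)
```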
Thank you for the answer.
Can I ask you one more simple question?
As you mentioned in the README.md, I converted the "original" dataset to an "lmdb" dataset and trained the model.
To calculate the FID score, I extracted features from fake images generated by the model trained on the "lmdb" dataset.
But for the real images, is it okay to extract features from the "original" dataset (not the "lmdb" one) and calculate the FID score that way?
Yes, you can. Just remember to use the same downsampling algorithm as when preparing the training dataset (the default is torchvision.transforms.functional.resize(img, size, Image.LANCZOS)). I recommend using the pytorch-fid package: you simply pass it the paths to two image collections and it does the rest for you.
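For instance, something like this (a hedged sketch against recent pytorch-fid versions, where calculate_fid_given_paths is the public Python entry point; the directory names are illustrative, and the command line `python -m pytorch_fid real_256 fake_256` does the same thing):

```python
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

device = "cuda" if torch.cuda.is_available() else "cpu"

# Paths to the real and fake image folders (illustrative names)
fid = calculate_fid_given_paths(
    ["real_256", "fake_256"],
    batch_size=50,
    device=device,
    dims=2048,  # default InceptionV3 pool3 feature dimension
)
print(f"FID: {fid:.2f}")
```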
One hacky but stress-free way would be to change this buffer into a file path (you can simply run a global counter for the file names) and comment out the rest of prepare() (after line 48). Then you can just run prepare_data.py to get a resized set of images.
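Roughly like this (a sketch of that hack; resize_and_save and its arguments are illustrative stand-ins for the buffer-writing helper in prepare_data.py, not the repo's actual code):

```python
import os
from itertools import count

from PIL import Image

_counter = count()  # global counter for file names

def resize_and_save(img, size, out_dir, resample=Image.LANCZOS, quality=100):
    # Write the resized image straight to disk instead of an lmdb buffer
    img = img.resize((size, size), resample)
    img.save(os.path.join(out_dir, f"{next(_counter):06d}.jpg"), quality=quality)
```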
Resolved