forever208/DDPM-IP

FID on CelebA 64x64

qianlong0502 opened this issue · 1 comment

Would you like to share how you generated the .npz reference file for evaluating performance on CelebA 64x64? For CelebA 64x64, the paper states that you used the full training set, but the GitHub README says you randomly picked 50k images from the training set, which is confusing. Could you share more details?

Here is what I did.

First, I generated the .npz reference file:

import numpy as np
from PIL import Image
import random
import math
import blobfile as bf
from tqdm import tqdm


def _list_image_files_recursively(data_dir):
    results = []
    for entry in sorted(bf.listdir(data_dir)):
        full_path = bf.join(data_dir, entry)
        ext = entry.split(".")[-1]
        if "." in entry and ext.lower() in ["jpg", "jpeg", "png", "gif"]:
            results.append(full_path)
        elif bf.isdir(full_path):
            results.extend(_list_image_files_recursively(full_path))
    return results


def center_crop_arr(pil_image, image_size):
    # We are not on a new enough PIL to support the `reducing_gap`
    # argument, which uses BOX downsampling at powers of two first.
    # Thus, we do it by hand to improve downsample quality.
    while min(*pil_image.size) >= 2 * image_size:
        pil_image = pil_image.resize(
            tuple(x // 2 for x in pil_image.size), resample=Image.BOX
        )

    scale = image_size / min(*pil_image.size)
    pil_image = pil_image.resize(
        tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC
    )

    arr = np.array(pil_image)
    crop_y = (arr.shape[0] - image_size) // 2
    crop_x = (arr.shape[1] - image_size) // 2
    return arr[crop_y : crop_y + image_size, crop_x : crop_x + image_size]

local_images = _list_image_files_recursively("./img_align_celeba")
len(local_images)

num_samples = 50_000
random_indices = random.sample(range(len(local_images)), num_samples)
assert len(set(random_indices)) == num_samples
resolution = 64
arrs = []
for idx in tqdm(random_indices):
    path = local_images[idx]
    with bf.BlobFile(path, "rb") as f:
        pil_image = Image.open(f)
        pil_image.load()
    pil_image = pil_image.convert("RGB")

    arr = center_crop_arr(pil_image, resolution)

    arrs.append(arr)

arrs = np.stack(arrs)
np.savez("celeba64_50k.npz", arrs)  # positional arg is stored under the default key "arr_0"
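As a side note, it may be worth sanity-checking the file layout before running the evaluator: `np.savez` stores a positional (unnamed) array under the default key `"arr_0"`. A minimal check on a dummy batch (the filename `ref_check.npz` is just a placeholder):

```python
import numpy as np

# Dummy batch standing in for the 50k CelebA crops: (N, 64, 64, 3) uint8.
batch = np.zeros((8, 64, 64, 3), dtype=np.uint8)

# A positional (unnamed) array is saved under the default key "arr_0".
np.savez("ref_check.npz", batch)

loaded = np.load("ref_check.npz")
assert loaded.files == ["arr_0"]
assert loaded["arr_0"].shape == (8, 64, 64, 3)
assert loaded["arr_0"].dtype == np.uint8
```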

Then I run the evaluation script:

python evaluations/evaluator.py celeba64_50k.npz 1000t_50000x64x64x3.npz

Here 1000t_50000x64x64x3.npz stores 50k samples generated from your CelebA 64x64 ADM-IP.pt checkpoint using 1000 sampling steps. The result is:

Inception Score: 3.248073101043701
FID: 9.09117351570518
sFID: 37.492244654769365
Precision: 0.45012
Recall: 0.61274

Can you help me figure out what I did wrong? I would be grateful.

My bad. After checking your scripts and the README in ./datasets, I got the correct FID:

Inception Score: 3.2480759620666504
FID: 1.5056962961214708
sFID: 3.43160755706333
Precision: 0.67378
Recall: 0.6176486557189325

However, this raises another issue. As you can see, I got an FID of 9.09 before, when I used PIL.Image.resize for preprocessing, which is what DiffusionVAE did. It seems that the resize functions of cv2 and PIL yield different results.