Questions about normalizing images in training.
zjlww opened this issue · 2 comments
I realized that you normalize the images during training. Is this common in training and evaluating image generative models? I don't think other models (LDM, DDPM, etc.) do this during training. Is the FID comparison in your paper still fair with this normalization in place?
```python
from torchvision import transforms

train_transform = transforms.Compose(
    [
        transforms.Resize(args.image_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),  # converts a PIL image to a tensor in [0, 1]
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # maps [0, 1] to [-1, 1]
    ]
)
```
Hi,
During the training stage, we normalize input images to the range [-1, 1]. This is standard practice in image generation. If you double-check the LDM code, it uses taming.data.imagenet.ImagePaths
(see https://github.com/CompVis/taming-transformers/blob/3ba01b241669f5ade541ce990f7650a3b8f65318/taming/data/base.py#L51) to load input images, which also normalizes them to [-1, 1].
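For reference, a minimal sketch of what that loader's normalization amounts to (not the repo's exact code): a uint8 image in [0, 255] is mapped linearly to [-1, 1].

```python
import numpy as np

# Sketch of the [-1, 1] normalization applied by taming's image loader:
# uint8 pixels in [0, 255] are linearly mapped via x / 127.5 - 1.
def normalize_to_signed_range(image_uint8: np.ndarray) -> np.ndarray:
    return (image_uint8 / 127.5 - 1.0).astype(np.float32)

pixels = np.array([0, 255], dtype=np.uint8)
normalize_to_signed_range(pixels)  # array([-1., 1.], dtype=float32)
```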
In the sampling stage, however, we rescale the output images back to the range [0, 1] before saving. You can check it out here.
Therefore, the FID comparison remains fair.
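The rescaling step above is typically just the inverse linear map from [-1, 1] to [0, 1]; a hedged sketch (not the repo's exact code, and the clipping is an assumption to guard against slight overshoot in model outputs):

```python
import numpy as np

# Map model outputs from [-1, 1] back to [0, 1] before saving,
# clipping values that fall slightly outside the expected range.
def to_unit_range(x: np.ndarray) -> np.ndarray:
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

samples = np.array([-1.2, -1.0, 0.0, 1.0])
to_unit_range(samples)  # array([0. , 0. , 0.5, 1. ])
```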
Thanks.
Thank you so much for the prompt and detailed reply!