forever208/DDPM-IP

CelebA dataset

Closed this issue · 18 comments

Hi, I'm using the following command to generate CelebA samples, but the result is very poor (FID > 10). Am I doing something wrong?

torchrun --nproc_per_node=8 --master_port=33456 scripts/image_sample.py \
--image_size 64 --timestep_respacing 100 \
--model_path ./ckpt/DDPM_IP_celeba64.pt \
--use_fp16 False --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --batch_size 256 --num_samples 50000  --sample_dir ./celeba_sample

I inspected the generated images and their quality looks quite good. Could you provide the sampling settings for this dataset? My defaults are the same as the ones you provided for LSUN.
The CelebA dataset I use is from the following link (img_align_celeba.zip, 1.34 GB). Is that the right one?
https://drive.google.com/drive/folders/0B7EVK8r0v71pTUZsaXdaSnZBZzg


@RachelTeamo yes, that is correct, but you need to preprocess the dataset by resizing it to 64x64 using the celeba64_npz.py script. I guess the poor FID was due to using a reference batch built without that preprocessing.
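
For reference, the preprocessing roughly amounts to center-cropping the aligned images and resizing them to 64x64 before packing them into an npz file. Below is a minimal sketch of that idea; the crop size, resize filter, and the arr_0 key are assumptions here, not necessarily what celeba64_npz.py actually does, so treat it only as an illustration.

# Minimal preprocessing sketch (assumed crop size, resize filter, and npz key;
# see datasets/celeba64_npz.py for the actual script).
import glob
import numpy as np
from PIL import Image

def build_celeba64_npz(img_dir, out_path, crop=140):
    arrays = []
    for path in sorted(glob.glob(f"{img_dir}/*.jpg")):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        left, top = (w - crop) // 2, (h - crop) // 2
        img = img.crop((left, top, left + crop, top + crop))  # center crop
        img = img.resize((64, 64), Image.BICUBIC)             # downsample to 64x64
        arrays.append(np.asarray(img, dtype=np.uint8))
    np.savez(out_path, arr_0=np.stack(arrays))                # shape (N, 64, 64, 3)

build_celeba64_npz("../dataset/img_align_celeba", "./celeba64_train.npz")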

OK, I will check. Thanks for the suggestion.

I tried your suggestion, and your model gives the correct FID when I run inference with it directly, but the FID is still very large when I retrain. When you trained on this dataset, did you use the data downloaded directly from the official site and simply unzip it, or does it need additional processing?

@RachelTeamo I just did the two steps I stated here: first download the raw CelebA from Google Drive, then use our celeba64_npz.py script to do the preprocessing. What is your FID after retraining?

Do you use the zip file directly, or do you unpack your npz file again to get images for training?

Currently I'm getting FID > 10 with 100 samples from a model trained for 150K steps.

@RachelTeamo It does not matter whether you use the npz file or an image folder for training.

100 samples are far too few to compute FID, which is very sensitive to num_samples; try 50k samples.

I apologize for expressing myself incorrectly; I meant that I drew 50K samples. Here's the sampling script I used:

torchrun --nproc_per_node=8 --master_port=23456 scripts/image_sample.py \
--image_size 64 --timestep_respacing 100 \
--model_path ./DDPM-IP/ema_0.9999_150000.pt \
--use_fp16 False --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --batch_size 256 --num_samples 50000 --sample_dir ./ours_sample

I tried your model (ADM-IP.pt) with the same sampling script and reproduced your result (100 timestep_respacing, FID < 3).
Also, here's my training script:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=2456 scripts/image_train.py --input_pertub 0.1 \
--data_dir ../dataset/img_align_celeba \
--image_size 64 --use_fp16 False --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True  --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 32 --log_dir ./DDPM-IP

I tested the model trained for 250K steps. The FID is still larger than 10...

@RachelTeamo Except for use_fp16=False, I do not think your training settings are problematic. One possibility is a mismatch between the training dataset and the reference batch used for FID computation. Make sure you use the whole training set as the reference batch when computing FID.
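
One quick sanity check (just a sketch; it assumes the reference batch stores its images under numpy's default arr_0 key) is to confirm that the reference npz really covers the full 64x64 training set:

# Check that the FID reference batch contains the whole training set
# (assumes the images are stored under the default 'arr_0' key).
import numpy as np

ref = np.load("../datasets/celeba64_train.npz")["arr_0"]
print(ref.shape, ref.dtype)   # expected: (202599, 64, 64, 3) uint8
assert ref.shape[1:] == (64, 64, 3), "reference batch is not 64x64 RGB"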

Another suggestion is to try the ADM baseline training: for example, train for 100k steps and see whether you get an FID around 3.

One more thing I would like to confirm with you: your training set is the downloaded img_align_celeba.zip (1.34 GB) file containing a total of 202,599 JPG images, correct? Thanks again~

@RachelTeamo Yes, you can see that from my code here.

I tried the ADM baseline but my FID is still greater than 10. I'll list my steps so you can check whether there is a problem:

  1. Download img_align_celeba.zip (1.34 GB) and unzip it; the unzipped folder is the dataset path passed to training.
  2. Run this script (https://github.com/forever208/DDPM-IP/blob/DDPM-IP/datasets/celeba64_npz.py) to build the reference batch needed to compute FID, i.e. ./celeba64_train.npz.
  3. Follow the scripts above for training and sampling to get samples_50000x64x64x3.npz.
  4. Then I compute FID with the following command:
python evaluator.py \
../datasets/celeba64_train.npz \
./samples_50000x64x64x3.npz

This doesn't seem problematic to me, does it? One possible cause I can think of is that I might need to unpack celeba64_train.npz back into JPG images for training?

@RachelTeamo I do not think your implementation has any errors. I used the npz file for both training and FID computation on CelebA as well, so that should not be the problem.
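
If you nevertheless want to train from an image folder, a small sketch like the following (a hypothetical helper; it again assumes the arr_0 key) unpacks the npz back into individual image files:

# Unpack the npz back into individual PNGs so --data_dir can point at a folder
# (hypothetical helper; assumes the images are stored under the 'arr_0' key).
import os
import numpy as np
from PIL import Image

def npz_to_folder(npz_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    images = np.load(npz_path)["arr_0"]   # (N, 64, 64, 3) uint8
    for i, arr in enumerate(images):
        Image.fromarray(arr).save(os.path.join(out_dir, f"celeba_{i:06d}.png"))

npz_to_folder("./celeba64_train.npz", "../dataset/celeba64_images")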

Can you try training the ADM baseline on the CIFAR-10 dataset? It is much faster, so it is a quick way to make sure the code and the machine are OK.

I actually tried CIFAR-10 and got normal results, and the visualizations also look fine on the CelebA dataset, which surprised me. Using your model for inference also gives the correct FID, which shows that the inference code and the reference batch should be correct. I trained on a 4090 GPU with PyTorch 1.13.0, and there seems to be no problem there either. The strangest thing for me is that the visualization results are normal as well, so I really don't know what's wrong.

The last thing I recommend trying is to use the DDIM code to download and process the CelebA 64x64 dataset; their code is here. Good luck!

I switched servers and the results turned out to be correct; I'm guessing some server issue may have caused this. Thanks again, and sorry for bothering you for so long.


@RachelTeamo Hello, I encountered the same issue with high FID. While training on the CIFAR-10 dataset, I obtained an FID of 9.8 at 460k steps, which is much higher than the expected value of 2.3. I would like to know how you resolved this issue.