forever208/DDPM-IP

ValueError: unsupported image size: 32

ShiningSord opened this issue · 5 comments

Thanks for sharing such fantastic work.

When I try to train on CIFAR-10 (or any other dataset with image size = 32), I run into the following error.

The full traceback is:

Traceback (most recent call last):
  File "scripts/image_train.py", line 83, in <module>
    main()
  File "scripts/image_train.py", line 27, in main
    **args_to_dict(args, model_and_diffusion_defaults().keys())
  File "/mnt/backup2/home/zxwang22/code/fedif/DDPM-IP/guided_diffusion/script_util.py", line 115, in create_model_and_diffusion
    use_new_attention_order=use_new_attention_order,
  File "/mnt/backup2/home/zxwang22/code/fedif/DDPM-IP/guided_diffusion/script_util.py", line 158, in create_model
    raise ValueError(f"unsupported image size: {image_size}")
ValueError: unsupported image size: 32

I found that in DDPM-IP/guided_diffusion/script_util.py, line 158, create_model has no branch for an image size of 32, so it falls through to the ValueError.
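For context, create_model in the upstream guided_diffusion code picks channel_mult from image_size roughly as sketched below; the image_size == 32 branch is the one that is missing. The (1, 2, 2, 2) multiplier is an assumption based on common 32x32 configurations, not a value taken from this repo:

if channel_mult == "":
    if image_size == 512:
        channel_mult = (0.5, 1, 1, 2, 2, 4, 4)
    elif image_size == 256:
        channel_mult = (1, 1, 2, 2, 4, 4)
    elif image_size == 128:
        channel_mult = (1, 1, 2, 3, 4)
    elif image_size == 64:
        channel_mult = (1, 2, 3, 4)
    elif image_size == 32:  # assumed fix for CIFAR-10-sized inputs
        channel_mult = (1, 2, 2, 2)
    else:
        raise ValueError(f"unsupported image size: {image_size}")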

Can you help me with this issue?

Thanks a lot!

Hi @ShiningSord, please check out the branch cifar_base_noise to train the model on CIFAR-10.
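For anyone hitting the same error, switching to that branch in an existing clone is just:

git checkout cifar_base_noise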

I will try to merge the branches later on.

@forever208 Thanks for your kind reply. I will try it today.

Hi @ShiningSord, I have merged all duplicate branches. Now the main branch is DDPM-IP.

@forever208 Thanks for your help! The code works well on my machine when I use only 1 GPU with the following script:

mpiexec -n 1 python3 scripts/image_train.py --input_pertub 0.0 \
--data_dir /mnt/proj74/zxwang/data/cifar-10-batches-py/5user \
--image_size 32 --use_fp16 True --num_channels 128 --num_head_channels 32 --num_res_blocks 3 \
--attention_resolutions 16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.3 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 64

But when I try to use multiple GPUs (e.g., n=4), I get the following error:

There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  python3

Either request fewer slots for your application, or make more slots available
for use.

Do you have any suggestions about this? Thank you so much!

@ShiningSord Hi, could you share two details with me for debugging? First, your GPU cluster configuration (how many GPUs per node); second, your argument settings. In particular, you should use mpiexec -n 4 if you want to train with 4 GPUs.
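For reference, the "not enough slots" message comes from Open MPI, which by default refuses to launch more ranks than the slots it detects (typically one per CPU core, or whatever a hostfile declares). Two common workarounds, assuming an Open MPI launcher (not confirmed in this thread), are oversubscribing or declaring slots explicitly:

# allow more ranks than the detected slot count
mpiexec --oversubscribe -n 4 python3 scripts/image_train.py ...

# or declare the available slots in a hostfile
echo "localhost slots=4" > hostfile
mpiexec --hostfile hostfile -n 4 python3 scripts/image_train.py ...

Either way, -n should match the number of GPUs you intend to use.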