Inference error on ImageNet512: ValueError: not enough values to unpack (expected 4, got 3)
Hi,
I am getting this error while running inference with the imagenet512 model.
Inference command:
python main.py --config_file configs/imagenet512.yaml --input_image examples/celeb-sample.jpg --mask examples/celeb-mask.jpg --outdir images/example --n_samples 1 --algorithm o_ddim
This is the error I am getting:
(copaint) CoPaint$ python main.py --config_file configs/imagenet512.yaml --input_image examples/celeb-sample.jpg --mask examples/celeb-mask.jpg --outdir images/example --n_samples 1 --algorithm o_ddim
/home/styldod/anaconda3/envs/copaint/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /home/styldod/anaconda3/envs/copaint/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
warn(f"Failed to load image Python extension: {e}")
WARNING:root:Tensorflow not installed!
WARNING:root:Scikit-learn not installed!
WARNING:root:Logging level higher than INFO!
{
"algorithm": "o_ddim",
"attention_resolutions": "32,16,8",
"channel_mult": "",
"class_cond": true,
"classifier_attention_resolutions": "32,16,8",
"classifier_depth": 2,
"classifier_path": "./checkpoints/512x512_classifier.pt",
"classifier_pool": "attention",
"classifier_resblock_updown": true,
"classifier_scale": 1.0,
"classifier_use_fp16": false,
"classifier_use_scale_shift_norm": true,
"classifier_width": 128,
"clip_denoised": true,
"cond_y": null,
"dataset_ending_index": -1,
"dataset_name": "imagenet",
"dataset_starting_index": -1,
"ddim": {
"ddim_sigma": 0.0,
"schedule_params": {
"ddpm_num_steps": 250,
"jump_length": 1,
"jump_n_sample": 1,
"num_inference_steps": 100,
"schedule_type": "linear",
"time_travel_filter_type": "none",
"use_timetravel": false
}
},
"ddnm": {
"schedule_jump_params": {
"jump_length": 1,
"jump_n_sample": 1,
"n_sample": 1,
"t_T": 250
}
},
"ddrm": {
"schedule_jump_params": {
"jump_length": 1,
"jump_n_sample": 1,
"n_sample": 1,
"t_T": 250
}
},
"debug": false,
"diffusion_steps": 1000,
"dps": {
"eta": 1.0,
"schedule_jump_params": {
"jump_length": 1,
"jump_n_sample": 1,
"n_sample": 1,
"t_T": 250
},
"step_size": 0.5
},
"dropout": 0.0,
"image_size": 512,
"input_image": "examples/celeb-sample.jpg",
"learn_sigma": true,
"lr_kernel_n_std": 2,
"mask": "examples/celeb-mask.jpg",
"mask_type": "half",
"mode": "inpaint",
"model_path": "./checkpoints/512x512_diffusion.pt",
"n_iter": 1,
"n_samples": 1,
"noise_schedule": "linear",
"num_channels": 256,
"num_head_channels": 64,
"num_heads": 4,
"num_heads_upsample": -1,
"num_res_blocks": 2,
"num_samples": 100,
"optimize_xt": {
"coef_xt_reg": 0.01,
"coef_xt_reg_decay": 1.0,
"filter_xT": false,
"lr_xt": 0.0025,
"lr_xt_decay": 1.05,
"mid_interval_num": 1,
"num_iteration_optimize_xt": 5,
"optimize_before_time_travel": false,
"optimize_xt": true,
"use_adaptive_lr_xt": true,
"use_smart_lr_xt_decay": false
},
"outdir": "images/example",
"predict_xstart": false,
"repaint": {
"inpa_inj_sched_prev": true,
"inpa_inj_sched_prev_cumnoise": false,
"schedule_jump_params": {
"jump_length": 10,
"jump_n_sample": 10,
"n_sample": 1,
"t_T": 250
}
},
"resample": {
"keep_n_samples": 2
},
"resblock_updown": true,
"rescale_learned_sigmas": false,
"rescale_timesteps": false,
"respace_interpolate": false,
"resume": false,
"scale": 0,
"seed": 42,
"show_progress": true,
"timestep_respacing": "250",
"use_checkpoint": false,
"use_ddim": false,
"use_fp16": true,
"use_git": false,
"use_kl": false,
"use_new_attention_order": false,
"use_scale_shift_norm": true
}
2023-05-19-12:03:23-root-INFO: Prepare model...
2023-05-19-12:03:25-root-INFO: Loading model from ./checkpoints/512x512_diffusion.pt...
2023-05-19-12:03:27-root-INFO: Prepare classifier...
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /home/styldod/anaconda3/envs/copaint/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
2023-05-19-12:03:28-root-INFO: Start sampling
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 334, in <module>
main()
File "main.py", line 203, in main
image, mask, image_name, class_id = data
ValueError: not enough values to unpack (expected 4, got 3)
Hi Vineet,
Thank you for your interest in our paper.
By default, the ImageNet model that we use requires a class_id for classifier guidance, and the error is caused by a missing class_id. One solution is to use the CelebA model instead, by running:
python main.py --config_file configs/celebahq.yaml --input_image examples/celeb-sample.jpg --mask examples/celeb-mask.jpg --outdir images/example --n_samples 1 --algorithm o_ddim
If you insist on using the ImageNet model, you could remove the classifier guidance by running:
python main.py --config_file configs/imagenet512.yaml --input_image examples/celeb-sample.jpg --mask examples/celeb-mask.jpg --outdir images/example --n_samples 1 --algorithm o_ddim --no-class_cond
Or you could manually set a class id by modifying the code a little. Honestly, I do not recommend using the ImageNet model for inpainting a human face (celeb-sample.jpg), as the distributions of CelebA and ImageNet differ a lot, so the inpainting results might be bad.
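The manual class-id route mentioned above could look roughly like the sketch below. It assumes, based only on the traceback, that the data loader yields a 3-tuple (image, mask, image_name) when no label is available and a 4-tuple otherwise. `unpack_batch` and `default_class_id` are hypothetical names, not part of the repository, and the class index used in the example is arbitrary.

```python
def unpack_batch(data, default_class_id=None):
    """Return (image, mask, image_name, class_id) from a 3- or 4-item batch.

    Hypothetical helper: assumes the loader yields (image, mask, image_name)
    when no label exists and (image, mask, image_name, class_id) otherwise.
    """
    if len(data) == 4:
        return tuple(data)
    if len(data) == 3:
        image, mask, image_name = data
        # Fall back to a caller-supplied class id (None if not given).
        return image, mask, image_name, default_class_id
    raise ValueError(f"expected 3 or 4 items per batch, got {len(data)}")


# Example: a label-free batch paired with an arbitrary, manually chosen class index.
image, mask, image_name, class_id = unpack_batch(
    ("img", "mask", "celeb-sample"), default_class_id=207
)
```

Replacing the failing 4-way unpack in main.py with a call like this would keep labelled datasets working while letting a single image/mask pair run with a fixed class.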
Let me know if you have any other questions.
Regards,
Guanhua
Agreed, the CelebA-trained model is more suitable for face-related inpainting. I am trying to see whether the ImageNet-trained model can be used for object removal in images.
I was just checking first whether inference works at all, so I used the only image/mask pair present in the repo.
As per your suggestion, I ran inference using this command:
python main.py --config_file configs/imagenet512.yaml --input_image examples/celeb-sample.jpg --mask examples/celeb-mask.jpg --outdir images/example --n_samples 1 --algorithm o_ddim --no-class_cond
I am getting another error while loading the checkpoint:
UserWarning: Failed to load image Python extension: /home/styldod/anaconda3/envs/copaint/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
warn(f"Failed to load image Python extension: {e}")
WARNING:root:Tensorflow not installed!
WARNING:root:Scikit-learn not installed!
WARNING:root:Logging level higher than INFO!
{
"algorithm": "o_ddim",
"attention_resolutions": "32,16,8",
"channel_mult": "",
"class_cond": false,
"classifier_attention_resolutions": "32,16,8",
"classifier_depth": 2,
"classifier_path": "./checkpoints/512x512_classifier.pt",
"classifier_pool": "attention",
"classifier_resblock_updown": true,
"classifier_scale": 1.0,
"classifier_use_fp16": false,
"classifier_use_scale_shift_norm": true,
"classifier_width": 128,
"clip_denoised": true,
"cond_y": null,
"dataset_ending_index": -1,
"dataset_name": "imagenet",
"dataset_starting_index": -1,
"ddim": {
"ddim_sigma": 0.0,
"schedule_params": {
"ddpm_num_steps": 250,
"jump_length": 1,
"jump_n_sample": 1,
"num_inference_steps": 100,
"schedule_type": "linear",
"time_travel_filter_type": "none",
"use_timetravel": false
}
},
"ddnm": {
"schedule_jump_params": {
"jump_length": 1,
"jump_n_sample": 1,
"n_sample": 1,
"t_T": 250
}
},
"ddrm": {
"schedule_jump_params": {
"jump_length": 1,
"jump_n_sample": 1,
"n_sample": 1,
"t_T": 250
}
},
"debug": false,
"diffusion_steps": 1000,
"dps": {
"eta": 1.0,
"schedule_jump_params": {
"jump_length": 1,
"jump_n_sample": 1,
"n_sample": 1,
"t_T": 250
},
"step_size": 0.5
},
"dropout": 0.0,
"image_size": 512,
"input_image": "examples/celeb-sample.jpg",
"learn_sigma": true,
"lr_kernel_n_std": 2,
"mask": "examples/celeb-mask.jpg",
"mask_type": "half",
"mode": "inpaint",
"model_path": "./checkpoints/512x512_diffusion.pt",
"n_iter": 1,
"n_samples": 1,
"noise_schedule": "linear",
"num_channels": 256,
"num_head_channels": 64,
"num_heads": 4,
"num_heads_upsample": -1,
"num_res_blocks": 2,
"num_samples": 100,
"optimize_xt": {
"coef_xt_reg": 0.01,
"coef_xt_reg_decay": 1.0,
"filter_xT": false,
"lr_xt": 0.0025,
"lr_xt_decay": 1.05,
"mid_interval_num": 1,
"num_iteration_optimize_xt": 5,
"optimize_before_time_travel": false,
"optimize_xt": true,
"use_adaptive_lr_xt": true,
"use_smart_lr_xt_decay": false
},
"outdir": "images/example",
"predict_xstart": false,
"repaint": {
"inpa_inj_sched_prev": true,
"inpa_inj_sched_prev_cumnoise": false,
"schedule_jump_params": {
"jump_length": 10,
"jump_n_sample": 10,
"n_sample": 1,
"t_T": 250
}
},
"resample": {
"keep_n_samples": 2
},
"resblock_updown": true,
"rescale_learned_sigmas": false,
"rescale_timesteps": false,
"respace_interpolate": false,
"resume": false,
"scale": 0,
"seed": 42,
"show_progress": true,
"timestep_respacing": "250",
"use_checkpoint": false,
"use_ddim": false,
"use_fp16": true,
"use_git": false,
"use_kl": false,
"use_new_attention_order": false,
"use_scale_shift_norm": true
}
2023-05-22-15:59:50-root-INFO: Prepare model...
2023-05-22-15:59:51-root-INFO: Loading model from ./checkpoints/512x512_diffusion.pt...
Traceback (most recent call last):
File "main.py", line 334, in <module>
main()
File "main.py", line 164, in main
unet, sampler = prepare_model(config.algorithm, config, device)
File "main.py", line 59, in prepare_model
unet.load_state_dict(
File "/home/styldod/anaconda3/envs/copaint/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetModel:
Unexpected key(s) in state_dict: "label_emb.weight".
Hi Vineet,
By default, the UNet we use requires class_id as input. With class_cond=False, part of the weights (e.g. "label_emb.weight" as you reported) would be removed and thus cannot be loaded. I made some modifications to the code for enabling non-strict model weights loading, and you could pull the repo and use the following command to run the experiment:
python main.py --config_file configs/imagenet.yaml --input_image examples/celeb-sample.jpg --mask examples/celeb-mask.jpg --outdir images/example --n_samples 1 --algorithm o_ddim --no-class_cond
However, in my trial, I noticed that the resulting image quality is very poor due to the image distribution shift and the lack of class conditioning, so I suggest using this command only for debugging purposes.
Let me know if there are any other problems.
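For readers who cannot pull the updated code, here is a rough sketch of what tolerating the extra "label_emb.weight" key involves. PyTorch's `load_state_dict` accepts `strict=False`, which skips unexpected and missing keys. The sketch below instead filters the checkpoint beforehand, so that what gets dropped is explicit. `filter_state_dict` is a hypothetical helper, not the change made in the repository, and plain dicts stand in for the real checkpoint and model so the snippet runs without PyTorch.

```python
def filter_state_dict(checkpoint, model_keys):
    """Split a checkpoint into entries the model expects and entries it does not."""
    expected = set(model_keys)
    kept = {k: v for k, v in checkpoint.items() if k in expected}
    dropped = sorted(k for k in checkpoint if k not in expected)
    missing = sorted(expected - set(checkpoint))
    return kept, dropped, missing


# Stand-in data: real usage would pass torch.load(path) and unet.state_dict().keys().
checkpoint = {"label_emb.weight": [0.0], "out.weight": [1.0], "out.bias": [2.0]}
model_keys = ["out.weight", "out.bias"]  # model built with class_cond=False

kept, dropped, missing = filter_state_dict(checkpoint, model_keys)
print("dropped:", dropped)
print("missing:", missing)
# With PyTorch, unet.load_state_dict(kept) would then succeed if `missing` is empty.
```

Printing the dropped and missing keys makes it easy to confirm that only the class-embedding weights are being skipped, which a silent non-strict load would not show.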
Regards,
Guanhua