Inference with a unetref generator checkpoint
concrete13377 opened this issue · 7 comments
Trying to run inference with a unetref generator checkpoint trained with the example config:

```bash
python3 scripts/gen_single_image_diffusion.py \
    --model-in-file latest_net_G_A.pth \
    --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
    --mask-in viton_bbox_ref/testA/ref/00006_00.jpg \
    --dir-out checkpoints/viton_bbox_ref/inference_output \
    --img-width 128 \
    --img-height 128
```

I get the following error:

```
warnings.warn(
Dual U-Net: number of ref blocks: 15
sampling loop time step: 0%| | 0/1000 [00:00<?, ?it/s]
0%| | 0/1 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/joliGEN/scripts/gen_single_image_diffusion.py", line 808, in <module>
frame, lmodel, lopt = generate(**vars(args))
^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/scripts/gen_single_image_diffusion.py", line 563, in generate
out_tensor, visu = model.restoration(
^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 95, in restoration
return self.restoration_ddpm(
^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 149, in restoration_ddpm
y_t = self.p_sample(
^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 253, in p_sample
model_mean, model_log_variance = self.p_mean_variance(
^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 219, in p_mean_variance
noise=self.denoise_fn(
^^^^^^^^^^^^^^^^
File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/palette_denoise_fn.py", line 109, in forward
out = self.model(input, embedding, ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1605, in forward
h, hs, emb, h_ref, hs_ref = self.compute_feats(
^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1595, in compute_feats
h, _ = module(h, emb, qkv_ref=qkv_list.pop(0))
^^^^^^^^
UnboundLocalError: cannot access local variable 'qkv_list' where it is not associated with a value
```
Hi @concrete13377, thanks for reporting this; I can reproduce it. A flag and an extra input are needed. I'll come back with a fix.
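For context, the crash is the classic Python pattern of a local variable that is only bound on one branch: `qkv_list` is presumably built only when reference features are supplied, so running without the reference input leaves it unbound when `compute_feats` pops from it. A minimal illustration of the failure mode (illustrative only, not the actual joliGEN code):

```python
# Minimal repro of the failure mode (not joliGEN code): a local bound
# only inside a conditional branch, then read unconditionally.
def compute_feats(h, ref=None):
    if ref is not None:
        qkv_list = list(ref)  # stand-in for the reference qkv features
    for _ in range(3):
        # Raises UnboundLocalError when ref is None: qkv_list was never bound.
        h = h + qkv_list.pop(0)
    return h

compute_feats(0, ref=[1, 2, 3])  # works
compute_feats(0)                 # UnboundLocalError, as in the traceback
```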
See #569
The PR allows you to generate an image with a reference input:

```bash
python3 gen_single_image_diffusion.py \
    --model-in-file /path/to/model/latest_net_G_A.pth \
    --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
    --bbox-in viton_bbox_ref/testA/bbox/00006_00.txt \
    --ref-in viton_bbox_ref/testA/ref/00006_00.jpg \
    --dir-out /path/to/out/ \
    --img-width 128 \
    --img-height 128
```
You want to look at the result in `/path/to/out/img_0_generated_crop.png`. (The `img_0_generated.png` image is incorrect in this case, since the model from the documentation is trained on 512x512 crops that contain the garment bbox, so the model never sees heads, etc.)
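If you want the generated garment in the context of the full image, you can composite the crop back yourself at the bbox location. A minimal sketch (the single-line `cls xmin ymin xmax ymax` bbox format is an assumption; the paths are from the command above):

```python
# Sketch: paste the generated crop back into the original image at the
# bbox location. The one-line "cls xmin ymin xmax ymax" bbox format is
# an assumption; adjust to your actual bbox files.
from PIL import Image

img = Image.open("viton_bbox_ref/testA/imgs/00006_00.jpg")
with open("viton_bbox_ref/testA/bbox/00006_00.txt") as f:
    _, xmin, ymin, xmax, ymax = map(int, f.readline().split())

crop = Image.open("img_0_generated_crop.png").convert("RGB")
crop = crop.resize((xmax - xmin, ymax - ymin))
img.paste(crop, (xmin, ymin))
img.save("composited.png")
```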
Thank you so much for your work!

What do you mean by "the model never sees heads, etc."? How can I train the model to get a correct generated image?

So there's no way to use a model trained with the example config, since it's wrong about the resolution? Or can I just run it with other options so that it generates correct images?
The example model lacks the full-image context (it was trained on 512x512 garment crops only). You can try to hack a crop at inference, but I don't see how this would help much.
However, you can finetune your existing model with the `--data_online_creation_load_size_A 768 1024` and `--train_continue` options, which avoids retraining from scratch; see the sketch below.
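A rough sketch of the finetuning command, assuming the model was trained with joliGEN's `train.py` (reuse your exact original options, in particular the same `--name` and `--checkpoints_dir`, so the latest checkpoint is found):

```bash
# Hypothetical finetuning run: keep the original training options and
# append the two flags. --train_continue resumes from the latest saved
# checkpoint; the larger load size exposes the model to more context.
python3 train.py \
    <your original training options> \
    --train_continue \
    --data_online_creation_load_size_A 768 1024
```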