jolibrain/joliGEN

inference unetref generator

concrete13377 opened this issue · 7 comments

Trying to run inference on a unetref generator checkpoint (trained with the example config) using:

```
python3 scripts/gen_single_image_diffusion.py \
    --model-in-file latest_net_G_A.pth \
    --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
    --mask-in viton_bbox_ref/testA/ref/00006_00.jpg \
    --dir-out checkpoints/viton_bbox_ref/inference_output \
    --img-width 128 \
    --img-height 128
```

getting the following error:

```
  warnings.warn(
Dual U-Net: number of ref blocks:  15
sampling loop time step:   0%|          | 0/1000 [00:00<?, ?it/s]
  0%|          | 0/1 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/joliGEN/scripts/gen_single_image_diffusion.py", line 808, in <module>
    frame, lmodel, lopt = generate(**vars(args))
                          ^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/scripts/gen_single_image_diffusion.py", line 563, in generate
    out_tensor, visu = model.restoration(
                       ^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 95, in restoration
    return self.restoration_ddpm(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 149, in restoration_ddpm
    y_t = self.p_sample(
          ^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 253, in p_sample
    model_mean, model_log_variance = self.p_mean_variance(
                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 219, in p_mean_variance
    noise=self.denoise_fn(
          ^^^^^^^^^^^^^^^^
  File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/palette_denoise_fn.py", line 109, in forward
    out = self.model(input, embedding, ref)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1605, in forward
    h, hs, emb, h_ref, hs_ref = self.compute_feats(
                                ^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1595, in compute_feats
    h, _ = module(h, emb, qkv_ref=qkv_list.pop(0))
                                  ^^^^^^^^
UnboundLocalError: cannot access local variable 'qkv_list' where it is not associated with a value
```
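For context, this is the standard Python UnboundLocalError pattern: a variable assigned only inside a conditional branch is unbound whenever that branch is skipped. A minimal standalone sketch of the failure mode (the names below are illustrative, not joliGEN's actual code):

```python
# Minimal sketch of the failure mode (illustrative names, not joliGEN's code).
# qkv_list is only bound when a reference input is provided, so running
# without one crashes in the decoder loop.
def compute_feats(h, ref=None):
    if ref is not None:
        qkv_list = [f"qkv({ref})"]  # only assigned on this branch
    # ... encoder/decoder work ...
    return qkv_list.pop(0)  # UnboundLocalError when ref is None

compute_feats("h")  # raises: cannot access local variable 'qkv_list' ...
```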

beniz commented

Hi @concrete13377, thanks for reporting this; I can reproduce it. A flag and an extra input are needed. I'll come back with a fix.

beniz commented

See #569

The PR allows you to generate an image with a reference input:

```
python3 gen_single_image_diffusion.py \
    --model-in-file /path/to/model/latest_net_G_A.pth \
    --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
    --bbox-in viton_bbox_ref/testA/bbox/00006_00.txt \
    --ref-in viton_bbox_ref/testA/ref/00006_00.jpg \
    --dir-out /path/to/out/ \
    --img-width 128 \
    --img-height 128
```

You want to look at the result /path/to/out/img_0_generated_crop.png. (The img_0_generated.png image is incorrect in this case, since the model from the documentation is trained on 512x512 crops that contain the garment bbox, so the model never sees heads, etc.)
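If you need the result back in the original frame, one rough option is to paste the generated crop over the bbox region yourself. A hedged PIL sketch: it assumes each bbox line has the form "cls xmin ymin xmax ymax" in pixel coordinates, and since the training crop may be larger than the raw bbox, treat the result as an approximation only.

```python
# Hedged sketch: paste the generated crop back over the bbox region.
# Assumes one bbox per line, "cls xmin ymin xmax ymax", in pixel coords.
from PIL import Image

def paste_generated_crop(img_path, crop_path, bbox_path, out_path):
    img = Image.open(img_path).convert("RGB")
    crop = Image.open(crop_path).convert("RGB")
    with open(bbox_path) as f:
        _cls, xmin, ymin, xmax, ymax = map(int, f.readline().split())
    # Fit the generated crop to the bbox and paste it in place.
    img.paste(crop.resize((xmax - xmin, ymax - ymin)), (xmin, ymin))
    img.save(out_path)

paste_generated_crop(
    "viton_bbox_ref/testA/imgs/00006_00.jpg",
    "/path/to/out/img_0_generated_crop.png",
    "viton_bbox_ref/testA/bbox/00006_00.txt",
    "/path/to/out/img_0_pasted.png",
)
```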

concrete13377 commented

Thank you so much for your work!

What do you mean by "the model never sees heads, etc."? How can I train the model so that it generates a correct image?

beniz commented

Adding --data_online_creation_load_size_A 768 1024 would load the image at full size at training time.

A typical output during training is shown in the attached screenshot.

Images are square, but you can easily resize them afterwards.
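For example, with PIL (paths are placeholders; 768x1024 is the full-size load size mentioned above):

```python
# Minimal sketch: stretch a square output back to a 768x1024 portrait frame.
from PIL import Image

Image.open("img_0_generated.png").resize((768, 1024)).save("img_0_768x1024.png")
```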

concrete13377 commented

So there's no way to use a model trained with the example config, since it gets the resolution wrong? Or can I run it with other options so that it generates correct images?

beniz commented

The example model lacks the full context. You can try to hack a crop at inference, but I don't see how this would help much.

However, you can finetune your existing model with the --data_online_creation_load_size_A 768 1024 and --train_continue options; this avoids retraining from scratch. An illustrative invocation is sketched below.
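A minimal sketch of such a finetuning run, assuming the usual train.py entry point with placeholder paths and run name (keep the rest of your original training options unchanged):

```
python3 train.py \
    --dataroot /path/to/viton_bbox_ref \
    --checkpoints_dir /path/to/checkpoints \
    --name viton_bbox_ref \
    --train_continue \
    --data_online_creation_load_size_A 768 1024
# plus the rest of your original training options
```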