Inference with a unetref generator checkpoint
concrete13377 opened this issue · 7 comments
Trying to run inference with a unetref generator checkpoint trained with the example config:

```bash
python3 scripts/gen_single_image_diffusion.py \
    --model-in-file latest_net_G_A.pth \
    --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
    --mask-in viton_bbox_ref/testA/ref/00006_00.jpg \
    --dir-out checkpoints/viton_bbox_ref/inference_output \
    --img-width 128 \
    --img-height 128
```

I get the following error:

```
warnings.warn(
Dual U-Net: number of ref blocks: 15
sampling loop time step: 0%| | 0/1000 [00:00<?, ?it/s]
0%| | 0/1 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/joliGEN/scripts/gen_single_image_diffusion.py", line 808, in <module>
frame, lmodel, lopt = generate(**vars(args))
^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/scripts/gen_single_image_diffusion.py", line 563, in generate
out_tensor, visu = model.restoration(
^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 95, in restoration
return self.restoration_ddpm(
^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 149, in restoration_ddpm
y_t = self.p_sample(
^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 253, in p_sample
model_mean, model_log_variance = self.p_mean_variance(
^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/diffusion_generator.py", line 219, in p_mean_variance
noise=self.denoise_fn(
^^^^^^^^^^^^^^^^
File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/palette_denoise_fn.py", line 109, in forward
out = self.model(input, embedding, ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1605, in forward
h, hs, emb, h_ref, hs_ref = self.compute_feats(
^^^^^^^^^^^^^^^^^^^
File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1595, in compute_feats
h, _ = module(h, emb, qkv_ref=qkv_list.pop(0))
^^^^^^^^
UnboundLocalError: cannot access local variable 'qkv_list' where it is not associated with a value
```
Hi @concrete13377, thanks for reporting this; I can reproduce it. A flag and an extra input are needed. I'll come back with a fix.
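For context, the crash is the classic Python pattern of a local variable that is only bound on one branch: `qkv_list` is presumably built only when reference features are supplied, so running without the reference input leaves it unbound when `compute_feats` pops from it. A minimal illustration of the failure mode (illustrative only, not the actual joliGEN code):

```python
# Minimal repro of the failure mode (not joliGEN code): a local bound
# only inside a conditional branch, then read unconditionally.
def compute_feats(h, ref=None):
    if ref is not None:
        qkv_list = list(ref)  # stand-in for the reference qkv features
    for _ in range(3):
        # Raises UnboundLocalError when ref is None: qkv_list was never bound.
        h = h + qkv_list.pop(0)
    return h

compute_feats(0, ref=[1, 2, 3])  # works
compute_feats(0)                 # UnboundLocalError, as in the traceback
```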
See #569
The PR allows you to generate an image with a reference input:

```bash
python3 gen_single_image_diffusion.py \
    --model-in-file /path/to/model/latest_net_G_A.pth \
    --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
    --bbox-in viton_bbox_ref/testA/bbox/00006_00.txt \
    --ref-in viton_bbox_ref/testA/ref/00006_00.jpg \
    --dir-out /path/to/out/ \
    --img-width 128 \
    --img-height 128
```
You want to look at the result in `/path/to/out/img_0_generated_crop.png`. (The `img_0_generated.png` image is incorrect in this case, since the model from the documentation is trained on 512x512 crops that contain the garment bbox, so the model never sees heads, etc.)
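If you want the generated garment in the context of the full image, you can composite the crop back yourself at the bbox location. A minimal sketch (the single-line `cls xmin ymin xmax ymax` bbox format is an assumption; the paths are from the command above):

```python
# Sketch: paste the generated crop back into the original image at the
# bbox location. The one-line "cls xmin ymin xmax ymax" bbox format is
# an assumption; adjust to your actual bbox files.
from PIL import Image

img = Image.open("viton_bbox_ref/testA/imgs/00006_00.jpg")
with open("viton_bbox_ref/testA/bbox/00006_00.txt") as f:
    _, xmin, ymin, xmax, ymax = map(int, f.readline().split())

crop = Image.open("img_0_generated_crop.png").convert("RGB")
crop = crop.resize((xmax - xmin, ymax - ymin))
img.paste(crop, (xmin, ymin))
img.save("composited.png")
```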
Thank you so much for your work!

What do you mean by "the model never sees heads, etc."? How can I train the model to get a correct generated image?

So there's no way to use a model trained with the example config, since it's wrong about the resolution? Or can I just run it with other options so that it generates correct images?
The example model lacks the full-image context (it was trained on 512x512 garment crops only). You can try to hack a crop at inference, but I don't see how this would help much.
However, you can finetune your existing model with the `--data_online_creation_load_size_A 768 1024` and `--train_continue` options, which avoids retraining from scratch; see the sketch below.
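A rough sketch of the finetuning command, assuming the model was trained with joliGEN's `train.py` (reuse your exact original options, in particular the same `--name` and `--checkpoints_dir`, so the latest checkpoint is found):

```bash
# Hypothetical finetuning run: keep the original training options and
# append the two flags. --train_continue resumes from the latest saved
# checkpoint; the larger load size exposes the model to more context.
python3 train.py \
    <your original training options> \
    --train_continue \
    --data_online_creation_load_size_A 768 1024
```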