jolibrain/joliGEN

unet ref training preview and inference script

concrete13377 opened this issue · 4 comments

After 500 epochs, the results in the training preview do not match the images generated with the inference script.
[screenshots: training preview vs. image generated by the inference script]

I tried finetuning previous checkpoints, which were trained without the data_online_creation_load_size_A option:

```bash
python3 train.py \
    --dataroot /datasets/viton_ref/viton_bbox_ref \
    --checkpoints_dir /checkpoints \
    --name viton_bbox_ref \
    --config_json examples/example_ddpm_unetref_viton.json \
    --data_online_creation_load_size_A 768 1024 \
    --train_continue
```

Then I tried inference:

```bash
python3 scripts/gen_single_image_diffusion.py \
    --model-in-file /checkpoints/viton_bbox_ref/latest_net_G_A.pth \
    --img-in /datasets/viton_ref/viton_bbox_ref/trainA/imgs/00000_00.jpg \
    --bbox-in /datasets/viton_ref/viton_bbox_ref/trainA/bbox/00000_00.txt \
    --ref-in /datasets/viton_ref/viton_bbox_ref/trainA/ref/00000_00.jpg \
    --dir-out /checkpoints/viton_bbox_ref/inference_output \
    --img-width 128 \
    --img-height 128
```

I also tried inference with `--img-width 96 --img-height 128` (it did not improve the results) and with `512 512` (which took approximately 6 hours).
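For reference, here is a rough way to quantify the mismatch between a training-preview image and the inference output. This is a minimal sketch; the file names are placeholders, not actual joliGEN output locations:

```python
# Minimal sketch: rough pixelwise comparison of a training-preview image
# and the inference output. Paths are placeholders, not joliGEN outputs.
import numpy as np
from PIL import Image

preview = Image.open("preview.png").convert("RGB")
generated = Image.open("inference.png").convert("RGB").resize(preview.size)

diff = np.abs(
    np.asarray(preview, dtype=np.float32) - np.asarray(generated, dtype=np.float32)
)
print("mean abs pixel diff:", diff.mean())
```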

royale commented

Hello, could you please try with PR #577 and the following command:

```bash
python3 gen_single_image_diffusion.py \
    --model-in-file /checkpoints/viton_bbox_ref/latest_net_G_A.pth \
    --img-in /datasets/viton_ref/viton_bbox_ref/testA/imgs/00013_00.jpg \
    --bbox-in /datasets/viton_ref/viton_bbox_ref/testA/bbox/00013_00.txt \
    --ref-in /datasets/viton_ref/viton_bbox_ref/testA/ref/00017_00.jpg \
    --dir-out /checkpoints/viton_bbox_ref/inference_output
```

concrete13377 commented

It's similar; it seems that the code generating the sample previews during training and this separate inference script have some drastic differences.
[screenshot: inference output]

beniz commented

This was closed by mistake (automatically, from the PR).
@concrete13377, can you try with the same image as @royale for a start?
Some images have bboxes that are larger than the crop; these are corner cases.
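As a quick sanity check, such boxes can be flagged before training. This is a minimal sketch: it assumes the one-box-per-line `cls xmin ymin xmax ymax` layout of the bbox files used above, and a hypothetical crop size to compare against:

```python
# Minimal sketch: flag bounding boxes larger than the training crop.
# Assumes one box per line in "cls xmin ymin xmax ymax" format and a
# hypothetical crop size (match your data_online_creation_crop_size_A).
CROP_SIZE = 128  # hypothetical value

def oversized_boxes(bbox_path, crop_size=CROP_SIZE):
    flagged = []
    with open(bbox_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue  # skip malformed lines
            cls, xmin, ymin, xmax, ymax = parts[:5]
            w = float(xmax) - float(xmin)
            h = float(ymax) - float(ymin)
            if w > crop_size or h > crop_size:
                flagged.append((cls, w, h))
    return flagged

print(oversized_boxes("/datasets/viton_ref/viton_bbox_ref/testA/bbox/00013_00.txt"))
```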

This is what we get (from our internal chat):
[image: img_0_generated]

The fuzziness comes from the resolution upsampling.
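To see the effect in isolation, the upsampling can be emulated outside the model. A minimal sketch with Pillow; the file names are placeholders:

```python
# Minimal sketch: emulate the resolution upsampling behind the fuzziness.
# File names are placeholders, not actual joliGEN outputs.
from PIL import Image

crop = Image.open("generated_128.png")  # e.g. a 128x128 generated crop
upsampled = crop.resize((512, 512), Image.BICUBIC)  # back to bbox resolution
upsampled.save("generated_128_upsampled.png")  # softer than a native 512 sample
```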

Note: models can easily be finetuned at higher resolutions, since they are fully convolutional.
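For illustration, here is a generic sketch (plain PyTorch, not joliGEN code) of why a fully convolutional network accepts larger inputs with the same weights:

```python
# Minimal sketch (generic PyTorch, not joliGEN code): a fully convolutional
# block runs unchanged at any spatial size, which is why the same checkpoint
# can be finetuned at a higher resolution.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
)

for size in (128, 512):
    x = torch.randn(1, 3, size, size)
    print(size, net(x).shape)  # output spatial size matches the input
```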

concrete13377 commented

Sure, I hadn't noticed that the suggested command uses a different image as an argument. With this one it works fine, thank you for your work!