ChenWu98/cycle-diffusion

Can I use unpaired images of different sizes for image-to-image translation?

pseudo-usama opened this issue · 12 comments

I am interested in implementing super resolution using cycle diffusion on images.

My low-resolution images are 64x64, while the high-resolution ones are 512x512.

One solution could be to resize the low-resolution images to match the high-resolution ones, but doing so would increase the model parameters needed for the low-resolution images.

Therefore, I am considering using the original image size and wondering if that is possible.

Thanks!

Sorry for the late reply! Our method now requires the input and output to have the same dimension since the noise vectors should have the same dimension in order to transfer :(
Applying CycleDiffusion to super-resolution is interesting, and I'm quite curious if it can work to some extent.

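To make the dimension constraint concrete, here is a toy illustration (not code from this repo; the shapes are chosen to match the 64x64 / 512x512 case above):

```python
import torch

# Toy illustration: in a DDPM, every per-step noise/latent tensor has the same
# shape as the image itself, so the noise recovered from a 64x64 source cannot
# be fed to a model that operates on 512x512 images.
x_lr = torch.randn(1, 3, 64, 64)             # low-resolution source image
eps_lr = torch.randn_like(x_lr)              # per-step noise: same 64x64 shape
x_hr_input = torch.randn(1, 3, 512, 512)     # what a 512x512 model expects
print(eps_lr.shape == x_hr_input.shape)      # False -> no direct transfer
```
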
I closed this issue. Feel free to re-open it if there are any further questions.

May I resize the images to the same resolution to meet the requirement?

yeah I think that should work.

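For example, a minimal sketch using Pillow (the file paths are placeholders) that upsamples the 64x64 inputs to 512x512 before running the translation:

```python
from PIL import Image

# Placeholder paths; upsample the 64x64 inputs so both domains share the
# 512x512 dimension the models expect.
lr = Image.open("inputs/low_res_64.png").convert("RGB")
lr_up = lr.resize((512, 512), resample=Image.BICUBIC)
lr_up.save("inputs/low_res_upsampled_512.png")
```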

Thank you for your response. I would like to inquire about using your method to convert RGB images to Raw-RGB images. Do I need to train two separate DDPM or LDM models for this? I have also encountered some issues and would appreciate your assistance:

  1. If I need to train two models, what should I do after training these two models?
  2. I ran the script you provided:
    nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1405 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
    I found that the program went straight into the background and has been running for over a day. I haven't seen any output in the log file or the output folder. Is this normal? How can I see the results of this script?

Looking forward to your response.

Sorry about the late reply - I thought this issue was closed. Can you try running it without nohup? I think it should work if you specify the paths to your models (and if they are trained with openai/guided-diffusion)

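As a sanity check before wiring the paths into the experiment config, something like the following (not part of this repo; the checkpoint path and image_size are placeholders) can confirm that a checkpoint loads with openai/guided-diffusion:

```python
import torch
from guided_diffusion.script_util import (
    model_and_diffusion_defaults,
    create_model_and_diffusion,
)

# Use the same hyperparameters the checkpoint was trained with.
args = model_and_diffusion_defaults()
args["image_size"] = 256  # adjust to your training setup

model, diffusion = create_model_and_diffusion(**args)
state = torch.load("checkpoints/my_raw_rgb_ddpm.pt", map_location="cpu")  # placeholder path
model.load_state_dict(state)
model.eval()
print("loaded", sum(p.numel() for p in model.parameters()), "parameters")
```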

Thank you for your response. After training on a 3090 for over a day, I finally saw the output results in the output directory, and they are consistent with what's described in your paper. However, I have a few questions:

  1. Is this training duration normal? Is there an interface in the codebase that could help me see some intermediate results?
  2. I apologize if I still don't quite understand some parts after reading your paper. Now that I've trained a DDPM that can generate RAW-RGB images from input RGB images, do I need to train another DDPM that converts RAW-RGB images back to RGB images and run them like Cycle-GAN?
  3. When you mentioned "it should work if you specify the paths to your models (and if they are trained with openai/guided-diffusion)", could you explain in more detail how to do this, if possible?

Thank you again for your kind help!

I didn't add intermediate visualization, but I think you can do it by modifying the saving logic (i.e., saving each image once it's generated).

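For reference, a rough sketch of what such a saving hook might look like (the helper and its call site are hypothetical; the repo's actual sampling loop may be structured differently):

```python
import os
import torch
from torchvision.utils import save_image

def save_intermediate(x_t: torch.Tensor, step: int, out_dir: str = "output/intermediate"):
    """Save the current sample (assumed to be in [-1, 1]) as soon as it is produced."""
    os.makedirs(out_dir, exist_ok=True)
    save_image((x_t.clamp(-1, 1) + 1) / 2, os.path.join(out_dir, f"step_{step:04d}.png"))

# Hypothetical call site inside the sampling loop:
# for step, x_t in enumerate(sampling_loop(...)):
#     save_intermediate(x_t, step)
```
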
Sorry, I didn't understand the setting you want - if you already have a model that can generate Raw-RGB images from RGB images, what would you want our method to do in this case?

Thank you for your suggestion about adding intermediate visualization. I appreciate the insight.

Regarding the capabilities of our current model, it is designed to generate Raw-RGB images from RGB images by using paired RGB images as a condition. Our goal is to work on an unpaired image-to-image translation task where the transformation can be learned without relying on paired data. In other words, we aim to provide the model with unpaired reference Raw-RGB images so that it understands how to apply the corresponding transformations to input RGB images, even in the absence of directly corresponding RGB-Raw pairs.

I've noticed that your paper presents a method for zero-shot image-to-image translation, which is quite impressive. Could you please advise on how I might adapt our model and follow your approach to achieve this unpaired image translation task? Any guidance or suggestions on the modifications required would be greatly appreciated.

Do RGB and Raw-RGB have the same dimensionality (e.g., both are 512x512x3)? If yes, then you can view a diffusion model trained on Raw-RGB images as the "cat diffusion model" and another diffusion model trained on RGB images as the "dog diffusion model", and use our "cat-dog translation pipeline" shown in the README.

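In code terms, the idea is roughly as follows; this is only a conceptual sketch with hypothetical encode/decode helpers, not the repo's actual API:

```python
import torch

@torch.no_grad()
def translate(x_src, encode_with_src_model, decode_with_tgt_model):
    """Conceptual sketch of the two-model translation (not this repo's API).

    1) Recover the noise sequence that reproduces x_src under the source
       (e.g., RGB) diffusion model.
    2) Decode that same noise sequence with the target (e.g., Raw-RGB) model.
    Both models must operate on tensors of the same shape for step 2 to be valid.
    """
    z = encode_with_src_model(x_src)      # noise has the same shape as x_src
    x_tgt = decode_with_tgt_model(z)      # hypothetical sampler of the target model
    return x_tgt
```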

Thank you very much for the clarification. However, I have a few questions and would greatly appreciate your insights:

  1. Is the dimensionality of 512x512x3 fixed? Our Raw-RGB data may have more channels, possibly 6 or 9. In that case, can the two models still interact properly? Do you have any recommended settings to address the mismatch in channel dimensions?
  2. Regarding the "cat diffusion model" and "dog diffusion model" you mentioned, are these DDPM models conditioned on cat and dog text prompts, respectively, or are they DDPM generative models trained separately on cat and dog datasets?
  3. I would also like to inquire about the "modifying the saving logic" part you previously mentioned. Could you point me to the specific part of the code where this modification should be made?

Thank you for your patience and assistance!

  1. No, our method is built on the observation that if two datasets have structurally similar images with the same dimensions, then they are pretty well aligned in the noise space.
  2. They are trained separately on cat and dog datasets, but in theory diffusion models with text conditions should also work (see the sketch after this list).
  3. Maybe check this line?
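
Regarding point 2, the text-conditioned variant of CycleDiffusion is available in 🤗 diffusers as CycleDiffusionPipeline. A rough usage sketch is below; the prompts and file paths are placeholders, and argument names may differ slightly across diffusers versions:

```python
from PIL import Image
from diffusers import CycleDiffusionPipeline, DDIMScheduler

# The pipeline requires a DDIM scheduler.
model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler).to("cuda")

# Placeholder input image; must match the resolution the model expects.
init_image = Image.open("inputs/source.png").convert("RGB").resize((512, 512))

output = pipe(
    prompt="a description of the target domain",          # placeholder prompt
    source_prompt="a description of the source domain",   # placeholder prompt
    image=init_image,
    num_inference_steps=100,
    strength=0.8,
    guidance_scale=2,
    source_guidance_scale=1,
    eta=0.1,
).images[0]
output.save("outputs/translated.png")
```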