About the training pair
Thanks for your great work!
I would like to know what format the training data is in.
Hi. I'm not sure exactly what you mean by "format", but we used several open-source datasets to train our models. For image data, we used datasets including WikiArt and LAION-Aesthetic-6.5+. For video models, we used part of the WebVid-10M dataset. I hope this resolves your query.
Sorry, I didn't express my question clearly. I mean: are the training input and output set up like IPAdapter, i.e., trained as a reconstruction task where the input style image is also the target image?
Yes, but there is a slight difference. We augmented the training data: the input style references and ground-truth images are actually cropped from different areas of the same image (while still sharing a large overlapping region). For more details, you can refer to our paper.
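For anyone curious what this kind of augmentation might look like in code, here is a minimal sketch. The helper name, crop size, and offset range are my own assumptions for illustration, not the authors' actual implementation:

```python
import random
from PIL import Image

def sample_overlapping_crops(image: Image.Image, crop_size: int = 512,
                             max_offset: int = 128):
    """Sample a style-reference crop and a target crop from the same image.

    The two crops come from nearby (but different) positions, so they share
    a large overlapping region without being identical. `crop_size` and
    `max_offset` are hypothetical values, not taken from the paper.
    """
    w, h = image.size
    assert w >= crop_size + max_offset and h >= crop_size + max_offset, \
        "image too small for this crop configuration"

    # Base origin for the target (ground-truth) crop.
    x0 = random.randint(0, w - crop_size - max_offset)
    y0 = random.randint(0, h - crop_size - max_offset)
    target = image.crop((x0, y0, x0 + crop_size, y0 + crop_size))

    # Shift the style-reference crop by a small random offset so the two
    # crops differ but still overlap heavily.
    dx = random.randint(0, max_offset)
    dy = random.randint(0, max_offset)
    style_ref = image.crop((x0 + dx, y0 + dy,
                            x0 + dx + crop_size, y0 + dy + crop_size))
    return style_ref, target
```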
Understood. Thank you for your patience.
Hello, what loss is used for training in this paper? @GongyeLiu
Hello, just the common MSE loss for diffusion models.
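For reference, this is the standard DDPM-style noise-prediction objective. A minimal sketch follows; `model`, `alphas_cumprod`, and `conditioning` are placeholders, and the exact parameterization in the paper (e.g. epsilon- vs. v-prediction) may differ:

```python
import torch
import torch.nn.functional as F

def diffusion_mse_loss(model, x0, alphas_cumprod, conditioning):
    """Standard epsilon-prediction MSE objective for diffusion models.

    x0: clean latents/images, shape (B, C, H, W).
    alphas_cumprod: precomputed cumulative product of alphas, shape (T,).
    """
    b = x0.shape[0]
    # Sample a random timestep per example and matching Gaussian noise.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)

    # Forward (noising) process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    abar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

    # The network predicts the noise; the loss is plain MSE against it.
    pred = model(x_t, t, conditioning)
    return F.mse_loss(pred, noise)
```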