GongyeLiu/StyleCrafter

About the training pair


Thanks for your great work!
I would like to know what format the training data is in.

Hi. I'm not sure exactly what you mean by "format", but we used several open-source datasets to train our models. For image data, we used datasets including WikiArt and LAION-Aesthetic-6.5+. For the video models, we used part of the WebVid-10M dataset. I hope this resolves your query.

Sorry, I didn't express my question clearly. What I mean is: is training set up as a reconstruction task, like IPAdapter, i.e., is the input style image also the target image?

Yes, but with a slight difference. We augment the training data: the input style reference and the ground-truth image are actually cropped from different areas of the same image (while still sharing a large overlapping region). For more details, you can refer to our paper.
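
For anyone landing here later, here is a minimal sketch of what that kind of paired-crop augmentation could look like. This is not the authors' actual pipeline; the `crop_size` and `max_shift` parameters are made-up, and only the idea (two shifted, largely overlapping crops of one image) comes from the reply above.

```python
import random
from PIL import Image

def overlapping_crops(img: Image.Image, crop_size: int = 512, max_shift: int = 64):
    """Sample a (style reference, target) pair as two shifted crops of the same
    image. Assumes the image is at least crop_size + 2 * max_shift on each side.
    All parameter values here are illustrative, not from the paper."""
    w, h = img.size
    # Position of the target crop, leaving room for the shifted reference crop.
    x0 = random.randint(max_shift, w - crop_size - max_shift)
    y0 = random.randint(max_shift, h - crop_size - max_shift)
    # A small random offset keeps the two windows largely overlapping,
    # so they share style/content without being pixel-identical.
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    target = img.crop((x0, y0, x0 + crop_size, y0 + crop_size))
    style_ref = img.crop((x0 + dx, y0 + dy, x0 + dx + crop_size, y0 + dy + crop_size))
    return style_ref, target
```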

Understood. Thank you for your patience.

Hello, what loss is used for training in this paper? @GongyeLiu

Hello, just the common MSE loss for diffusion models.
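
For reference, the common objective for diffusion models is the epsilon-prediction MSE loss: noise a clean sample at a random timestep, then regress the network's output against the injected noise. A minimal sketch follows; the `model` call signature and the `diffusers`-style scheduler are assumptions for illustration, not StyleCrafter's actual training code.

```python
import torch
import torch.nn.functional as F

def diffusion_mse_loss(model, x0, cond, scheduler):
    """Standard epsilon-prediction objective. `model`, `cond`, and `scheduler`
    are placeholders for the denoising network, its conditioning (e.g. text +
    style features), and a noise scheduler such as diffusers' DDPMScheduler."""
    noise = torch.randn_like(x0)
    # Random timestep per sample in the batch.
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (x0.shape[0],), device=x0.device)
    noisy = scheduler.add_noise(x0, noise, t)  # forward diffusion q(x_t | x_0)
    pred = model(noisy, t, cond)               # predicted noise epsilon_theta
    return F.mse_loss(pred, noise)
```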