GongyeLiu/StyleCrafter

About the training pair


Thanks for your great work!
I would like to know what format the training data is in.

Hi. I'm not sure exactly what you mean by "format", but we used several open-source datasets to train our models. For image data, we used datasets including WikiArt and LAION-Aesthetic-6.5+. For the video models, we used part of the WebVid-10M dataset. I hope this resolves your query.

Sorry, I didn't express my question clearly. What I mean is: is training set up as a reconstruction task, like IPAdapter, i.e., is the input style image also the target image?

Yes, but with a slight difference. We augment the training data: the input style reference and the ground-truth image are actually cropped from different areas of the same image (while still sharing a large overlapping region). For more details, you can refer to our paper.
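
For anyone landing here later, here is a minimal sketch of what that kind of paired-crop augmentation could look like. This is not the authors' actual pipeline; the `crop_size` and `max_shift` parameters are made-up, and only the idea (two shifted, largely overlapping crops of one image) comes from the reply above.

```python
import random
from PIL import Image

def overlapping_crops(img: Image.Image, crop_size: int = 512, max_shift: int = 64):
    """Sample a (style reference, target) pair as two shifted crops of the same
    image. Assumes the image is at least crop_size + 2 * max_shift on each side.
    All parameter values here are illustrative, not from the paper."""
    w, h = img.size
    # Position of the target crop, leaving room for the shifted reference crop.
    x0 = random.randint(max_shift, w - crop_size - max_shift)
    y0 = random.randint(max_shift, h - crop_size - max_shift)
    # A small random offset keeps the two windows largely overlapping,
    # so they share style/content without being pixel-identical.
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    target = img.crop((x0, y0, x0 + crop_size, y0 + crop_size))
    style_ref = img.crop((x0 + dx, y0 + dy, x0 + dx + crop_size, y0 + dy + crop_size))
    return style_ref, target
```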

Understood. Thank you for your patience.

Hello, what loss is used for training in this paper? @GongyeLiu

Hello, just the common MSE loss for diffusion models.
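
For reference, the common objective for diffusion models is the epsilon-prediction MSE loss: noise a clean sample at a random timestep, then regress the network's output against the injected noise. A minimal sketch follows; the `model` call signature and the `diffusers`-style scheduler are assumptions for illustration, not StyleCrafter's actual training code.

```python
import torch
import torch.nn.functional as F

def diffusion_mse_loss(model, x0, cond, scheduler):
    """Standard epsilon-prediction objective. `model`, `cond`, and `scheduler`
    are placeholders for the denoising network, its conditioning (e.g. text +
    style features), and a noise scheduler such as diffusers' DDPMScheduler."""
    noise = torch.randn_like(x0)
    # Random timestep per sample in the batch.
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (x0.shape[0],), device=x0.device)
    noisy = scheduler.add_noise(x0, noise, t)  # forward diffusion q(x_t | x_0)
    pred = model(noisy, t, cond)               # predicted noise epsilon_theta
    return F.mse_loss(pred, noise)
```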