TencentARC/BrushNet

Doesn't converge when I train with my own data

Opened this issue · 10 comments

The loss keeps fluctuating. I wish I could see a plot of what a correct loss curve should look like.

Me too, and it confuses me a lot. Have you solved it, @zf-666? Or could the authors please help us? @juxuan27

How did you build your dataset?

hi @juxuan27 @yuanhangio, thanks for such great work!
Could you please share some details on training BrushNet SDXL, such as how many epochs, how long it takes, and how many GPUs were used?
I am training on just one zip package from BrushData to understand the training details, but the loss still fluctuates even after 11000+ steps (one zip has 10000 images, so with batch size 4 that is around 4 epochs).

I guess the loss fluctuation is related to the random timestep sampled at each training step, but how can I tell when the model has converged if the loss gives so little guidance?

[screenshots: training loss curves]

A similar issue is mentioned in #35, but I didn't find any explanation of the loss behavior there.
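Since the per-step loss is dominated by the randomly sampled timestep, a smoothed or fixed-timestep metric is usually a more readable convergence signal than the raw curve. Below is a minimal sketch (not from the BrushNet repo; `compute_diffusion_loss` is a placeholder you would wire up to your own training loop) of an EMA-smoothed training loss and a validation loss evaluated on a fixed batch/timestep grid:

```python
# Sketch: two convergence signals that remove the random-timestep noise
# from the raw training loss. `compute_diffusion_loss(batch, t)` is a
# placeholder for your own denoising-loss function.
import torch


class EmaLoss:
    """Tracks an exponentially smoothed training loss for logging."""

    def __init__(self, beta: float = 0.99):
        self.beta = beta
        self.value = None

    def update(self, loss: float) -> float:
        if self.value is None:
            self.value = loss
        else:
            self.value = self.beta * self.value + (1.0 - self.beta) * loss
        return self.value


@torch.no_grad()
def fixed_timestep_val_loss(compute_diffusion_loss, val_batches, timesteps=(100, 400, 700, 900)):
    """Average the denoising loss over a fixed batch/timestep grid.

    Because the batches and timesteps are always the same, this number is
    comparable across checkpoints, unlike the noisy per-step training loss.
    """
    losses = []
    for batch in val_batches:
        for t in timesteps:
            losses.append(compute_diffusion_loss(batch, t).item())
    return sum(losses) / len(losses)
```

Logging the EMA every step and the fixed-timestep validation loss every few thousand steps usually makes it much clearer whether training has plateaued.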

I fixed the problem by using fp16 and the fp16 VAE, but another problem arose: my dataset is mostly dark images, yet the generated results, while fitting the distribution, always come out too bright.
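For reference, switching to half precision with the numerically fixed SDXL VAE usually looks like the sketch below (assuming a diffusers-based setup; how the VAE is passed into the BrushNet pipeline or training script may differ in your code):

```python
# Sketch: load an fp16-safe SDXL VAE so the model can run in half precision
# without NaNs. Assumption: a diffusers-based setup; adapt the wiring to
# the BrushNet pipeline/training script you are using.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",  # community SDXL VAE patched for fp16
    torch_dtype=torch.float16,
)
# Pass `vae=vae` (and torch_dtype=torch.float16) when constructing the
# SDXL/BrushNet pipeline or when building the models for training.
```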

Hi, which resolution of images did you use for training? Only 1024x1024, or random resolutions? Appreciate the reply!

1024x1024 for SDXL
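If your images are not already square, a simple resize-then-center-crop to 1024x1024 matches the SDXL training resolution. A minimal sketch using torchvision follows; the actual preprocessing in the BrushNet training script may differ:

```python
# Sketch: resize the short side to 1024 and center-crop to 1024x1024 for SDXL.
# This mirrors common SDXL preprocessing; the real training script may differ.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(1024),             # short side -> 1024, keeps aspect ratio
    transforms.CenterCrop(1024),         # square 1024x1024 crop
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # map pixels to [-1, 1] as diffusion models expect
])
```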

Regarding the fp16 fix and the brightness issue: could you share your training hyper-parameters and loss curve?

@yuanhangio @juxuan27 There are only about 5-10 images in my own dataset. Can BrushNet converge with that few? At least how many images should I prepare?

I use BrushData as my dataset, but some samples in the .tar files are missing the "width" field, so training fails. Does anyone know how to modify train_brushnet.py to skip such samples and continue training?
Thanks so much!
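One way to make the loader robust is to filter out samples with incomplete metadata before they reach the training step. Below is a minimal sketch assuming a webdataset-style pipeline where each sample carries a "json" metadata entry; the shard pattern and key names are placeholders, so adapt them to how train_brushnet.py builds its dataset:

```python
# Sketch: drop BrushData samples whose metadata lacks width/height before
# they reach the collate/training step. Assumptions: a webdataset pipeline
# and a "json" metadata entry per sample; the shard pattern is a placeholder.
import json
import webdataset as wds


def has_size_metadata(sample) -> bool:
    """Keep only samples whose JSON metadata contains width and height."""
    try:
        raw = sample["json"]
        meta = json.loads(raw) if isinstance(raw, (bytes, str)) else raw
        return "width" in meta and "height" in meta
    except (KeyError, ValueError):
        return False


dataset = (
    wds.WebDataset("BrushData/{00000..00009}.tar", handler=wds.warn_and_continue)
    .select(has_size_metadata)   # skip malformed samples instead of crashing
    .decode("pil")
)
```

Passing `handler=wds.warn_and_continue` additionally makes decoding errors print a warning and move on rather than stopping the run.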

Regarding the brightness issue after switching to fp16: did you solve it? Could you please share some result images? Thanks!