Slow convergence on training on toy dataset that you provided

Question

iLori-Jiang opened this issue a year ago · 4 comments

First of all, thanks for your amazing work! @lllyasviel

However, I ran into a problem: when I followed your steps to retrain the model to fill the circle using your toy dataset, the model converged really slowly.

Below is the sampling result after 3900 steps with a batch size of 4 (all parameters are unchanged from your tutorial_train.py).

G.T. [reconstruction_gs-003900_e-000000_b-003900]:

Output [samples_cfg_scale_9.00_gs-003900_e-000000_b-003900]:

The model seems to be able to understand the color, but cannot understand the position of the circle.

After continuing the training, this is the sampling result at 11875 steps:

G.T. [reconstruction_gs-011875_e-000001_b-003000]:

Output [samples_cfg_scale_9.00_gs-011875_e-000001_b-003000]:

The model finally learns the position of the circle, but no longer seems to understand the color.

Do you have any insight into this problem, or any suggestions for how to solve it? Thank you in advance!

Answer 1 · 2024-09-14T09:56:56.000Z

I'm also facing this problem. Did further training improve anything? Or did you have to change the code somehow? 🤔

Answer 2 · 2024-11-04T00:56:21.000Z

@iLori-Jiang

Wait until 25000 steps.

I've tested this dataset, plus a 50k-sample dataset and a 30k-sample dataset.
They all work after more than 18000 steps.

This uses the same circle dataset; just run the script and wait:
https://github.com/huggingface/diffusers/tree/main/examples/controlnet

Answer 3 · 2025-02-02T10:06:51.000Z

@crapthings Then why is the convergence point of around 6000-10000 steps mentioned everywhere?

Answer 4 · 2025-02-05T04:01:38.000Z

Maybe the batch size there is 8.
I use the Hugging Face training script with a batch size of 4.
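
To compare step counts reported with different batch sizes, it can help to convert steps into samples seen. Below is a minimal sketch of that arithmetic. The helper names `samples_seen` and `equivalent_steps` are hypothetical, the batch size of 8 is only the guess made above, and gradient accumulation is assumed to be 1 unless stated. Matching the number of samples seen is only a rough heuristic and does not guarantee the same convergence behavior.

```python
def samples_seen(steps: int, batch_size: int, grad_accum: int = 1) -> int:
    """Training samples consumed after `steps` optimizer steps."""
    return steps * batch_size * grad_accum


def equivalent_steps(steps: int, from_batch: int, to_batch: int) -> int:
    """Steps at `to_batch` that consume as many samples as `steps` at `from_batch`."""
    total = samples_seen(steps, from_batch)
    # Ceiling division, so a partial final batch still counts as a step.
    return -(-total // to_batch)


if __name__ == "__main__":
    # Assumption: the commonly quoted 6000-10000 steps were run at batch size 8.
    for steps in (6000, 10000):
        print(f"{steps} steps at batch 8 ~ {equivalent_steps(steps, 8, 4)} steps at batch 4")
```

Under that assumption, 6000-10000 steps at batch size 8 correspond to 12000-20000 steps at batch size 4, which is in the same range as the 18000+ steps reported earlier in this thread.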