lr_warmup should not be passed when adafactor is used as the optimizer
martianunlimited opened this issue · 3 comments
It's not really a major bug, more of an inconvenience: the GUI passes lr_warmup to the training command whenever it is non-zero, which causes library/train_util.py to raise a ValueError if the optimizer is AdaFactor. The error is raised midway through setup, after the latents are cached and just before the dataloader is created, rather than at the start of the train_network call. This wastes time and makes it harder for users who are not used to reading stack traces to pick out what caused the error.
ValueError: adafactor:0.0001 does not require num_warmup_steps. Set None or 0.
Suggested fixes, in order of preference (a sketch of the underlying check follows the list):
a) Pop up a warning in the GUI if the user did not set lr_warmup to 0 when the optimizer is set to AdaFactor (recommended)
or
b) Raise an error in train_network.py when an invalid combination of optimizer and lr_warmup is used.
c) Change train_util.py to raise a warning and ignore the lr_warmup value (not recommended)
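To make option b) concrete, here is a minimal sketch of the kind of pre-flight check that could run right after argument parsing, before any latents are cached; the same check could back the GUI popup for option a). The function name `validate_lr_warmup` and the argument names (`optimizer_type`, `lr_warmup_steps`) are assumptions for illustration, not the actual sd-scripts API.

```python
# Hypothetical pre-flight check; the function and argument names are
# illustrative, not the actual sd-scripts argparse names.
def validate_lr_warmup(optimizer_type: str, lr_warmup_steps) -> None:
    """Fail fast if a warmup value is combined with an optimizer that rejects it."""
    # The AdaFactor scheduler path in library/train_util.py rejects a non-zero
    # num_warmup_steps, so catch the bad combination here instead.
    if optimizer_type.lower().startswith("adafactor") and lr_warmup_steps not in (None, 0):
        raise ValueError(
            f"Optimizer '{optimizer_type}' does not accept lr_warmup_steps="
            f"{lr_warmup_steps}. Set LR warmup to 0 (or None) and retry."
        )


# Example: call this immediately after parsing arguments, before caching
# latents, so the user sees the problem at the start of the run instead of
# minutes into setup.
# validate_lr_warmup(args.optimizer_type, args.lr_warmup_steps)
```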
Unless there are differing opinions as to why a) and b) are not the way to go, I will go ahead and send a pull request for a) and b) over the weekend with said "fix".
I can't change train_network.py because it is maintained by kohya in his repo. I can implement option a) easily enough ;-)
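For reference, a GUI-side version of the same guard (option a)) might look like the sketch below. It assumes the check runs where the GUI assembles the training command; the easygui msgbox popup and the variable names (`optimizer`, `lr_warmup`) are assumptions for illustration, not the actual kohya_ss GUI code.

```python
# Hypothetical GUI-side guard, run before the training command is assembled.
from easygui import msgbox


def check_adafactor_warmup(optimizer: str, lr_warmup) -> bool:
    """Return True if training may proceed, False if the user must fix the settings."""
    if optimizer.lower() == "adafactor" and lr_warmup not in (None, 0):
        msgbox(
            "AdaFactor does not support LR warmup.\n"
            "Set 'LR warmup (% of steps)' to 0 before starting training."
        )
        return False
    return True
```

If the check returns False, the GUI can simply skip launching the trainer, which matches option a)'s goal of failing before any time is spent caching latents.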
The dev branch now has the fix
Does setting "LR warmup (% of steps)" to "0" act as a workaround?