lr_warmup should not be passed when adafactor is used as the optimizer
martianunlimited opened this issue · 3 comments
It's not really a major bug, more of an inconvenience: the GUI passes lr_warmup to the training command whenever it is non-zero, which causes library/train_util.py to raise a ValueError if the optimizer is AdaFactor. The error is raised midway through setup, after the latents are cached and just before the dataloader is created, rather than at the start of the train_network call. This wastes time and makes it harder for users who are not used to reading stack traces to pick out what caused the error.
ValueError: adafactor:0.0001 does not require num_warmup_steps. Set None or 0.
Suggested fixes, in order of preference (a sketch of the underlying check follows the list):
a) Pop up a warning in the GUI if the user did not set lr_warmup to 0 when the optimizer is set to AdaFactor (recommended)
or
b) Raise an error in train_network.py when an invalid combination of optimizer and lr_warmup is used.
c) Change train_util.py to raise a warning and ignore the lr_warmup value (not recommended)
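To make option b) concrete, here is a minimal sketch of the kind of pre-flight check that could run right after argument parsing, before any latents are cached; the same check could back the GUI popup for option a). The function name `validate_lr_warmup` and the argument names (`optimizer_type`, `lr_warmup_steps`) are assumptions for illustration, not the actual sd-scripts API.

```python
# Hypothetical pre-flight check; the function and argument names are
# illustrative, not the actual sd-scripts argparse names.
def validate_lr_warmup(optimizer_type: str, lr_warmup_steps) -> None:
    """Fail fast if a warmup value is combined with an optimizer that rejects it."""
    # The AdaFactor scheduler path in library/train_util.py rejects a non-zero
    # num_warmup_steps, so catch the bad combination here instead.
    if optimizer_type.lower().startswith("adafactor") and lr_warmup_steps not in (None, 0):
        raise ValueError(
            f"Optimizer '{optimizer_type}' does not accept lr_warmup_steps="
            f"{lr_warmup_steps}. Set LR warmup to 0 (or None) and retry."
        )


# Example: call this immediately after parsing arguments, before caching
# latents, so the user sees the problem at the start of the run instead of
# minutes into setup.
# validate_lr_warmup(args.optimizer_type, args.lr_warmup_steps)
```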
Unless there are differing opinions as to why a) and b) are not the way to go, I will go ahead and send a pull request for a) and b) over the weekend with said "fix".
I can't change train_network.py because it is maintained by kohya in his repo. I can implement option a) easily enough ;-)
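For reference, a GUI-side version of the same guard (option a)) might look like the sketch below. It assumes the check runs where the GUI assembles the training command; the easygui msgbox popup and the variable names (`optimizer`, `lr_warmup`) are assumptions for illustration, not the actual kohya_ss GUI code.

```python
# Hypothetical GUI-side guard, run before the training command is assembled.
from easygui import msgbox


def check_adafactor_warmup(optimizer: str, lr_warmup) -> bool:
    """Return True if training may proceed, False if the user must fix the settings."""
    if optimizer.lower() == "adafactor" and lr_warmup not in (None, 0):
        msgbox(
            "AdaFactor does not support LR warmup.\n"
            "Set 'LR warmup (% of steps)' to 0 before starting training."
        )
        return False
    return True
```

If the check returns False, the GUI can simply skip launching the trainer, which matches option a)'s goal of failing before any time is spent caching latents.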
The dev branch now has the fix
Does setting "LR warmup (% of steps)" to "0" act as a workaround?