bghira/SimpleTuner

'Trainer' object has no attribute 'optimizer'

Closed this issue · 5 comments

How do I fix this? It happens during a Flux LoRA training run.

2024-09-25 16:22:17,476 [INFO] LyCORIS network has been initialized with 44,826,624 parameters
2024-09-25 16:22:17,476 [INFO] Moving the diffusion transformer to GPU in int8-quanto precision.
2024-09-25 16:22:20,823 [INFO] Benchmarking base model for comparison. Supply --disable_benchmark: true to disable this behaviour.
'Trainer' object has no attribute 'optimizer'
Traceback (most recent call last):
File "/home/gz-/repos/SimpleTuner/train.py", line 43, in
trainer.init_validations()
File "/home/gz-/repos/SimpleTuner/helpers/training/trainer.py", line 1215, in init_validations
self.init_benchmark_base_model()
File "/home/gz-/repos/SimpleTuner/helpers/training/trainer.py", line 1234, in init_benchmark_base_model
self.optimizer.eval()
^^^^^^^^^^^^^^
AttributeError: 'Trainer' object has no attribute 'optimizer'

There is not enough info here to proceed.

Line 1234 isn't even that call anymore; chances are you are on a very old version. You should upgrade.

I'm able to reproduce this on the release branch when trying to use adamw_schedulefree. Switching to adamw_bf16 got rid of the error, so something seems wrong with schedulefree in particular.
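For context, schedulefree-style optimizers expose `train()`/`eval()` methods that must be toggled around validation, which is presumably why the benchmark path calls `self.optimizer.eval()`. A hypothetical minimal reproduction (not SimpleTuner's actual code; the class and flag names here are made up for illustration) of how that call can fire before the optimizer attribute exists:

```python
class DummyScheduleFreeOptimizer:
    """Stand-in for a schedulefree optimizer with train()/eval() modes."""
    def __init__(self):
        self.mode = "train"

    def train(self):
        self.mode = "train"

    def eval(self):
        self.mode = "eval"


class Trainer:
    def __init__(self, benchmark_exists=True):
        # Hypothetical flag: whether a base-model benchmark already exists.
        self.benchmark_exists = benchmark_exists

    def init_benchmark_base_model(self):
        if self.benchmark_exists:
            return  # nothing to do; self.optimizer is never touched
        # With no benchmark on disk, the base model is evaluated during
        # startup, and the optimizer is switched to eval mode -- but the
        # optimizer has not been created yet at this point.
        self.optimizer.eval()

    def init_optimizer(self):
        # In the real trainer this runs later in the startup sequence.
        self.optimizer = DummyScheduleFreeOptimizer()


trainer = Trainer(benchmark_exists=False)
try:
    trainer.init_benchmark_base_model()
except AttributeError as e:
    print(e)  # 'Trainer' object has no attribute 'optimizer'
```

This matches the traceback above: the error only appears when the benchmark branch is taken, i.e. when no benchmark has been created yet.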

I cannot:

2024-10-20 10:36:27,590 [INFO] cls: <class 'helpers.training.optimizers.adamw_schedulefree.AdamWScheduleFreeKahan'>, settings: {'betas': (0.9, 0.999), 'weight_decay': 0.01, 'eps': 1e-08, 'warmup_steps': 0}
2024-10-20 10:36:27,593 [INFO] Optimizer arguments={'lr': 8e-05, 'betas': (0.9, 0.999), 'weight_decay': 0.01, 'eps': 1e-08, 'warmup_steps': 0}
2024-10-20 10:36:27,596 [INFO] Using experimental AdamW ScheduleFree optimiser from Facebook. Experimental due to newly added Kahan summation.
2024-10-20 10:36:27,596 [INFO] Using dummy learning rate scheduler
2024-10-20 10:36:27,601 [INFO] Preparing models..
2024-10-20 10:36:27,601 [INFO] Loading our accelerator...
2024-10-20 10:36:27,618 [INFO] After removing any undesired samples and updating cache entries, we have settled on 3 epochs and 5 steps per epoch.
2024-10-20 10:36:27,619 [INFO] Checkpoint 'latest' does not exist. Starting a new training run.

Ah, reproducing the error requires that no benchmark has been created yet.
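Given that ordering issue, one hypothetical way to harden the benchmark path (a sketch, not the project's actual fix; `set_optimizer_eval` is an invented helper name) is to toggle the optimizer mode only when the optimizer exists and supports mode switching, as schedulefree optimizers do:

```python
def set_optimizer_eval(trainer):
    """Put the trainer's optimizer into eval mode if possible.

    Returns True if the mode was switched, False if the optimizer
    does not exist yet or has no eval() method (non-schedulefree
    optimizers generally don't need mode switching at all).
    """
    optimizer = getattr(trainer, "optimizer", None)
    if optimizer is not None and hasattr(optimizer, "eval"):
        optimizer.eval()
        return True
    return False


# Usage sketch with a bare object standing in for a partially
# initialised trainer that has no optimizer attribute yet:
class PartialTrainer:
    pass

print(set_optimizer_eval(PartialTrainer()))  # False -- no crash
```

The alternative fix, of course, is to reorder startup so the optimizer is initialised before the benchmark runs.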