Enabling dim_from_weights or loraplus_unet_lr_ratio causes the error: "train_blocks must be single for split mode"
Hi,
Today, while running LoRA training for the Flux.1 model (sd-scripts on the SD3 branch), the error "train_blocks must be single for split mode" suddenly occurred. This error had not appeared before. After reviewing my parameter settings, I finally found the cause.
F:\kohya_ss\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 564, in <module>
trainer.train(args)
File "F:\kohya_ss\sd-scripts\train_network.py", line 1177, in train
noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 427, in get_noise_pred_and_target
model_pred = call_dit(
File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 393, in call_dit
assert network.train_blocks == "single", "train_blocks must be single for split mode"
AssertionError: train_blocks must be single for split mode
The issue was that I specified both the "network_weights" and "dim_from_weights" parameters. Once I disabled "dim_from_weights", everything worked fine again.
I wonder if anyone else has encountered the same issue. Could it be that dim_from_weights retrieves the double blocks, causing the split-mode mechanism to malfunction?
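To make the hypothesis concrete, here is a minimal sketch (hypothetical, not the actual sd-scripts implementation; the function name and key patterns are my own illustration): if the saved weights contain LoRA modules for the double blocks as well as the single blocks, a dims-from-weights reconstruction could plausibly end up with a network whose train_blocks is no longer "single", which would trip the assertion seen in the traceback.

```python
# Hypothetical illustration of the suspected interaction, NOT sd-scripts code.
# If a checkpoint was trained on all blocks, inferring the network structure
# from its weight keys would cover double blocks too, so the split-mode
# assertion (train_blocks == "single") would fail.

def infer_train_blocks(weight_keys):
    """Guess which block groups a saved LoRA covers from its key names."""
    has_double = any("double_blocks" in k for k in weight_keys)
    has_single = any("single_blocks" in k for k in weight_keys)
    if has_double and has_single:
        return "all"
    return "double" if has_double else "single"

# A checkpoint trained on all blocks would yield "all" ...
keys = [
    "lora_unet_double_blocks_0_img_attn_qkv",
    "lora_unet_single_blocks_0_linear1",
]
train_blocks = infer_train_blocks(keys)
print(train_blocks)  # "all"

# ... which would then fail the same assertion as in the traceback:
try:
    assert train_blocks == "single", "train_blocks must be single for split mode"
except AssertionError as e:
    print(e)
```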
Today, I tested several parameter settings again and found that whenever "train_blocks": "single" is set, adding --network_args "loraplus_unet_lr_ratio=4" also triggers the same error: AssertionError: train_blocks must be single for split mode.
Traceback (most recent call last):
File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 564, in <module>
trainer.train(args)
File "F:\kohya_ss\sd-scripts\train_network.py", line 1177, in train
noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 427, in get_noise_pred_and_target
model_pred = call_dit(
File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 393, in call_dit
assert network.train_blocks == "single", "train_blocks must be single for split mode"
AssertionError: train_blocks must be single for split mode
steps: 0%| | 0/10960 [01:14<?, ?it/s]
Traceback (most recent call last):
File "AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
sys.exit(main())
File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\\kohya_ss\\venv\\Scripts\\python.exe', 'F:/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'F:/model/config_lora-20241113-000607.toml', '--network_args', 'loraplus_unet_lr_ratio=4']' returned non-zero exit status 1.
00:21:00-078513 INFO Training has ended.
--split_mode is now deprecated, so I think this issue might be solved as a side effect. I'd like you to try again.
Hi kohya-ss,
First of all, thank you for your continuous efforts in enhancing and improving sd-scripts. A few days ago, I updated to the latest version and noticed that split_mode has indeed been deprecated and replaced by the powerful blocks_to_swap mechanism. This is an excellent design that allows training all blocks within limited VRAM, albeit at the cost of heavily utilizing system RAM (as swap space).
However, if possible, I would personally like to see the split_mode mechanism retained. Why? Because split_mode is a very lightweight training method that also allows models to be trained with minimal VRAM usage, and it barely uses any system RAM. The trade-off, of course, is that it can only train a subset of blocks.
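The trade-off described above can be sketched with a toy model (all numbers and the function below are hypothetical illustrations, not values or code from sd-scripts): blocks_to_swap keeps every block trainable but parks some in system RAM, while split_mode simply loads and trains fewer blocks.

```python
# Toy comparison of the two memory strategies discussed above.
# Hypothetical function and made-up block sizes, purely illustrative.

def memory_footprint(total_blocks, block_gb, strategy,
                     blocks_to_swap=0, trained_blocks=0):
    """Return (vram_gb, system_ram_gb) under a simplified model."""
    if strategy == "blocks_to_swap":
        # All blocks are trainable; 'blocks_to_swap' of them are parked in
        # system RAM and shuttled to the GPU on demand.
        vram = (total_blocks - blocks_to_swap) * block_gb
        ram = blocks_to_swap * block_gb
    elif strategy == "split_mode":
        # Only a subset of blocks is loaded and trained; no RAM offload,
        # but the remaining blocks cannot be trained at all.
        vram = trained_blocks * block_gb
        ram = 0
    else:
        raise ValueError(strategy)
    return vram, ram

# e.g. 57 transformer blocks at ~0.4 GB each (made-up figures):
print(memory_footprint(57, 0.4, "blocks_to_swap", blocks_to_swap=40))
print(memory_footprint(57, 0.4, "split_mode", trained_blocks=38))
```

Both strategies land at a similar VRAM figure in this toy example, but only blocks_to_swap pays for it in system RAM, which matches the behavior described above.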