Enabling dim_from_weights or loraplus_unet_lr_ratio causes the error "train_blocks must be single for split mode"

Question

Opened this issue a month ago · 3 comments

Hi,

Today, while running LoRA training for the Flux.1 model (sd-scripts on the sd3 branch), I suddenly hit the error "train_blocks must be single for split mode". This error had not appeared before. After reviewing my parameter settings, I finally found the cause.

F:\kohya_ss\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 564, in <module>
    trainer.train(args)
  File "F:\kohya_ss\sd-scripts\train_network.py", line 1177, in train
    noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 427, in get_noise_pred_and_target
    model_pred = call_dit(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 393, in call_dit
    assert network.train_blocks == "single", "train_blocks must be single for split mode"
AssertionError: train_blocks must be single for split mode

The issue was that I specified both the "network_weights" and "dim_from_weights" parameters. Once I disabled the "dim_from_weights" parameter, everything worked fine again.

I wonder if anyone else has encountered the same issue. Could it be that dim_from_weights retrieves double blocks, causing the split mode mechanism to malfunction?
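To make the hypothesis above concrete, here is a toy model of how such an assertion could fire. It is not the sd-scripts code: ToyNetwork, create_from_args, and create_from_weights are hypothetical names, and the fallback of train_blocks to "all" on the from-weights path is an assumption made only for illustration. Only the assertion line itself is taken from the traceback.

```python
class ToyNetwork:
    """Hypothetical stand-in for a LoRA network object (not the sd-scripts class)."""

    def __init__(self, dim, train_blocks=None):
        self.dim = dim
        # Assumption: an unspecified train_blocks falls back to "all".
        self.train_blocks = train_blocks if train_blocks is not None else "all"


def create_from_args(dim, **network_args):
    # Hypothetical path used when the network is built from explicit arguments:
    # train_blocks is forwarded to the network.
    return ToyNetwork(dim, train_blocks=network_args.get("train_blocks"))


def create_from_weights(weights, **network_args):
    # Hypothetical path used when the dimension is read from a weights file.
    # Assumption: this path drops train_blocks, which is the suspected problem.
    return ToyNetwork(weights["dim"])


def call_dit_split_mode(network):
    # Same check as the assertion shown in the traceback.
    assert network.train_blocks == "single", "train_blocks must be single for split mode"
    return "ok"


ok_net = create_from_args(16, train_blocks="single")
print(call_dit_split_mode(ok_net))

bad_net = create_from_weights({"dim": 16}, train_blocks="single")
try:
    call_dit_split_mode(bad_net)
except AssertionError as err:
    print("AssertionError:", err)
```

If the real code behaves like this sketch, the setting is not "retrieving double blocks" as such; the train_blocks value would simply never reach the network on that code path.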

Answer 1 · 2024-11-12T17:05:02.000Z

Today, I tested several parameter settings again and found that whenever "train_blocks": "single" is set, adding --network_args "loraplus_unet_lr_ratio=4" also triggers the error message: AssertionError: train_blocks must be single for split mode.

Traceback (most recent call last):
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 564, in <module>
    trainer.train(args)
  File "F:\kohya_ss\sd-scripts\train_network.py", line 1177, in train
    noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 427, in get_noise_pred_and_target
    model_pred = call_dit(
  File "F:\kohya_ss\sd-scripts\flux_train_network.py", line 393, in call_dit
    assert network.train_blocks == "single", "train_blocks must be single for split mode"
AssertionError: train_blocks must be single for split mode
steps:   0%|          | 0/10960 [01:14<?, ?it/s]
Traceback (most recent call last):
  File "AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "F:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\\kohya_ss\\venv\\Scripts\\python.exe', 'F:/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'F:/model/config_lora-20241113-000607.toml', '--network_args', 'loraplus_unet_lr_ratio=4']' returned non-zero exit status 1.
00:21:00-078513 INFO     Training has ended.
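One thing that may be worth checking: the launch command in the log passes --network_args on the command line together with --config_file. The sketch below uses only Python's standard argparse module to show that a list-valued option given on the command line replaces, rather than extends, a value already present in the namespace. Whether sd-scripts combines the config file and the command line this way, and whether the .toml file held train_blocks=single under network_args, are both assumptions; the thread does not show the file's contents.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--network_args", type=str, nargs="*", default=None)

# Hypothetical values loaded from a config file (assumption for illustration).
config_values = argparse.Namespace(network_args=["train_blocks=single"])

# Command-line arguments are parsed on top of the config-derived namespace.
cli = ["--network_args", "loraplus_unet_lr_ratio=4"]
merged = parser.parse_args(cli, namespace=config_values)

# The command-line list has replaced the config list; train_blocks is gone.
print(merged.network_args)

# Passing both values in the same place keeps both.
both = parser.parse_args(
    ["--network_args", "train_blocks=single", "loraplus_unet_lr_ratio=4"]
)
print(both.network_args)
```

If this is what happens, putting train_blocks=single and loraplus_unet_lr_ratio=4 in the same network_args list would be a simple test to confirm or rule it out.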

Answer 2 · 2024-11-21T12:49:09.000Z

--split_mode is now deprecated, so I think this issue might be solved as a side effect. I'd like you to try again.

Answer 3 · 2024-11-21T15:57:22.000Z

> --split_mode is now deprecated, so I think this issue might be solved as a side effect. I'd like you to try again.

Hi kohya-ss,

First of all, thank you for your continuous efforts in enhancing and improving sd-scripts. A few days ago, I updated to the latest version and noticed that split_mode has indeed been deprecated and replaced by the powerful blocks_to_swap mechanism. This is an excellent design that allows training all blocks within limited VRAM, albeit at the cost of heavily utilizing system RAM (swap).

However, if possible, I would personally like to see the split_mode mechanism retained. Why? Because split_mode is a very lightweight training method that also allows models to be trained with minimal VRAM usage, and it barely uses any system RAM. The trade-off, of course, is that it can only train a subset of blocks.