kohya-ss/sd-scripts

(sd3-Flux) returned non-zero exit status 3221225477. 13:02:14-584959 INFO Training has ended


I've been struggling with this error for three days now.

Switched to the Flux branch with git checkout sd3-flux.
Installed it using the setup.bat command (rough command sketch below).
Configured DreamBooth in Kohya_SS.
Clicked Start training, and I get this error:

returned non-zero exit status 3221225477.
13:02:14-584959 INFO Training has ended.
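
For reference, roughly what I ran from the kohya_ss folder (branch name as above):

REM switch to the Flux branch and run the installer
cd F:\SD\kohya_ss
git checkout sd3-flux
setup.bat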

My configuration: AMD Ryzen 3600, 32 GB RAM, RTX 3090 24 GB.
Video driver: GeForce Game Ready Driver (WHQL), version 566.03.

Windows 10 21H2 Build 19044.4046

cuda_12.4.0_551.61_windows

I first installed cudnn_9.4.0_windows, then completely removed everything and reinstalled cuda_12.4.0_551.61_windows and cudnn-windows-x86_64-8.9.6.50_cuda12-archive, along with the Visual Studio 2015, 2017, 2019, and 2022 redistributables.

I ran setup.bat and tried items 2, 3, and 4 in the menu, but none of that solved the problem.

Starting the GUI... this might take some time...
12:58:39-024861 INFO Kohya_ss GUI version: v24.2.0

12:58:39-388251 INFO Submodule initialized and updated.
12:58:39-391256 INFO nVidia toolkit detected
12:58:40-888785 INFO Torch 2.5.0+cu124
12:58:40-938660 INFO Torch backend: nVidia CUDA 12.4 cuDNN 90100
12:58:40-941662 INFO Torch detected GPU: NVIDIA GeForce RTX 3090 VRAM 24575MB Arch 8.6 Cores 82
12:58:40-942664 INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit
(AMD64)]
12:58:40-944665 INFO Installing/Validating requirements from requirements_pytorch_windows.txt...
12:58:41-370688 INFO Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu124
12:58:41-371689 INFO Obtaining file:///F:/SD/kohya_ss/sd-scripts (from -r F:\SD\kohya_ss\requirements.txt (line 37))
12:58:41-372690 INFO Preparing metadata (setup.py): started
12:58:41-764995 INFO Preparing metadata (setup.py): finished with status 'done'
12:58:42-588197 INFO Installing collected packages: library
12:58:42-589214 INFO Attempting uninstall: library
12:58:42-590212 INFO Found existing installation: library 0.0.0
12:58:42-591212 INFO Uninstalling library-0.0.0:
12:58:43-574522 INFO Successfully uninstalled library-0.0.0
12:58:43-575524 INFO Running setup.py develop for library
12:58:44-269575 INFO Successfully installed library
12:58:44-659203 INFO headless: False
12:58:44-663711 INFO Using shell=True when running external commands...

To create a public link, set share=True in launch().
12:59:20-330894 INFO Loading config...
13:00:40-209315 INFO Copy F:/SD/MyImages/xyzpinkdress to F:/SD/MyImages/testPink\img/1_xyzpink dress...
13:00:40-228352 INFO Regularization images directory is missing... not copying regularisation images...
13:00:40-230353 INFO Done creating kohya_ss training folder structure at F:/SD/MyImages/testPink...
13:00:47-739017 INFO Start training Dreambooth...
13:00:47-740018 INFO Validating lr scheduler arguments...
13:00:47-742021 INFO Validating optimizer arguments...
13:00:47-743021 INFO Validating F:/SD/MyImages/testPink\log existence and writability... SUCCESS
13:00:47-745023 INFO Validating F:/SD/MyImages/testPink\model existence and writability... SUCCESS
13:00:47-746024 INFO Validating F:/x/flux1-dev.safetensors existence... SUCCESS
13:00:47-747025 INFO Validating F:/SD/MyImages/testPink\img existence... SUCCESS
13:00:47-748027 INFO Folder 1_xyzpink dress: 1 repeats found
13:00:47-749027 INFO Folder 1_xyzpink dress: 8 images found
13:00:47-750028 INFO Folder 1_xyzpink dress: 8 * 1 = 8 steps
13:00:47-751029 INFO Regularization factor: 1
13:00:47-754032 INFO Total steps: 8
13:00:47-755033 INFO Train batch size: 1
13:00:47-756033 INFO Gradient accumulation steps: 1
13:00:47-757034 INFO Epoch: 200
13:00:47-758035 INFO max_train_steps (8 / 1 / 1 * 200 * 1) = 1600
13:00:47-760037 INFO lr_warmup_steps = 0
13:00:47-762039 INFO Saving training config to F:/SD/MyImages/testPink\model\last3_20241029-130047.json...
13:00:47-764041 INFO Executing command: F:\SD\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no
--dynamo_mode default --mixed_precision fp16 --num_processes 1 --num_machines 1
--num_cpu_threads_per_process 2 F:/SD/kohya_ss/sd-scripts/flux_train.py --config_file
F:/SD/MyImages/testPink\model/config_dreambooth-20241029-130047.toml
F:\SD\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
F:\SD\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
2024-10-29 13:00:57 INFO Loading settings from train_util.py:4435
F:/SD/MyImages/testPink\model/config_dreambooth-20241029-130047.toml...
INFO F:/SD/MyImages/testPink\model/config_dreambooth-20241029-130047 train_util.py:4454
2024-10-29 13:00:57 INFO Using DreamBooth method. flux_train.py:107
INFO prepare images. train_util.py:1956
INFO get image size from name of cache files train_util.py:1873
100%|████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<?, ?it/s]
INFO set image size from cache files: 0/8 train_util.py:1901
INFO found directory F:\SD\MyImages\testPink\img\1_xyzpink dress contains 8 train_util.py:1903
image files
read caption: 100%|██████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<?, ?it/s]
INFO 8 train images with repeating. train_util.py:1997
INFO 0 reg images. train_util.py:2000
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:2005
INFO [Dataset 0] config_util.py:567
batch_size: 1
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "F:\SD\MyImages\testPink\img\1_xyzpink dress"
                             image_count: 8
                             num_repeats: 1
                             shuffle_caption: False
                             keep_tokens: 0
                             keep_tokens_separator:
                             caption_separator: ,
                             secondary_separator: None
                             enable_wildcard: False
                             caption_dropout_rate: 0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1
                             token_warmup_step: 0
                             alpha_mask: False
                             custom_attributes: {}
                             is_reg: False
                             class_tokens: xyzpink dress
                             caption_extension: .txt


                INFO     [Dataset 0]                                                              config_util.py:573
                INFO     loading image sizes.                                                      train_util.py:923

100%|██████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 7987.25it/s]
INFO make buckets train_util.py:946
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is train_util.py:963
set, because bucket reso is defined by image size automatically /
bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
算されるため、min_bucket_resoとmax_bucket_resoは無視されます
INFO number of images (including repeats) / train_util.py:992
各bucketの画像枚数(繰り返し回数を含む)
INFO bucket 0: resolution (896, 1088), count: 6 train_util.py:997
INFO bucket 1: resolution (1024, 1024), count: 2 train_util.py:997
INFO mean ar error (without repeats): 0.01759017994531699 train_util.py:1002
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:62
INFO prepare accelerator flux_train.py:177
accelerator device: cpu
INFO Building AutoEncoder flux_utils.py:152
INFO Loading state dict from F:/x/ae.safetensors flux_utils.py:157
INFO Loaded AE: flux_utils.py:160
INFO [Dataset 0] train_util.py:2480
INFO caching latents with caching strategy. train_util.py:1048
INFO caching latents... train_util.py:1093
0%| | 0/8 [00:00<?, ?it/s]Traceback (most recent call last):
File "F:\Programs\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "F:\Programs\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "F:\SD\kohya_ss\venv\Scripts\accelerate.EXE_main
.py", line 7, in
sys.exit(main())
File "F:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "F:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File "F:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\SD\kohya_ss\venv\Scripts\python.exe', 'F:/SD/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'F:/SD/MyImages/testPink\model/config_dreambooth-20241029-130047.toml']' returned non-zero exit status 3221225477.
13:02:14-584959 INFO Training has ended.

The stack trace of the direct cause does not seem to be displayed. However, the following line appears in the log, which suggests the accelerate config is set not to use the GPU. Please run accelerate config again.

accelerator device: cpu
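
If it helps, this is roughly how to redo and verify it from the venv (a sketch, assuming the install path F:\SD\kohya_ss):

REM activate the training venv
F:\SD\kohya_ss\venv\Scripts\activate.bat
REM re-answer the prompts: This machine, no distributed training, CPU-only: No
accelerate config
REM print the active default config; it should show use_cpu: false
accelerate env
REM sanity-check that this torch build sees the GPU at all
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"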

I ran the Accelerate configuration from setup.bat and set it up as follows, but unfortunately it didn't help:

  1. (Optional) Manually configure Accelerate:

This machine
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? No
Do you wish to optimize your script with torch dynamo? No
Do you want to use DeepSpeed? No
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? All
Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). Yes
Do you wish to use FP16 or BF16 (mixed precision)? bf16

My default_config.yaml:
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
enable_cpu_affinity: true
gpu_ids: All
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Starting the GUI... this might take some time...
15:18:57-118870 INFO Kohya_ss GUI version: v24.2.0

15:18:57-454009 INFO Submodule initialized and updated.
15:18:57-458013 INFO nVidia toolkit detected
15:18:58-917855 INFO Torch 2.5.0+cu124
15:18:58-947951 INFO Torch backend: nVidia CUDA 12.4 cuDNN 90100
15:18:58-950953 INFO Torch detected GPU: NVIDIA GeForce RTX 3090 VRAM 24575MB Arch 8.6 Cores 82
15:18:58-952956 INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit
(AMD64)]
15:18:58-953957 INFO Installing/Validating requirements from requirements_pytorch_windows.txt...
15:18:59-411372 INFO Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu124
15:18:59-412373 INFO Obtaining file:///F:/SD/kohya_ss/sd-scripts (from -r F:\SD\kohya_ss\requirements.txt (line 37))
15:18:59-413374 INFO Preparing metadata (setup.py): started
15:18:59-817729 INFO Preparing metadata (setup.py): finished with status 'done'
15:19:00-631468 INFO Installing collected packages: library
15:19:00-632468 INFO Attempting uninstall: library
15:19:00-633469 INFO Found existing installation: library 0.0.0
15:19:00-634470 INFO Uninstalling library-0.0.0:
15:19:01-701512 INFO Successfully uninstalled library-0.0.0
15:19:01-702513 INFO Running setup.py develop for library
15:19:02-393283 INFO Successfully installed library
15:19:02-776088 INFO headless: False
15:19:02-780092 INFO Using shell=True when running external commands...

To create a public link, set share=True in launch().
15:19:39-704822 INFO Loading config...
15:20:17-848041 INFO Copy F:/SD/MyImages/xyzpinkdress to F:/SD/MyImages/testPink\img/40_xyzpink dress...
15:20:17-868062 INFO Regularization images directory is missing... not copying regularisation images...
15:20:17-870064 INFO Done creating kohya_ss training folder structure at F:/SD/MyImages/testPink...
15:20:20-914780 INFO Start training Dreambooth...
15:20:20-915781 INFO Validating lr scheduler arguments...
15:20:20-917783 INFO Validating optimizer arguments...
15:20:20-918784 INFO Validating F:/SD/MyImages/testPink\log existence and writability... SUCCESS
15:20:20-919785 INFO Validating F:/SD/MyImages/testPink\model existence and writability... SUCCESS
15:20:20-920786 INFO Validating F:/x/flux1-dev.safetensors existence... SUCCESS
15:20:20-921787 INFO Validating F:/SD/MyImages/testPink\img existence... SUCCESS
15:20:20-923789 INFO Folder 40_xyzpink dress: 40 repeats found
15:20:20-923789 INFO Folder 40_xyzpink dress: 8 images found
15:20:20-925791 INFO Folder 40_xyzpink dress: 8 * 40 = 320 steps
15:20:20-926791 INFO Regularization factor: 1
15:20:20-927792 INFO Total steps: 320
15:20:20-930795 INFO Train batch size: 1
15:20:20-931796 INFO Gradient accumulation steps: 1
15:20:20-931796 INFO Epoch: 200
15:20:20-932797 INFO max_train_steps (320 / 1 / 1 * 200 * 1) = 64000
15:20:20-934799 INFO lr_warmup_steps = 0
15:20:20-944901 INFO Saving training config to F:/SD/MyImages/testPink\model\last3_20241029-152020.json...
15:20:20-947905 INFO Executing command: F:\SD\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no
--dynamo_mode default --mixed_precision fp16 --num_processes 1 --num_machines 1
--num_cpu_threads_per_process 2 F:/SD/kohya_ss/sd-scripts/flux_train.py --config_file
F:/SD/MyImages/testPink\model/config_dreambooth-20241029-152020.toml
F:\SD\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
F:\SD\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
2024-10-29 15:20:30 INFO Loading settings from train_util.py:4435
F:/SD/MyImages/testPink\model/config_dreambooth-20241029-152020.toml...
INFO F:/SD/MyImages/testPink\model/config_dreambooth-20241029-152020 train_util.py:4454
2024-10-29 15:20:30 INFO Using DreamBooth method. flux_train.py:107
INFO prepare images. train_util.py:1956
INFO get image size from name of cache files train_util.py:1873
100%|████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<?, ?it/s]
INFO set image size from cache files: 0/8 train_util.py:1901
INFO found directory F:\SD\MyImages\testPink\img\40_xyzpink dress contains 8 train_util.py:1903
image files
read caption: 100%|████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 7985.35it/s]
INFO 320 train images with repeating. train_util.py:1997
INFO 0 reg images. train_util.py:2000
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:2005
INFO [Dataset 0] config_util.py:567
batch_size: 1
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "F:\SD\MyImages\testPink\img\40_xyzpink dress"
                             image_count: 8
                             num_repeats: 40
                             shuffle_caption: False
                             keep_tokens: 0
                             keep_tokens_separator:
                             caption_separator: ,
                             secondary_separator: None
                             enable_wildcard: False
                             caption_dropout_rate: 0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1
                             token_warmup_step: 0
                             alpha_mask: False
                             custom_attributes: {}
                             is_reg: False
                             class_tokens: xyzpink dress
                             caption_extension: .txt


                INFO     [Dataset 0]                                                              config_util.py:573
                INFO     loading image sizes.                                                      train_util.py:923

100%|████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<?, ?it/s]
INFO make buckets train_util.py:946
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is train_util.py:963
set, because bucket reso is defined by image size automatically /
bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
算されるため、min_bucket_resoとmax_bucket_resoは無視されます
INFO number of images (including repeats) / train_util.py:992
各bucketの画像枚数(繰り返し回数を含む)
INFO bucket 0: resolution (896, 1088), count: 240 train_util.py:997
INFO bucket 1: resolution (1024, 1024), count: 80 train_util.py:997
INFO mean ar error (without repeats): 0.01759017994531699 train_util.py:1002
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:62
INFO prepare accelerator flux_train.py:177
accelerator device: cpu
INFO Building AutoEncoder flux_utils.py:152
INFO Loading state dict from F:/x/ae.safetensors flux_utils.py:157
INFO Loaded AE: flux_utils.py:160
INFO [Dataset 0] train_util.py:2480
INFO caching latents with caching strategy. train_util.py:1048
INFO caching latents... train_util.py:1093
0%| | 0/8 [00:00<?, ?it/s]Traceback (most recent call last):
File "F:\Programs\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "F:\Programs\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "F:\SD\kohya_ss\venv\Scripts\accelerate.EXE_main
.py", line 7, in
sys.exit(main())
File "F:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "F:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File "F:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\SD\kohya_ss\venv\Scripts\python.exe', 'F:/SD/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'F:/SD/MyImages/testPink\model/config_dreambooth-20241029-152020.toml']' returned non-zero exit status 3221225477.
15:21:44-762115 INFO Training has ended.

I reinstalled the operating system, but that did not solve the problem either.

The problem was solved by specifying "0" in the GPU IDs setting shown in the screenshot (the GUI then adds --gpu_ids 0 to the accelerate launch command).

(screenshot of the setting)
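
Roughly the same thing can be done outside the GUI, either by pinning the id in the default_config.yaml posted above or by passing the flag yourself (a sketch; <your_config>.toml is a placeholder):

REM option 1: in default_config.yaml, change gpu_ids: All to gpu_ids: '0'
REM option 2: pass the id explicitly when launching
accelerate launch --gpu_ids 0 --mixed_precision fp16 --num_processes 1 --num_machines 1 ^
  --num_cpu_threads_per_process 2 D:/SD/kohya_ss/sd-scripts/flux_train.py --config_file <your_config>.toml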

But after training started, a new problem appeared:

18:50:59-667634 INFO Loading config...
18:51:03-238012 INFO Save...
18:51:38-689051 INFO Copy D:/SD/MyImages/xyzpinkdress to D:/SD/MyImages/testPink\img/1_xyzpink dress...
18:51:38-712581 INFO Regularization images directory is missing... not copying regularisation images...
18:51:38-713582 INFO Done creating kohya_ss training folder structure at D:/SD/MyImages/testPink...
18:52:30-526210 INFO Start training Dreambooth...
18:52:30-527211 INFO Validating lr scheduler arguments...
18:52:30-529213 INFO Validating optimizer arguments...
18:52:30-530215 INFO Validating D:/SD/MyImages/testPink\log existence and writability... SUCCESS
18:52:30-531215 INFO Validating D:/SD/MyImages/testPink\model existence and writability... SUCCESS
18:52:30-532238 INFO Validating D:/x/flux1-dev.safetensors existence... SUCCESS
18:52:30-533222 INFO Validating D:/SD/MyImages/testPink\img existence... SUCCESS
18:52:30-534245 INFO Folder 1_xyzpink dress: 1 repeats found
18:52:30-535226 INFO Folder 1_xyzpink dress: 8 images found
18:52:30-536235 INFO Folder 1_xyzpink dress: 8 * 1 = 8 steps
18:52:30-537231 INFO Regularization factor: 1
18:52:30-538232 INFO Total steps: 8
18:52:30-539232 INFO Train batch size: 1
18:52:30-542235 INFO Gradient accumulation steps: 1
18:52:30-543236 INFO Epoch: 200
18:52:30-544238 INFO max_train_steps (8 / 1 / 1 * 200 * 1) = 1600
18:52:30-545238 INFO lr_warmup_steps = 0
18:52:30-547239 INFO Saving training config to D:/SD/MyImages/testPink\model\last3_20241029-185230.json...
18:52:30-549241 INFO Executing command: D:\SD\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no
--dynamo_mode default --gpu_ids 0 --mixed_precision fp16 --num_processes 1 --num_machines 1
--num_cpu_threads_per_process 2 D:/SD/kohya_ss/sd-scripts/flux_train.py --config_file
D:/SD/MyImages/testPink\model/config_dreambooth-20241029-185230.toml
D:\SD\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
D:\SD\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
2024-10-29 18:52:41 INFO Loading settings from train_util.py:4435
D:/SD/MyImages/testPink\model/config_dreambooth-20241029-185230.toml...
INFO D:/SD/MyImages/testPink\model/config_dreambooth-20241029-185230 train_util.py:4454
2024-10-29 18:52:41 INFO Using DreamBooth method. flux_train.py:107
INFO prepare images. train_util.py:1956
INFO get image size from name of cache files train_util.py:1873
100%|████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<?, ?it/s]
INFO set image size from cache files: 0/8 train_util.py:1901
INFO found directory D:\SD\MyImages\testPink\img\1_xyzpink dress contains 8 train_util.py:1903
image files
read caption: 100%|██████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<?, ?it/s]
INFO 8 train images with repeating. train_util.py:1997
INFO 0 reg images. train_util.py:2000
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:2005
INFO [Dataset 0] config_util.py:567
batch_size: 1
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "D:\SD\MyImages\testPink\img\1_xyzpink dress"
                             image_count: 8
                             num_repeats: 1
                             shuffle_caption: False
                             keep_tokens: 0
                             keep_tokens_separator:
                             caption_separator: ,
                             secondary_separator: None
                             enable_wildcard: False
                             caption_dropout_rate: 0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1
                             token_warmup_step: 0
                             alpha_mask: False
                             custom_attributes: {}
                             is_reg: False
                             class_tokens: xyzpink dress
                             caption_extension: .txt


                INFO     [Dataset 0]                                                              config_util.py:573
                INFO     loading image sizes.                                                      train_util.py:923

100%|██████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 7925.00it/s]
INFO make buckets train_util.py:946
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is train_util.py:963
set, because bucket reso is defined by image size automatically /
bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
算されるため、min_bucket_resoとmax_bucket_resoは無視されます
INFO number of images (including repeats) / train_util.py:992
各bucketの画像枚数(繰り返し回数を含む)
INFO bucket 0: resolution (896, 1088), count: 6 train_util.py:997
INFO bucket 1: resolution (1024, 1024), count: 2 train_util.py:997
INFO mean ar error (without repeats): 0.01759017994531699 train_util.py:1002
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:62
INFO prepare accelerator flux_train.py:177
D:\SD\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:488: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
self.scaler = torch.cuda.amp.GradScaler(**kwargs)
accelerator device: cuda
INFO Building AutoEncoder flux_utils.py:152
INFO Loading state dict from D:/x/ae.safetensors flux_utils.py:157
INFO Loaded AE: flux_utils.py:160
2024-10-29 18:52:42 INFO [Dataset 0] train_util.py:2480
INFO caching latents with caching strategy. train_util.py:1048
INFO caching latents... train_util.py:1093
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.23it/s]
D:\SD\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
2024-10-29 18:52:47 INFO Building CLIP flux_utils.py:165
INFO Loading state dict from D:/x/clip_l.safetensors flux_utils.py:258
INFO Loaded CLIP: flux_utils.py:261
INFO Loading state dict from D:/x/t5xxl_fp16.safetensors flux_utils.py:306
INFO Loaded T5xxl: flux_utils.py:309
2024-10-29 18:52:56 INFO [Dataset 0] train_util.py:2502
INFO caching Text Encoder outputs with caching strategy. train_util.py:1227
INFO checking cache validity... train_util.py:1238
100%|██████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 7994.86it/s]
INFO caching Text Encoder outputs... train_util.py:1269
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.35it/s]
2024-10-29 18:52:58 INFO cache Text Encoder outputs for sample prompt: flux_train.py:240
D:/SD/MyImages/testPink\model\sample/prompt.txt
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:62
INFO Building Flux model dev from BFL checkpoint flux_utils.py:116
2024-10-29 18:52:59 INFO Loading state dict from D:/x/flux1-dev.safetensors flux_utils.py:133
INFO Loaded Flux: flux_utils.py:145
number of trainable parameters: 11901408320
prepare optimizer, data loader etc.
INFO use Adafactor optimizer | {'relative_step': True} train_util.py:4748
INFO relative_step is true / relative_stepがtrueです train_util.py:4751
WARNING learning rate is used as initial_lr / 指定したlearning train_util.py:4753
rateはinitial_lrとして使用されます
WARNING unet_lr and text_encoder_lr are ignored / train_util.py:4765
unet_lrとtext_encoder_lrは無視されます
INFO use adafactor_scheduler / スケジューラにadafactor_schedulerを使用します train_util.py:4770
running training / 学習開始
num examples / サンプル数: 8
num batches per epoch / 1epochのバッチ数: 8
num epochs / epoch数: 200
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1600
steps: 0%| | 0/1600 [00:00<?, ?it/s]
epoch 1/200
2024-10-29 18:53:46 INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:715
Traceback (most recent call last):
File "D:\SD\kohya_ss\sd-scripts\flux_train.py", line 998, in
train(args)
File "D:\SD\kohya_ss\sd-scripts\flux_train.py", line 787, in train
model_pred = flux(
File "D:\SD\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SD\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SD\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 819, in forward
return model_forward(*args, **kwargs)
File "D:\SD\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 807, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "D:\SD\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "D:\SD\kohya_ss\sd-scripts\library\flux_models.py", line 1042, in forward
img, txt = block(img=img, txt=txt, vec=vec, pe=pe, txt_attention_mask=txt_attention_mask)
File "D:\SD\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SD\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SD\kohya_ss\sd-scripts\library\flux_models.py", line 756, in forward
return self._forward(img, txt, vec, pe, txt_attention_mask)
File "D:\SD\kohya_ss\sd-scripts\library\flux_models.py", line 723, in _forward
attn = attention(q, k, v, pe=pe, attn_mask=attn_mask)
File "D:\SD\kohya_ss\sd-scripts\library\flux_models.py", line 449, in attention
x = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 486.00 MiB. GPU 0 has a total capacity of 24.00 GiB of which 0 bytes is free. Of the allocated memory 37.98 GiB is allocated by PyTorch, and 71.90 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
steps: 0%| | 0/1600 [00:35<?, ?it/s]
Traceback (most recent call last):
File "D:\Programs\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Programs\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "D:\SD\kohya_ss\venv\Scripts\accelerate.EXE_main
.py", line 7, in
sys.exit(main())
File "D:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "D:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File "D:\SD\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\SD\kohya_ss\venv\Scripts\python.exe', 'D:/SD/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:/SD/MyImages/testPink\model/config_dreambooth-20241029-185230.toml']' returned non-zero exit status 1.
18:54:26-610807 INFO Training has ended.

It looks like an out-of-memory error is occurring. Perhaps the fp8_base or fused_backward_pass options are not set, or the blocks_to_swap value is too small. Please specify fp8_base and fused_backward_pass, and set blocks_to_swap to around 10.
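
For reference, a rough sketch of those settings as keys in the generated training .toml, assuming the key names mirror the flux_train.py command-line options:

# memory-saving options for full Flux fine-tuning on a 24 GB card
fp8_base = true               # load the base Flux model in fp8 to reduce VRAM use
fused_backward_pass = true    # fuse the backward pass with the Adafactor optimizer step
blocks_to_swap = 10           # swap roughly 10 transformer blocks to CPU RAM during training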

The problem was solved by specifying "0" in the settings in the screenshot

It worked for me, thanks