Flux AdamWScheduleFree on 24GB
Closed this issue · 4 comments
I am trying to use AdamWScheduleFree for a Flux finetune. I've loved the results I get with LoRA training using this optimizer, but I haven't found a set of settings for my 4090 that doesn't result in OOM errors. It appears this is because fused_backward_pass is not supported, and I can't think of any other settings that would reduce memory usage further. My dataset consists of 225 images that are close to 1024x1024 (similar total pixel area to a 1024x1024 square), and my training resolution is 1024.
Do I need to run this on a cloud service to get more VRAM? I'm already running headless on Linux, so nothing is using VRAM except the training script.
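For context on why fused_backward_pass matters here: it steps the optimizer and frees each gradient during the backward pass instead of holding every gradient until a global optimizer.step(), which is where much of the VRAM saving for full finetunes comes from. A rough, self-contained sketch of that pattern in plain PyTorch (illustrative only, not the sd-scripts implementation; the per-parameter AdamW is just a stand-in):

```python
import torch

def enable_optimizer_in_backward(model: torch.nn.Module, lr: float = 1e-5):
    """Step each parameter's optimizer as soon as its gradient is accumulated,
    then drop that gradient, so all gradients never coexist in VRAM."""
    optimizers = {}
    for p in model.parameters():
        if not p.requires_grad:
            continue
        optimizers[p] = torch.optim.AdamW([p], lr=lr)  # one tiny optimizer per parameter

        def hook(param: torch.Tensor):
            opt = optimizers[param]
            opt.step()
            opt.zero_grad(set_to_none=True)  # free this gradient immediately
        p.register_post_accumulate_grad_hook(hook)

# usage: call enable_optimizer_in_backward(model) once, then run loss.backward()
# as usual and skip the global optimizer.step() / optimizer.zero_grad().
```

Here's my config: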
[common]
learning_rate = 1e-5
output_name = "psyart-v0.3.0-nyanko_flux_dev_de_distilled-adamw-schedule-free-1e-5-100-epochs-full_bf16-with-keyword"
log_tracker_name = "psyart-v0.3.0-nyanko_flux_dev_de_distilled-adamw-schedule-free-1e-5-100-epochs-full_bf16-with-keyword"
sample_dir = "/home/rt/ai/models/stable-diffusion/checkpoints/flux/psyart-v0.3.0-nyanko_flux_dev_de_distilled-adamw-schedule-free-1e-5-100-epochs-full_bf16-with-keyword/samples"
sample_every_n_epochs = 5
save_every_n_epochs = 5
max_train_epochs = 100
save_state_on_train_end = true
caption_prefix = "psyart. "
max_bucket_reso = 1360
[model]
pretrained_model_name_or_path = "/home/rt/ai/models/stable-diffusion/checkpoints/flux/nyanko7_flux-dev-de-distill.safetensors"
clip_l = "/home/rt/ai/models/stable-diffusion/clip/sd3/clip_l.safetensors"
t5xxl = "/home/rt/ai/models/stable-diffusion/clip/sd3/t5xxl_fp16.safetensors"
ae = "/home/rt/ai/models/stable-diffusion/vae/flux/ae.sft"
save_model_as = "safetensors"
output_dir = "/home/rt/ai/models/stable-diffusion/checkpoints/flux/psyart-training-de-distilled"
[dataset]
dataset_config = "/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/dataset_config.toml"
persistent_data_loader_workers = true
max_data_loader_n_workers = 2
seed = 42069
[training]
full_bf16 = true
mixed_precision = "bf16"
save_precision = "bf16"
timestep_sampling = "shift"
model_prediction_type = "raw"
#max_grad_norm = 0.0
guidance_scale = 1.0
discrete_flow_shift = 3.1582
scale_weight_norms = 3
#masked_loss = true
#apply_t5_attn_mask = true # seems to cause OOM errors on 24GB?
#alpha_mask = true
[optimizer]
optimizer_type = "AdamWScheduleFree"
shuffle = true # necessary for using non-linear LR with low epochs
[output]
cpu_offload_checkpointing = true
[memory_optimization]
sdpa = true
gradient_checkpointing = true
highvram = true
cache_text_encoder_outputs_to_disk = true
cache_latents_to_disk = true
#fused_backward_pass = true
#blockwise_fused_optimizers = true
#blocks_to_swap = 36
[logging_arguments]
log_with = "wandb"
logging_dir = "/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/logs"
[sample_prompt_arguments]
sample_sampler = "k_dpm_2"
sample_prompts = "/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/sample_prompt.toml"
You may be able to use AdamWScheduleFree with the --blockwise_fused_optimizers option instead of --fused_backward_pass (not tested). However, I do not recommend blockwise_fused_optimizers because it doesn't support stochastic rounding.
So about 30GB of VRAM is needed to use AdamWScheduleFree.
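For context, stochastic rounding here means rounding the fp32 optimizer update to bf16 probabilistically instead of always truncating, so small updates still move the bf16 weights in expectation during a full-bf16 finetune. A rough sketch of the idea (illustrative only, not the actual sd-scripts code):

```python
import torch

def bf16_stochastic_round(x: torch.Tensor) -> torch.Tensor:
    """Round an fp32 tensor to bf16 with stochastic rounding.

    bf16 keeps the upper 16 bits of the fp32 bit pattern, so adding random
    noise to the lower 16 bits before truncating makes the rounding
    direction probabilistic and preserves small values in expectation.
    """
    assert x.dtype == torch.float32
    bits = x.view(torch.int32)                    # reinterpret the raw bits
    noise = torch.randint_like(bits, 0, 1 << 16)  # random low 16 bits
    rounded = (bits + noise) & -65536             # zero out the low 16 bits
    return rounded.view(torch.float32).to(torch.bfloat16)
```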
I did attempt to start training with --blockwise_fused_optimizers, but it gave an error (I can't reproduce it at the moment, as I have another training run going).
Good to know that full-precision fine-tuning needs 30GB with that optimizer. I'll try running on vast.ai at some point in that case.
I should note I'm on commit d005652 because of issues with the last few commits preventing training from starting.
When I try to use --blockwise_fused_optimizers with AdamWScheduleFree, it simply errors saying "Schedule-free optimizer is not supported with blockwise fused optimizers". I'm setting --blocks_to_swap 8 in my config file.
Logs:
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/diffusers/utils/outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
torch.utils._pytree._register_pytree_node(
2024-10-15 12:38:30 INFO Loading settings from train_util.py:4361
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/config_file.toml...
INFO /home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/config_file train_util.py:4380
WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / train_util.py:4055
cache_latents_to_diskが有効なため、cache_latentsを有効にします
2024-10-15 12:38:30 WARNING cache_text_encoder_outputs_to_disk is enabled, so cache_text_encoder_outputs is also enabled / flux_train.py:64
cache_text_encoder_outputs_to_diskが有効になっているため、cache_text_encoder_outputsも有効にな
ります
INFO Load dataset config from flux_train.py:92
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/dataset_config.toml
WARNING max_bucket_reso is adjusted to be multiple of bucket_reso_steps / train_util.py:680
max_bucket_resoがbucket_reso_stepsの倍数になるように調整されました: 1360 -> 1408
INFO prepare images. train_util.py:1892
INFO get image size from name of cache files train_util.py:1830
highvram is enabled / highvramが有効です
100%|██████████| 225/225 [00:00<00:00, 1302.16it/s]
2024-10-15 12:38:31 INFO set image size from cache files: 225/225 train_util.py:1837
INFO found directory train_util.py:1839
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data contains 225
image files
WARNING neither caption file nor class tokens are found. use empty caption for train_util.py:1851
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/11703260_5538
03468121175_6452388726452820497_o.jpg / キャプションファイルもclass
tokenも見つかりませんでした。空のキャプションを使用します:
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/11703260_5538
03468121175_6452388726452820497_o.jpg
WARNING neither caption file nor class tokens are found. use empty caption for train_util.py:1851
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/cropped-v0.3.
png / キャプションファイルもclass tokenも見つかりませんでした。空のキャプションを使用します:
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/cropped-v0.3.
png
read caption: 100%|██████████| 225/225 [00:00<00:00, 29182.96it/s]
WARNING No caption file found for 2 images. Training will continue without captions for these images. train_util.py:1870
If class token exists, it will be used. /
2枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなし
で学習を続行します。class tokenが存在する場合はそれを使います。
WARNING /home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/11703260_5538 train_util.py:1877
03468121175_6452388726452820497_o.jpg
WARNING /home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/cropped-v0.3. train_util.py:1877
png
INFO 225 train images with repeating. train_util.py:1933
INFO 0 reg images. train_util.py:1936
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1941
INFO [Dataset 0] config_util.py:570
batch_size: 1
resolution: (1360, 1360)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 1408
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir:
"/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data"
image_count: 225
num_repeats: 1
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_separator: ,
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: psyart.
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
alpha_mask: False,
is_reg: False
class_tokens: None
caption_extension: .caption
INFO [Dataset 0] config_util.py:576
INFO loading image sizes. train_util.py:912
100%|██████████| 225/225 [00:00<00:00, 3615779.31it/s]
INFO make buckets train_util.py:935
INFO number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) train_util.py:981
INFO bucket 0: resolution (768, 1408), count: 1 train_util.py:986
INFO bucket 1: resolution (832, 1408), count: 1 train_util.py:986
INFO bucket 2: resolution (896, 1408), count: 7 train_util.py:986
INFO bucket 3: resolution (960, 1408), count: 13 train_util.py:986
INFO bucket 4: resolution (1024, 1408), count: 12 train_util.py:986
INFO bucket 5: resolution (1088, 1408), count: 12 train_util.py:986
INFO bucket 6: resolution (1152, 1408), count: 33 train_util.py:986
INFO bucket 7: resolution (1216, 1408), count: 9 train_util.py:986
INFO bucket 8: resolution (1280, 1408), count: 4 train_util.py:986
INFO bucket 9: resolution (1344, 1344), count: 49 train_util.py:986
INFO bucket 10: resolution (1408, 512), count: 1 train_util.py:986
INFO bucket 11: resolution (1408, 576), count: 1 train_util.py:986
INFO bucket 12: resolution (1408, 640), count: 4 train_util.py:986
INFO bucket 13: resolution (1408, 704), count: 7 train_util.py:986
INFO bucket 14: resolution (1408, 768), count: 11 train_util.py:986
INFO bucket 15: resolution (1408, 832), count: 5 train_util.py:986
INFO bucket 16: resolution (1408, 896), count: 10 train_util.py:986
INFO bucket 17: resolution (1408, 960), count: 7 train_util.py:986
INFO bucket 18: resolution (1408, 1024), count: 10 train_util.py:986
INFO bucket 19: resolution (1408, 1088), count: 9 train_util.py:986
INFO bucket 20: resolution (1408, 1152), count: 9 train_util.py:986
INFO bucket 21: resolution (1408, 1216), count: 4 train_util.py:986
INFO bucket 22: resolution (1408, 1280), count: 6 train_util.py:986
INFO mean ar error (without repeats): 0.017008187262922816 train_util.py:991
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:48
INFO prepare accelerator flux_train.py:173
wandb: Currently logged in as: kmac-mcfarlane (tngl). Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /home/rt/.netrc
2024-10-15 12:38:32 INFO Building AutoEncoder flux_utils.py:100
INFO Loading state dict from /home/rt/ai/models/stable-diffusion/vae/flux/ae.sft flux_utils.py:105
INFO Loaded AE: <All keys matched successfully> flux_utils.py:108
INFO [Dataset 0] train_util.py:2416
INFO caching latents with caching strategy. train_util.py:1037
INFO checking cache validity... train_util.py:1064
accelerator device: cuda
100%|██████████| 225/225 [00:00<00:00, 9877.01it/s]
INFO no latents to cache train_util.py:1107
/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-10-15 12:38:33 INFO Building CLIP flux_utils.py:113
INFO Loading state dict from /home/rt/ai/models/stable-diffusion/clip/sd3/clip_l.safetensors flux_utils.py:206
INFO Loaded CLIP: <All keys matched successfully> flux_utils.py:209
INFO Loading state dict from /home/rt/ai/models/stable-diffusion/clip/sd3/t5xxl_fp16.safetensors flux_utils.py:254
INFO Loaded T5xxl: <All keys matched successfully> flux_utils.py:257
2024-10-15 12:38:34 INFO [Dataset 0] train_util.py:2437
INFO caching Text Encoder outputs with caching strategy. train_util.py:1199
INFO checking cache validity... train_util.py:1205
100%|██████████| 225/225 [00:00<00:00, 6807.61it/s]
INFO no Text Encoder outputs to cache train_util.py:1227
INFO cache Text Encoder outputs for sample prompt: flux_train.py:236
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/sample_prompt.toml
INFO cache Text Encoder outputs for prompt: psyart. A full-body view of a celestial goddess wearing flux_train.py:246
a crown made of vines in a meadow. She is wearing an ornate chest-piece with opal and silver
wire-wrapping. The sky is full of clouds and a sunburst forms a halo around her head. Fine
whisps of spore dust is floating in the air. There are beautiful trees surrounding the meadow.
The grass of the meddow is embedded with fractal tesselations and ancient rune patterns
overlayed.
INFO cache Text Encoder outputs for prompt: makeup, nude, naked, border, frame, signature, text, flux_train.py:246
watermark
INFO cache Text Encoder outputs for prompt: psyart. A photo of a female medicine woman standing in flux_train.py:246
front of a fire. Behind her ar ancient Aztek overgrown ruins. She is wearing an ornate
chest-piece adorned with jade and silver wire-wrapping. The night sky is clear and starry.
2024-10-15 12:38:35 INFO cache Text Encoder outputs for prompt: psyart. A closeup photo of a female mountain climber. flux_train.py:246
Behind her is a interdimentional celestial portal shaped like a vortex of lightning and smoke.
Majectic mountains loom in the background as snowfall and clouds descend on the mountain
range.
INFO cache Text Encoder outputs for prompt: psyart. A RAW photo of a woman facing towards the flux_train.py:246
camera in a prairie with a creek running through it. The sky is on fire with fractaled
patterns. The grass swirls to form abstract waves of color and ornate patterns.
INFO cache Text Encoder outputs for prompt: psyart. Landscape with a female deity surrounded by flux_train.py:246
wind and water. She is a metaphysical goddess surrounded by celestial and mystical symbology
and runes floating around her. She is in a sacred geometrical dreamscape.
INFO cache Text Encoder outputs for prompt: psyart. A photo of a magical forest goddess floating flux_train.py:246
above the ground glowing mycelium tendrils on the ground. She is surrounded by magical glowing
light and curls of smoke with colored fog. Her flowing long wavy brown hair has beautiful
flowers in the hair. She wears a flowing long white dress and is illuminated by vertical beam
of light. The moon is large in the background.
INFO cache Text Encoder outputs for prompt: psyart. A photograph of a mystical shaman wearing an flux_train.py:246
ornate headdress with opulent jewelry. There is haze and intricate curls of smoke in the
background. In the sky are stars and a bright glowing cosmic gateway opening for ascension to
the afterlife.
INFO cache Text Encoder outputs for prompt: A photo of a treefrog on a leaf. flux_train.py:246
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:48
INFO Building Flux model schnell from BFL checkpoint flux_utils.py:74
INFO Loading state dict from flux_utils.py:81
/home/rt/ai/models/stable-diffusion/checkpoints/flux/nyanko7_flux-dev-de-distill.safetensors
INFO Loaded Flux: <All keys matched successfully> flux_utils.py:93
INFO enable block swap: blocks_to_swap=8 flux_train.py:291
INFO use AdamWScheduleFree optimizer | {} train_util.py:4725
(the line above is printed 58 times in total, once per blockwise optimizer)
INFO using 58 optimizers for blockwise fused optimizers flux_train.py:365
FLUX: Gradient checkpointing enabled. CPU offload: True
number of trainable parameters: 11891178560
prepare optimizer, data loader etc.
block ('other', -1): 53895232 parameters
block ('double', 0) ... ('double', 18): 339831296 parameters each
block ('single', 0) ... ('single', 37): 141591808 parameters each
Traceback (most recent call last):
File "/home/rt/ai/repos/kohya-ss/sd-scripts/flux_train.py", line 994, in <module>
train(args)
File "/home/rt/ai/repos/kohya-ss/sd-scripts/flux_train.py", line 368, in train
raise ValueError("Schedule-free optimizer is not supported with blockwise fused optimizers")
ValueError: Schedule-free optimizer is not supported with blockwise fused optimizers
Traceback (most recent call last):
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
simple_launcher(args)
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/rt/ai/repos/kohya-ss/sd-scripts/venv/bin/python3.10', 'flux_train.py', '--config_file=/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/config_file.toml', '--wandb_api_key=8ee3ab96add0fcf0e1c452f693bce045103de391']' returned non-zero exit status 1.
> When I try to use --blockwise_fused_optimizers with AdamWScheduleFree, it simply errors saying "Schedule-free optimizer is not supported with blockwise fused optimizers". I'm setting --blocks_to_swap 8 in my config file.
Oh, sorry, as I recall the AdamWScheduleFree optimizer needs its train()/eval() methods to be called at the appropriate points, and blockwise_fused_optimizers doesn't support that train/eval switching.
So about 30GB of VRAM is needed to use AdamWScheduleFree. And sorry again, that figure is actually the VRAM requirement when using AdaFactor with the fused optimizer and no block swap; AdamWScheduleFree may require more.
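For reference, the train/eval switching mentioned above looks roughly like this with the schedulefree package (a minimal toy example, not the sd-scripts training loop):

```python
import torch
import schedulefree

model = torch.nn.Linear(16, 1)  # stand-in for the real network
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-5)

optimizer.train()                # must be in train mode for training steps
for _ in range(10):
    x = torch.randn(4, 16)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

optimizer.eval()                 # must switch to eval mode before validation,
                                 # sampling, or saving, then back to train()
                                 # before resuming training
```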
> I should note I'm on commit d005652 because of issues with the last few commits preventing training from starting.
This should be fixed now.