Flux AdamWScheduleFree on 24GB
Closed this issue · 4 comments
I am trying to use AdamWScheduleFree for a Flux finetune. I've loved the results I get with LoRA training using this optimizer, but I haven't found a set of settings for my 4090 that doesn't result in OOM errors. It appears this is because fused_backward_pass is not supported, and I can't think of any other settings that would reduce memory usage further. My dataset consists of 225 images that are close to 1024x1024 (similar total pixel area to a 1024x1024 square), and my training resolution is 1024.
Do I need to run this on a cloud service to get more VRAM? I'm already running headless on Linux, so nothing is using VRAM except the training script.
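For context on why fused_backward_pass matters here: it steps the optimizer and frees each gradient during the backward pass instead of holding every gradient until a global optimizer.step(), which is where much of the VRAM saving for full finetunes comes from. A rough, self-contained sketch of that pattern in plain PyTorch (illustrative only, not the sd-scripts implementation; the per-parameter AdamW is just a stand-in):

```python
import torch

def enable_optimizer_in_backward(model: torch.nn.Module, lr: float = 1e-5):
    """Step each parameter's optimizer as soon as its gradient is accumulated,
    then drop that gradient, so all gradients never coexist in VRAM."""
    optimizers = {}
    for p in model.parameters():
        if not p.requires_grad:
            continue
        optimizers[p] = torch.optim.AdamW([p], lr=lr)  # one tiny optimizer per parameter

        def hook(param: torch.Tensor):
            opt = optimizers[param]
            opt.step()
            opt.zero_grad(set_to_none=True)  # free this gradient immediately
        p.register_post_accumulate_grad_hook(hook)

# usage: call enable_optimizer_in_backward(model) once, then run loss.backward()
# as usual and skip the global optimizer.step() / optimizer.zero_grad().
```

Here's my config: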
[common]
learning_rate = 1e-5
output_name = "psyart-v0.3.0-nyanko_flux_dev_de_distilled-adamw-schedule-free-1e-5-100-epochs-full_bf16-with-keyword"
log_tracker_name = "psyart-v0.3.0-nyanko_flux_dev_de_distilled-adamw-schedule-free-1e-5-100-epochs-full_bf16-with-keyword"
sample_dir = "/home/rt/ai/models/stable-diffusion/checkpoints/flux/psyart-v0.3.0-nyanko_flux_dev_de_distilled-adamw-schedule-free-1e-5-100-epochs-full_bf16-with-keyword/samples"
sample_every_n_epochs = 5
save_every_n_epochs = 5
max_train_epochs = 100
save_state_on_train_end = true
caption_prefix = "psyart. "
max_bucket_reso = 1360
[model]
pretrained_model_name_or_path = "/home/rt/ai/models/stable-diffusion/checkpoints/flux/nyanko7_flux-dev-de-distill.safetensors"
clip_l = "/home/rt/ai/models/stable-diffusion/clip/sd3/clip_l.safetensors"
t5xxl = "/home/rt/ai/models/stable-diffusion/clip/sd3/t5xxl_fp16.safetensors"
ae = "/home/rt/ai/models/stable-diffusion/vae/flux/ae.sft"
save_model_as = "safetensors"
output_dir = "/home/rt/ai/models/stable-diffusion/checkpoints/flux/psyart-training-de-distilled"
[dataset]
dataset_config = "/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/dataset_config.toml"
persistent_data_loader_workers = true
max_data_loader_n_workers = 2
seed = 42069
[training]
full_bf16 = true
mixed_precision = "bf16"
save_precision = "bf16"
timestep_sampling = "shift"
model_prediction_type = "raw"
#max_grad_norm = 0.0
guidance_scale = 1.0
discrete_flow_shift = 3.1582
scale_weight_norms = 3
#masked_loss = true
#apply_t5_attn_mask = true # seems to cause OOM errors on 24GB?
#alpha_mask = true
[optimizer]
optimizer_type = "AdamWScheduleFree"
shuffle = true # necessary for using non-linear LR with low epochs
[output]
cpu_offload_checkpointing = true
[memory_optimization]
sdpa = true
gradient_checkpointing = true
highvram = true
cache_text_encoder_outputs_to_disk = true
cache_latents_to_disk = true
#fused_backward_pass = true
#blockwise_fused_optimizers = true
#blocks_to_swap = 36
[logging_arguments]
log_with = "wandb"
logging_dir = "/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/logs"
[sample_prompt_arguments]
sample_sampler = "k_dpm_2"
sample_prompts = "/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/sample_prompt.toml"
You may be able to use AdamWScheduleFree with the --blockwise_fused_optimizers option instead of --fused_backward_pass (not tested). However, I do not recommend blockwise_fused_optimizers because it doesn't support stochastic rounding.
So about 30GB of VRAM is needed to use AdamWScheduleFree.
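For context, stochastic rounding here means rounding the fp32 optimizer update to bf16 probabilistically instead of always truncating, so small updates still move the bf16 weights in expectation during a full-bf16 finetune. A rough sketch of the idea (illustrative only, not the actual sd-scripts code):

```python
import torch

def bf16_stochastic_round(x: torch.Tensor) -> torch.Tensor:
    """Round an fp32 tensor to bf16 with stochastic rounding.

    bf16 keeps the upper 16 bits of the fp32 bit pattern, so adding random
    noise to the lower 16 bits before truncating makes the rounding
    direction probabilistic and preserves small values in expectation.
    """
    assert x.dtype == torch.float32
    bits = x.view(torch.int32)                    # reinterpret the raw bits
    noise = torch.randint_like(bits, 0, 1 << 16)  # random low 16 bits
    rounded = (bits + noise) & -65536             # zero out the low 16 bits
    return rounded.view(torch.float32).to(torch.bfloat16)
```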
I did attempt to start training with --blockwise_fused_optimizers, but it gave an error (I can't reproduce it at the moment, as I have another training run going).
Good to know that full-precision fine-tuning needs 30GB with that optimizer. I'll try running on vast.ai at some point in that case.
I should note I'm on commit d005652 because of issues with the last few commits preventing training from starting.
When I try to use --blockwise_fused_optimizers with AdamWScheduleFree, it simply errors saying "Schedule-free optimizer is not supported with blockwise fused optimizers". I'm setting --blocks_to_swap 8 in my config file.
Logs:
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/diffusers/utils/outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
torch.utils._pytree._register_pytree_node(
2024-10-15 12:38:30 INFO Loading settings from train_util.py:4361
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/config_file.toml...
INFO /home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/config_file train_util.py:4380
WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / train_util.py:4055
cache_latents_to_diskが有効なため、cache_latentsを有効にします
2024-10-15 12:38:30 WARNING cache_text_encoder_outputs_to_disk is enabled, so cache_text_encoder_outputs is also enabled / flux_train.py:64
cache_text_encoder_outputs_to_diskが有効になっているため、cache_text_encoder_outputsも有効にな
ります
INFO Load dataset config from flux_train.py:92
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/dataset_config.toml
WARNING max_bucket_reso is adjusted to be multiple of bucket_reso_steps / train_util.py:680
max_bucket_resoがbucket_reso_stepsの倍数になるように調整されました: 1360 -> 1408
INFO prepare images. train_util.py:1892
INFO get image size from name of cache files train_util.py:1830
highvram is enabled / highvramが有効です
100%|██████████| 225/225 [00:00<00:00, 1302.16it/s]
2024-10-15 12:38:31 INFO set image size from cache files: 225/225 train_util.py:1837
INFO found directory train_util.py:1839
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data contains 225
image files
WARNING neither caption file nor class tokens are found. use empty caption for train_util.py:1851
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/11703260_5538
03468121175_6452388726452820497_o.jpg / キャプションファイルもclass
tokenも見つかりませんでした。空のキャプションを使用します:
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/11703260_5538
03468121175_6452388726452820497_o.jpg
WARNING neither caption file nor class tokens are found. use empty caption for train_util.py:1851
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/cropped-v0.3.
png / キャプションファイルもclass tokenも見つかりませんでした。空のキャプションを使用します:
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/cropped-v0.3.
png
read caption: 100%|██████████| 225/225 [00:00<00:00, 29182.96it/s]
WARNING No caption file found for 2 images. Training will continue without captions for these images. train_util.py:1870
If class token exists, it will be used. /
2枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなし
で学習を続行します。class tokenが存在する場合はそれを使います。
WARNING /home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/11703260_5538 train_util.py:1877
03468121175_6452388726452820497_o.jpg
WARNING /home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data/cropped-v0.3. train_util.py:1877
png
INFO 225 train images with repeating. train_util.py:1933
INFO 0 reg images. train_util.py:1936
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1941
INFO [Dataset 0] config_util.py:570
batch_size: 1
resolution: (1360, 1360)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 1408
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir:
"/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/train_data"
image_count: 225
num_repeats: 1
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_separator: ,
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: psyart.
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
alpha_mask: False,
is_reg: False
class_tokens: None
caption_extension: .caption
INFO [Dataset 0] config_util.py:576
INFO loading image sizes. train_util.py:912
100%|██████████| 225/225 [00:00<00:00, 3615779.31it/s]
INFO make buckets train_util.py:935
INFO number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) train_util.py:981
INFO bucket 0: resolution (768, 1408), count: 1 train_util.py:986
INFO bucket 1: resolution (832, 1408), count: 1 train_util.py:986
INFO bucket 2: resolution (896, 1408), count: 7 train_util.py:986
INFO bucket 3: resolution (960, 1408), count: 13 train_util.py:986
INFO bucket 4: resolution (1024, 1408), count: 12 train_util.py:986
INFO bucket 5: resolution (1088, 1408), count: 12 train_util.py:986
INFO bucket 6: resolution (1152, 1408), count: 33 train_util.py:986
INFO bucket 7: resolution (1216, 1408), count: 9 train_util.py:986
INFO bucket 8: resolution (1280, 1408), count: 4 train_util.py:986
INFO bucket 9: resolution (1344, 1344), count: 49 train_util.py:986
INFO bucket 10: resolution (1408, 512), count: 1 train_util.py:986
INFO bucket 11: resolution (1408, 576), count: 1 train_util.py:986
INFO bucket 12: resolution (1408, 640), count: 4 train_util.py:986
INFO bucket 13: resolution (1408, 704), count: 7 train_util.py:986
INFO bucket 14: resolution (1408, 768), count: 11 train_util.py:986
INFO bucket 15: resolution (1408, 832), count: 5 train_util.py:986
INFO bucket 16: resolution (1408, 896), count: 10 train_util.py:986
INFO bucket 17: resolution (1408, 960), count: 7 train_util.py:986
INFO bucket 18: resolution (1408, 1024), count: 10 train_util.py:986
INFO bucket 19: resolution (1408, 1088), count: 9 train_util.py:986
INFO bucket 20: resolution (1408, 1152), count: 9 train_util.py:986
INFO bucket 21: resolution (1408, 1216), count: 4 train_util.py:986
INFO bucket 22: resolution (1408, 1280), count: 6 train_util.py:986
INFO mean ar error (without repeats): 0.017008187262922816 train_util.py:991
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:48
INFO prepare accelerator flux_train.py:173
wandb: Currently logged in as: kmac-mcfarlane (tngl). Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /home/rt/.netrc
2024-10-15 12:38:32 INFO Building AutoEncoder flux_utils.py:100
INFO Loading state dict from /home/rt/ai/models/stable-diffusion/vae/flux/ae.sft flux_utils.py:105
INFO Loaded AE: <All keys matched successfully> flux_utils.py:108
INFO [Dataset 0] train_util.py:2416
INFO caching latents with caching strategy. train_util.py:1037
INFO checking cache validity... train_util.py:1064
accelerator device: cuda
100%|██████████| 225/225 [00:00<00:00, 9877.01it/s]
INFO no latents to cache train_util.py:1107
/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-10-15 12:38:33 INFO Building CLIP flux_utils.py:113
INFO Loading state dict from /home/rt/ai/models/stable-diffusion/clip/sd3/clip_l.safetensors flux_utils.py:206
INFO Loaded CLIP: <All keys matched successfully> flux_utils.py:209
INFO Loading state dict from /home/rt/ai/models/stable-diffusion/clip/sd3/t5xxl_fp16.safetensors flux_utils.py:254
INFO Loaded T5xxl: <All keys matched successfully> flux_utils.py:257
2024-10-15 12:38:34 INFO [Dataset 0] train_util.py:2437
INFO caching Text Encoder outputs with caching strategy. train_util.py:1199
INFO checking cache validity... train_util.py:1205
100%|██████████| 225/225 [00:00<00:00, 6807.61it/s]
INFO no Text Encoder outputs to cache train_util.py:1227
INFO cache Text Encoder outputs for sample prompt: flux_train.py:236
/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/sample_prompt.toml
INFO cache Text Encoder outputs for prompt: psyart. A full-body view of a celestial goddess wearing flux_train.py:246
a crown made of vines in a meadow. She is wearing an ornate chest-piece with opal and silver
wire-wrapping. The sky is full of clouds and a sunburst forms a halo around her head. Fine
whisps of spore dust is floating in the air. There are beautiful trees surrounding the meadow.
The grass of the meddow is embedded with fractal tesselations and ancient rune patterns
overlayed.
INFO cache Text Encoder outputs for prompt: makeup, nude, naked, border, frame, signature, text, flux_train.py:246
watermark
INFO cache Text Encoder outputs for prompt: psyart. A photo of a female medicine woman standing in flux_train.py:246
front of a fire. Behind her ar ancient Aztek overgrown ruins. She is wearing an ornate
chest-piece adorned with jade and silver wire-wrapping. The night sky is clear and starry.
2024-10-15 12:38:35 INFO cache Text Encoder outputs for prompt: psyart. A closeup photo of a female mountain climber. flux_train.py:246
Behind her is a interdimentional celestial portal shaped like a vortex of lightning and smoke.
Majectic mountains loom in the background as snowfall and clouds descend on the mountain
range.
INFO cache Text Encoder outputs for prompt: psyart. A RAW photo of a woman facing towards the flux_train.py:246
camera in a prairie with a creek running through it. The sky is on fire with fractaled
patterns. The grass swirls to form abstract waves of color and ornate patterns.
INFO cache Text Encoder outputs for prompt: psyart. Landscape with a female deity surrounded by flux_train.py:246
wind and water. She is a metaphysical goddess surrounded by celestial and mystical symbology
and runes floating around her. She is in a sacred geometrical dreamscape.
INFO cache Text Encoder outputs for prompt: psyart. A photo of a magical forest goddess floating flux_train.py:246
above the ground glowing mycelium tendrils on the ground. She is surrounded by magical glowing
light and curls of smoke with colored fog. Her flowing long wavy brown hair has beautiful
flowers in the hair. She wears a flowing long white dress and is illuminated by vertical beam
of light. The moon is large in the background.
INFO cache Text Encoder outputs for prompt: psyart. A photograph of a mystical shaman wearing an flux_train.py:246
ornate headdress with opulent jewelry. There is haze and intricate curls of smoke in the
background. In the sky are stars and a bright glowing cosmic gateway opening for ascension to
the afterlife.
INFO cache Text Encoder outputs for prompt: A photo of a treefrog on a leaf. flux_train.py:246
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:48
INFO Building Flux model schnell from BFL checkpoint flux_utils.py:74
INFO Loading state dict from flux_utils.py:81
/home/rt/ai/models/stable-diffusion/checkpoints/flux/nyanko7_flux-dev-de-distill.safetensors
INFO Loaded Flux: <All keys matched successfully> flux_utils.py:93
INFO enable block swap: blocks_to_swap=8 flux_train.py:291
INFO use AdamWScheduleFree optimizer | {} train_util.py:4725
(the line above is printed 58 times in total, once per blockwise optimizer)
INFO using 58 optimizers for blockwise fused optimizers flux_train.py:365
FLUX: Gradient checkpointing enabled. CPU offload: True
number of trainable parameters: 11891178560
prepare optimizer, data loader etc.
block ('other', -1): 53895232 parameters
block ('double', 0) ... ('double', 18): 339831296 parameters each
block ('single', 0) ... ('single', 37): 141591808 parameters each
Traceback (most recent call last):
File "/home/rt/ai/repos/kohya-ss/sd-scripts/flux_train.py", line 994, in <module>
train(args)
File "/home/rt/ai/repos/kohya-ss/sd-scripts/flux_train.py", line 368, in train
raise ValueError("Schedule-free optimizer is not supported with blockwise fused optimizers")
ValueError: Schedule-free optimizer is not supported with blockwise fused optimizers
Traceback (most recent call last):
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
simple_launcher(args)
File "/home/rt/ai/repos/kohya-ss/sd-scripts/venv/lib64/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/rt/ai/repos/kohya-ss/sd-scripts/venv/bin/python3.10', 'flux_train.py', '--config_file=/home/rt/ai/training/stable-diffusion/psyart/kohya-flux/de-distilled/config_file.toml', '--wandb_api_key=8ee3ab96add0fcf0e1c452f693bce045103de391']' returned non-zero exit status 1.
> When I try to use --blockwise_fused_optimizers with AdamWScheduleFree, it simply errors saying "Schedule-free optimizer is not supported with blockwise fused optimizers". I'm setting --blocks_to_swap 8 in my config file.
Oh, sorry, as I recall the AdamWScheduleFree optimizer needs its train()/eval() methods to be called at the appropriate points, and blockwise_fused_optimizers doesn't support that train/eval switching.
So about 30GB of VRAM is needed to use AdamWScheduleFree. And sorry again, that figure is actually the VRAM requirement when using AdaFactor with the fused optimizer and no block swap; AdamWScheduleFree may require more.
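For reference, the train/eval switching mentioned above looks roughly like this with the schedulefree package (a minimal toy example, not the sd-scripts training loop):

```python
import torch
import schedulefree

model = torch.nn.Linear(16, 1)  # stand-in for the real network
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-5)

optimizer.train()                # must be in train mode for training steps
for _ in range(10):
    x = torch.randn(4, 16)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

optimizer.eval()                 # must switch to eval mode before validation,
                                 # sampling, or saving, then back to train()
                                 # before resuming training
```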
> I should note I'm on commit d005652 because of issues with the last few commits preventing training from starting.
This should be fixed now.