bghira/SimpleTuner

Exception: No images were discovered by the bucket manager in the dataset

Closed this issue · 9 comments

My dataset is a single image:
dataset.zip
Settings:
s01_multidatabackend.json
s01_config_01.json
Log:

No dependencies to install or update
INFO:root:lm_eval is not installed, GPTQ may not be usable
/home/alexds9/Documents/stable_diffusion/SimpleTuner/.venv/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/alexds9/Documents/stable_diffusion/SimpleTuner/.venv/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-10-08 04:09:41,385 [INFO] Using json configuration backend.
2024-10-08 04:09:41,385 [INFO] [CONFIG.JSON] Loaded configuration from config/config.json
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --push_to_hub
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --push_checkpoints_to_hub
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --validation_torch_compile
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --disable_benchmark
--model_type=lora
--lora_type=lycoris
--lycoris_config=/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/lycoris_config_03.json
--pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev
--model_family=flux
--data_backend_config=/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/s01_multidatabackend.json
--output_dir=/home/alexds9/stable-diffusion-webui/models/Lora/My/Flux/Training/Models_2024_10/simple_image_test/tr_01/
--user_prompt_library=/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/s01_prompt_library.json
--hub_model_id=simpletuner-lora-01_simple_image_test_tr_01
--tracker_project_name=simpletuner-lora-01_simple_image_test_tr_01
--tracker_run_name=tr_01
--seed=5612103
--lora_rank=8
--lora_alpha=8
--mixed_precision=bf16
--optimizer=adamw_bf16
--learning_rate=7.5e-3
--train_batch_size=2
--gradient_accumulation_steps=2
--lr_scheduler=cosine
--lr_warmup_steps=20
--max_train_steps=1000
--num_train_epochs=0
--checkpointing_steps=100
--base_model_precision=int8-quanto
--base_model_default_dtype=bf16
--keep_vae_loaded
--flux_lora_target=all+ffs
--gradient_precision=fp32
--noise_offset=0.15
--noise_offset_probability=0.5
--checkpoints_total_limit=20
--aspect_bucket_rounding=2
--minimum_image_size=0
--resume_from_checkpoint=latest
--report_to=wandb
--metadata_update_interval=60
--gradient_checkpointing
--caption_dropout_probability=0.20
--resolution_type=pixel_area
--resolution=256
--validation_seed=10
--validation_steps=100
--validation_resolution=512x768
--validation_guidance=3.5
--validation_guidance_rescale=0.0
--validation_num_inference_steps=20
--validation_prompt=woman, brown hair, blue eyes, white shirt, upper body, indoors,
--num_validation_images=1
--snr_gamma=5
--inference_scheduler_timestep_spacing=trailing
--training_scheduler_timestep_spacing=trailing
--max_workers=32
--read_batch_size=25
--write_batch_size=64
--torch_num_threads=8
--image_processing_batch_size=32
--vae_batch_size=4
--compress_disk_cache
--max_grad_norm=0.02
--disable_bucket_pruning
--override_dataset_config
--quantize_via=cpu
2024-10-08 04:09:41,390 [WARNING] The VAE model madebyollin/sdxl-vae-fp16-fix is not compatible. Please use a compatible VAE to eliminate this warning. The baked-in VAE will be used, instead.
2024-10-08 04:09:41,391 [INFO] VAE Model: black-forest-labs/FLUX.1-dev
2024-10-08 04:09:41,391 [INFO] Default VAE Cache location: 
2024-10-08 04:09:41,391 [INFO] Text Cache location: cache
2024-10-08 04:09:41,391 [WARNING] Updating T5 XXL tokeniser max length to 512 for Flux.
2024-10-08 04:09:41,391 [WARNING] Gradient accumulation steps are enabled, but gradient precision is set to 'unmodified'. This may lead to numeric instability. Consider disabling gradient accumulation steps. Continuing in 10 seconds..
2024-10-08 04:09:51,391 [INFO] Enabled NVIDIA TF32 for faster training on Ampere GPUs. Use --disable_tf32 if this causes any problems.
2024-10-08 04:09:51,912 [INFO] Load VAE: black-forest-labs/FLUX.1-dev
2024-10-08 04:09:52,464 [INFO] Loading VAE onto accelerator, converting from torch.float32 to torch.bfloat16
2024-10-08 04:09:52,603 [INFO] Load tokenizers
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
2024-10-08 04:09:53,495 [INFO] Loading OpenAI CLIP-L text encoder from black-forest-labs/FLUX.1-dev/text_encoder..
2024-10-08 04:09:53,895 [INFO] Loading T5 XXL v1.1 text encoder from black-forest-labs/FLUX.1-dev/text_encoder_2..
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 7876.63it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.17it/s]
2024-10-08 04:09:57,514 [INFO] Moving text encoder to GPU.
2024-10-08 04:09:57,707 [INFO] Moving text encoder 2 to GPU.
2024-10-08 04:10:06,404 [INFO] Loading data backend config from /home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/s01_multidatabackend.json
2024-10-08 04:10:06,405 [INFO] Configuring text embed backend: alt-embed-cache
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 593.91it/s]
2024-10-08 04:10:06,757 [INFO] (Rank: 0) (id=alt-embed-cache) Listing all text embed cache entries
2024-10-08 04:10:06,758 [INFO] Pre-computing null embedding
2024-10-08 04:10:13,232 [INFO] Completed loading text embed services.                                                        
2024-10-08 04:10:13,232 [INFO] Configuring data backend: all_dataset_768
2024-10-08 04:10:13,232 [INFO] (id=all_dataset_768) Loading bucket manager.                                                  
2024-10-08 04:10:13,243 [WARNING] No cache file found, creating new one.
2024-10-08 04:10:13,243 [INFO] (id=all_dataset_768) Refreshing aspect buckets on main process.
2024-10-08 04:10:13,243 [INFO] Discovering new files...
2024-10-08 04:10:13,245 [INFO] Compressed 0 existing files from 0.
Generating aspect bucket cache:   0%|                                         | 0/1 [00:00<?, ?it/s]2024-10-08 04:10:13,267 [ERROR] Error processing image: Aspect buckets must be a list of floats or dictionaries.
2024-10-08 04:10:13,268 [ERROR] Error traceback: Traceback (most recent call last):
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/metadata/backends/discovery.py", line 237, in _process_for_bucket
    prepared_sample = training_sample.prepare()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 314, in prepare
    self.crop()
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 529, in crop
    self.calculate_target_size()
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 484, in calculate_target_size
    self.aspect_ratio = self._select_random_aspect()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 280, in _select_random_aspect
    available_aspects = self._trim_aspect_bucket_list()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 198, in _trim_aspect_bucket_list
    raise ValueError(
ValueError: Aspect buckets must be a list of floats or dictionaries.

2024-10-08 04:10:13,270 [INFO] Image processing statistics: {'total_processed': 0, 'skipped': {'already_exists': 0, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-10-08 04:10:13,270 [INFO] Enforcing minimum image size of 0.013225. This could take a while for very-large datasets.
2024-10-08 04:10:13,270 [INFO] Completed aspect bucket update.
2024-10-08 04:10:13,271 [INFO] Configured backend: {'id': 'all_dataset_768', 'config': {'vae_cache_clear_each_epoch': False, 'probability': 1.0, 'repeats': 5, 'crop': True, 'crop_aspect': 'random', 'crop_aspect_buckets': [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2], 'crop_style': 'random', 'disable_validation': False, 'resolution': 0.589824, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/dataset', 'maximum_image_size': 1.048576, 'target_downsample_size': 0.589824, 'config_version': 2}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x732528accc90>, 'instance_data_dir': '/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/dataset', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x732528a8a690>}
(Rank: 0)  | Bucket     | Image Count (per-GPU)
------------------------------
2024-10-08 04:10:13,272 [ERROR] No images were discovered by the bucket manager in the dataset: all_dataset_768., traceback: Traceback (most recent call last):
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/training/trainer.py", line 605, in init_data_backend
    configure_multi_databackend(
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/data_backend/factory.py", line 823, in configure_multi_databackend
    raise Exception(
Exception: No images were discovered by the bucket manager in the dataset: all_dataset_768.

No images were discovered by the bucket manager in the dataset: all_dataset_768.
Traceback (most recent call last):
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/train.py", line 30, in <module>
    trainer.init_data_backend()
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/training/trainer.py", line 631, in init_data_backend
    raise e
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/training/trainer.py", line 605, in init_data_backend
    configure_multi_databackend(
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/data_backend/factory.py", line 823, in configure_multi_databackend
    raise Exception(
Exception: No images were discovered by the bucket manager in the dataset: all_dataset_768.

The dataset includes a single image, and the config uses two resolution settings: 512px and 768px. For 768px, it seems to remove the file and then assume there is nothing to train.

I don't use "--delete_problematic_images" or "--delete_unwanted_images". I cleared the cache and all JSON files that the script generated in the dataset and output folders, then tried again, and it crashed again.
When I set "crop": false, the image was discovered and training worked, so it seems that the crop option is deleting the image and causing the issue.

clearly the system is haunted by poltergeist

The image size: 852 × 480
Crop settings that caused the image to be deleted:

        "crop": true,
        "crop_style": "random",
        "crop_aspect": "random",
        "crop_aspect_buckets": [0.125, 0.250, 0.375, 0.500, 0.625, 0.750, 0.875, 1.0, 1.125, 1.250, 1.375, 1.500, 1.625, 1.750, 1.875, 2],
        "resolution": 768,
        "resolution_type": "pixel_area",
        "minimum_image_size": 115,
        "maximum_image_size": 1024,
        "target_downsample_size": 768,
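
For reference, the pixel values in these settings match the megapixel figures printed in the log above (`'resolution': 0.589824`, `'maximum_image_size': 1.048576`, and a minimum image size of `0.013225`) if each value is treated as the edge of a square. A small sketch of that arithmetic follows; the helper name is hypothetical and the formula is inferred from the numbers in the log, not taken from SimpleTuner's source:

```python
def pixel_edge_to_megapixels(edge_px: int) -> float:
    """Area in megapixels of a square with the given edge length.

    Hypothetical helper: the formula is inferred from the values in the
    log, not copied from SimpleTuner.
    """
    return (edge_px * edge_px) / 1_000_000


# Values from the dataset settings in this issue.
settings = {
    "resolution": 768,
    "minimum_image_size": 115,
    "maximum_image_size": 1024,
    "target_downsample_size": 768,
}

for name, edge in settings.items():
    print(f"{name}: {edge}px -> {pixel_edge_to_megapixels(edge)} MP")
```

Under that assumption, 768 maps to 0.589824, 1024 to 1.048576, and 115 to 0.013225, which are the figures the log reports.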

I removed the 768px resolution from the dataset settings and tried with only 512px; it crashed with a similar error for 512px:

[ERROR] No images were discovered by the bucket manager in the dataset: all_dataset_512., traceback: Traceback

So the crop option removes the image even at the smaller resolution.

@bghira
The problem was caused by having an integer in crop_aspect_buckets without a decimal point.
For example, you can reproduce the problem with: "crop_aspect_buckets": [1],
But if you change it to 1.0 - it will work: "crop_aspect_buckets": [1.0],
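
A minimal sketch of why `1` and `1.0` can behave differently, assuming the validator does a strict type check. Python's `json` module decodes `1` as `int` and `1.0` as `float`, so a check that accepts only floats or dictionaries rejects the integer. Both functions below are hypothetical illustrations, not SimpleTuner's actual code:

```python
import json


def strict_bucket_check(buckets: list) -> bool:
    """Hypothetical strict check: every entry must be a float or a dict."""
    return all(isinstance(b, (float, dict)) for b in buckets)


def normalize_aspect_buckets(buckets: list) -> list:
    """Hypothetical workaround: coerce plain integers to floats.

    bool is a subclass of int in Python, so it is rejected explicitly.
    """
    normalized = []
    for b in buckets:
        if isinstance(b, bool):
            raise ValueError(f"Invalid aspect bucket: {b!r}")
        normalized.append(float(b) if isinstance(b, int) else b)
    return normalized


# JSON keeps the int/float distinction when decoded in Python.
from_int = json.loads('{"crop_aspect_buckets": [1]}')["crop_aspect_buckets"]
from_float = json.loads('{"crop_aspect_buckets": [1.0]}')["crop_aspect_buckets"]

print(strict_bucket_check(from_int))    # False: 1 decodes as int
print(strict_bucket_check(from_float))  # True: 1.0 decodes as float
print(strict_bucket_check(normalize_aspect_buckets(from_int)))  # True
```

In the config from this issue, the trailing `2` in `crop_aspect_buckets` is the only value written without a decimal point, so writing it as `2.0` is the corresponding change.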

thanks for figuring that part out. i looked into the file deletions and really every call to data_backend.delete(...) is wrapped by a check for delete_problematic_images etc so those might be lurking somewhere?

Thank you. I meant to say that the image was deleted from the list of recognized/used images, not from the file system itself. So there is no problem in this regard.

oh, that is a relief