hollowstrawberry/kohya-colab

HELP PLS!

Closed this issue · 8 comments

Hi, I'm trying to train a LoRA with the normal Lora_Trainer (not the XL one), and I can't start the training process. When it loads the dataset, I get the following error and I can't fix it. I've had this error for several days.

loading image sizes.
100% 594/594 [00:02<00:00, 256.68it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (384, 640), count: 585
bucket 1: resolution (448, 576), count: 9
mean ar error (without repeats): 0.0668744443912073
preparing accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint: /content/downloaded_model.safetensors
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /content/kohya-trainer/train_network.py:873 in │
│ │
│ 870 │ args = parser.parse_args() │
│ 871 │ args = train_util.read_config_from_file(args, parser) │
│ 872 │ │
│ ❱ 873 │ train(args) │
│ 874 │
│ │
│ /content/kohya-trainer/train_network.py:168 in train │
│ │
│ 165 │ weight_dtype, save_dtype = train_util.prepare_dtype(args) │
│ 166 │ │
│ 167 │ # モデルを読み込む │
│ ❱ 168 │ text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accele │
│ 169 │ │
│ 170 │ # モデルに xformers とか memory efficient attention を組み込む │
│ 171 │ train_util.replace_unet_modules(unet, args.mem_eff_attn, args.xformers) │
│ │
│ /content/kohya-trainer/library/train_util.py:3150 in load_target_model │
│ │
│ 3147 │ │ if pi == accelerator.state.local_process_index: │
│ 3148 │ │ │ print(f"loading model for process {accelerator.state.local_process_index}/{a │
│ 3149 │ │ │ │
│ ❱ 3150 │ │ │ text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model( │
│ 3151 │ │ │ │ args, weight_dtype, accelerator.device if args.lowram else "cpu" │
│ 3152 │ │ │ ) │
│ 3153 │
│ │
│ /content/kohya-trainer/library/train_util.py:3116 in _load_target_model │
│ │
│ 3113 │ load_stable_diffusion_format = os.path.isfile(name_or_path) # determine SD or Diffu │
│ 3114 │ if load_stable_diffusion_format: │
│ 3115 │ │ print(f"load StableDiffusion checkpoint: {name_or_path}") │
│ ❱ 3116 │ │ text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoin │
│ 3117 │ else: │
│ 3118 │ │ # Diffusers model is loaded to CPU │
│ 3119 │ │ print(f"load Diffusers pretrained models: {name_or_path}") │
│ │
│ /content/kohya-trainer/library/model_util.py:863 in load_models_from_stable_diffusion_checkpoint │
│ │
│ 860 │ converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config) │
│ 861 │ │
│ 862 │ unet = UNet2DConditionModel(**unet_config).to(device) │
│ ❱ 863 │ info = unet.load_state_dict(converted_unet_checkpoint) │
│ 864 │ print("loading u-net:", info) │
│ 865 │ │
│ 866 │ # Convert the VAE model. │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:2152 in load_state_dict │
│ │
│ 2149 │ │ │ │ │ │ ', '.join(f'"{k}"' for k in missing_keys))) │
│ 2150 │ │ │
│ 2151 │ │ if len(error_msgs) > 0: │
│ ❱ 2152 │ │ │ raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( │
│ 2153 │ │ │ │ │ │ │ self.__class__.__name__, "\n\t".join(error_msgs))) │
│ 2154 │ │ return _IncompatibleKeys(missing_keys, unexpected_keys) │
│ 2155 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying
a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying
a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.proj_out.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.1.proj_in.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying
a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying
a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.1.proj_out.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.1.attentions.0.proj_in.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying
a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying
a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.0.proj_out.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.1.attentions.1.proj_in.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying
a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying
a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.1.proj_out.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.2.attentions.0.proj_in.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying
a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying
a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.0.proj_out.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for down_blocks.2.attentions.1.proj_in.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying
a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying
a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.1.proj_out.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for up_blocks.1.attentions.0.proj_in.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.0.proj_out.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for up_blocks.1.attentions.1.proj_in.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.1.proj_out.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for up_blocks.1.attentions.2.proj_in.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.2.proj_out.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for up_blocks.2.attentions.0.proj_in.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.0.proj_out.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.1.proj_in.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.1.proj_out.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.2.proj_in.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is
torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.2.proj_out.weight: copying a param with shape
torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.3.attentions.0.proj_in.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.0.proj_out.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.1.proj_in.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.1.proj_out.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.2.proj_in.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is
torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.2.proj_out.weight: copying a param with shape
torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for mid_block.attentions.0.proj_in.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is
torch.Size([1280, 768]).
size mismatch for mid_block.attentions.0.proj_out.weight: copying a param with shape
torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1,
1]).
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/local/bin/accelerate:8 in │
│ │
│ 5 from accelerate.commands.accelerate_cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command │
│ │
│ 1101 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 1102 │ │ sagemaker_launcher(defaults, args) │
│ 1103 │ else: │
│ ❱ 1104 │ │ simple_launcher(args) │
│ 1105 │
│ 1106 │
│ 1107 def main(): │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher │
│ │
│ 564 │ process = subprocess.Popen(cmd, env=current_env) │
│ 565 │ process.wait() │
│ 566 │ if process.returncode != 0: │
│ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ 568 │
│ 569 │
│ 570 def multi_gpu_launcher(args): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'train_network.py',
'--dataset_config=/content/drive/MyDrive/Loras/MoviePoster/dataset_config.toml',
'--config_file=/content/drive/MyDrive/Loras/MoviePoster/training_config.toml']' returned non-zero
exit status 1.

Are you using a custom model URL? From what you posted, it looks like it's failing when it tries to load the model, not the images.

Yes, I'm using a model URL from Hugging Face.

Well, whatever is happening here has to do with that model or the URL you're trying to feed it, most likely the URL. Are you giving it the direct download URL or the page URL?

Better yet, post the URL you're using here.

Here you go:

https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-nonema-pruned.safetensors?download=true
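For anyone unsure which kind of link they have, here is a minimal sketch that tells the two apart. It assumes Hugging Face file page URLs carry a `blob` path segment where direct-download URLs carry `resolve`; the helper names `is_direct_hf_url` and `to_direct_hf_url` are made up for illustration and are not part of the Colab notebook.

```python
from urllib.parse import urlsplit, urlunsplit


def is_direct_hf_url(url: str) -> bool:
    """Return True if the URL looks like a direct-download ("resolve") link.

    Assumption: the path is /<owner>/<repo>/<blob|resolve>/<revision>/<file>.
    """
    segments = urlsplit(url).path.split("/")
    return len(segments) > 3 and segments[3] == "resolve"


def to_direct_hf_url(url: str) -> str:
    """Rewrite a file page ("blob") URL into its "resolve" form.

    URLs that are already direct links are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if len(segments) > 3 and segments[3] == "blob":
        segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


url = (
    "https://huggingface.co/stabilityai/stable-diffusion-2-1-base/"
    "resolve/main/v2-1_512-nonema-pruned.safetensors?download=true"
)
print(is_direct_hf_url(url))
```

The URL posted above already has the `resolve` segment, so by this check it would be a direct link and not the cause of the error.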

Hmm. It's the direct download link. Is this an SD 2.x model? If so, are you checking the box that says the model is a 2.x model?

image

No. You just have to check a box under the model input. See my image above.
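The size mismatches in the traceback above fit this diagnosis: the checkpoint's cross-attention weights have a last dimension of 1024 (the SD 2.x text encoder width), while the model being built expects 768 (SD 1.x). A rough sketch of that check follows. The function `guess_sd_version` is hypothetical, the key name is copied from the error log (it is the converted key, not necessarily the one stored in the file), and the shapes are passed in as plain tuples so the snippet runs without torch.

```python
# Key name copied from the error log above (converted U-Net key).
CROSS_ATTN_KEY = (
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight"
)


def guess_sd_version(shapes: dict) -> str:
    """Guess the Stable Diffusion family from a cross-attention weight shape.

    `shapes` maps parameter names to shape tuples. The last dimension of
    the attn2.to_k weight is the text-encoder embedding width:
    768 for SD 1.x, 1024 for SD 2.x.
    """
    shape = shapes.get(CROSS_ATTN_KEY)
    if shape is None:
        return "unknown"
    context_dim = shape[-1]
    if context_dim == 768:
        return "1.x"
    if context_dim == 1024:
        return "2.x"
    return "unknown"


# Shape reported for the checkpoint in the log above.
print(guess_sd_version({CROSS_ATTN_KEY: (320, 1024)}))
```

With the shape from the log this reports a 2.x checkpoint, which is why the trainer needs the 2.x checkbox ticked before it builds the U-Net.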

My bad, I didn't see that checkbox, sorry.
It's working, the 1st epoch is loading!
Thank you!!!!!!!!

No problem. Glad it's working for you now.