LarryJane491/Lora-Training-in-Comfy

SomeError

HevenHeHe opened this issue · 2 comments

The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
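This warning is informational: accelerate fell back to single-process defaults. A minimal sketch of silencing it, assuming the node's launch call can be edited to pass the four flagged parameters explicitly (the module invocation mirrors the simple_launcher traceback further down; script path is a placeholder):

```python
# Sketch: invoke accelerate's launcher with the four defaulted
# parameters spelled out instead of letting them fall back.
import subprocess
import sys

cmd = [
    sys.executable, "-m", "accelerate.commands.launch",
    "--num_processes=1",
    "--num_machines=1",
    "--mixed_precision=no",
    "--dynamo_backend=no",
    "sd-scripts/train_network.py",  # placeholder script path
]
subprocess.run(cmd, check=True)
```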

Traceback (most recent call last):
  File "D:\AI\ϰComfyUI_windows_2024p2\ComfyUI_windows_2024\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 26, in <module>
    from diffusers import DDPMScheduler
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\__init__.py", line 3, in <module>
    from .configuration_utils import ConfigMixin
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\configuration_utils.py", line 34, in <module>
    from .utils import DIFFUSERS_CACHE, HUGGINGFACE_CO_RESOLVE_ENDPOINT, DummyObject, deprecate, logging
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\utils\__init__.py", line 22, in <module>
    from .import_utils import (
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\utils\import_utils.py", line 207, in <module>
    if torch.__version__ < version.Version("1.12"):
TypeError: '<' not supported between instances of 'TorchVersion' and 'Version'

Traceback (most recent call last):
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\TU\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\TU\AppData\Local\Programs\Python\Python310\python.exe', 'D:/AI/ϰComfyUI_windows_2024p2/ComfyUI_windows_2024/ComfyUI/custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:\AI\ϰComfyUI_windows_2024p2\ComfyUI_windows_2024\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.safetensors', '--train_data_dir=D/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=DZZ', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=DZZ', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=7', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.

Train finished

Prompt executed in 9.12 seconds
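Note that "Train finished" and "Prompt executed" come from ComfyUI and do not mean training succeeded: the CalledProcessError is just accelerate re-reporting that train_network.py exited with an error, and the root cause is the TypeError above, where an older diffusers release compares torch.__version__ (a TorchVersion) against a packaging Version object directly. Upgrading diffusers applies the fix upstream; a minimal sketch of the safe comparison, assuming the packaging package is installed:

```python
# Sketch of the version check that import_utils.py line 207 trips over.
# Old diffusers compared torch.__version__ (a TorchVersion) to a
# packaging Version directly; parsing both sides first avoids the
# TypeError.
from packaging import version
import torch

# Failing form (what the traceback shows):
#   torch.__version__ < version.Version("1.12")
# Safe form:
if version.parse(str(torch.__version__)) < version.parse("1.12"):
    print("torch is older than 1.12")
else:
    print("torch is 1.12 or newer")
```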

I think I'm having the same problem as you:

UNet2DConditionModel: 64, 8, 768, False, False
Traceback (most recent call last):
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\train_network.py", line 228, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\train_network.py", line 102, in load_target_model
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\library\train_util.py", line 3917, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\library\train_util.py", line 3860, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\library\model_util.py", line 1007, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
        Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight", "down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight", "down_blocks.0.attentions.0.proj_in.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.0.proj_out.weight", "down_blocks.0.attentions.0.proj_out.bias", "down_blocks.0.attentions.1.norm.weight", "down_blocks.0.attentions.1.norm.bias", "down_blocks.0.attentions.1.proj_in.weight", "down_blocks.0.attentions.1.proj_in.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.1.proj_out.weight", "down_blocks.0.attentions.1.proj_out.bias", "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias", "down_blocks.3.resnets.0.norm1.weight", "down_blocks.3.resnets.0.norm1.bias", "down_blocks.3.resnets.0.conv1.weight", "down_blocks.3.resnets.0.conv1.bias", 
"down_blocks.3.resnets.0.time_emb_proj.weight", "down_blocks.3.resnets.0.time_emb_proj.bias", "down_blocks.3.resnets.0.norm2.weight", "down_blocks.3.resnets.0.norm2.bias", "down_blocks.3.resnets.0.conv2.weight", "down_blocks.3.resnets.0.conv2.bias", "down_blocks.3.resnets.1.norm1.weight", "down_blocks.3.resnets.1.norm1.bias", "down_blocks.3.resnets.1.conv1.weight", "down_blocks.3.resnets.1.conv1.bias", "down_blocks.3.resnets.1.time_emb_proj.weight", "down_blocks.3.resnets.1.time_emb_proj.bias", "down_blocks.3.resnets.1.norm2.weight", "down_blocks.3.resnets.1.norm2.bias", "down_blocks.3.resnets.1.conv2.weight", "down_blocks.3.resnets.1.conv2.bias", "up_blocks.2.attentions.0.norm.weight", "up_blocks.2.attentions.0.norm.bias", "up_blocks.2.attentions.0.proj_in.weight", "up_blocks.2.attentions.0.proj_in.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.0.proj_out.weight", "up_blocks.2.attentions.0.proj_out.bias", "up_blocks.2.attentions.1.norm.weight", "up_blocks.2.attentions.1.norm.bias", "up_blocks.2.attentions.1.proj_in.weight", "up_blocks.2.attentions.1.proj_in.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight", 
"up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.1.proj_out.weight", "up_blocks.2.attentions.1.proj_out.bias", "up_blocks.2.attentions.2.norm.weight", "up_blocks.2.attentions.2.norm.bias", "up_blocks.2.attentions.2.proj_in.weight", "up_blocks.2.attentions.2.proj_in.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.2.proj_out.weight", "up_blocks.2.attentions.2.proj_out.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias", "up_blocks.3.attentions.0.norm.weight", "up_blocks.3.attentions.0.norm.bias", "up_blocks.3.attentions.0.proj_in.weight", "up_blocks.3.attentions.0.proj_in.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.0.proj_out.weight", "up_blocks.3.attentions.0.proj_out.bias", "up_blocks.3.attentions.1.norm.weight", "up_blocks.3.attentions.1.norm.bias", 
"up_blocks.3.attentions.1.proj_in.weight", "up_blocks.3.attentions.1.proj_in.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.1.proj_out.weight", "up_blocks.3.attentions.1.proj_out.bias", "up_blocks.3.attentions.2.norm.weight", "up_blocks.3.attentions.2.norm.bias", "up_blocks.3.attentions.2.proj_in.weight", "up_blocks.3.attentions.2.proj_in.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.2.proj_out.weight", "up_blocks.3.attentions.2.proj_out.bias", "up_blocks.3.resnets.0.norm1.weight", "up_blocks.3.resnets.0.norm1.bias", "up_blocks.3.resnets.0.conv1.weight", "up_blocks.3.resnets.0.conv1.bias", "up_blocks.3.resnets.0.time_emb_proj.weight", "up_blocks.3.resnets.0.time_emb_proj.bias", "up_blocks.3.resnets.0.norm2.weight", "up_blocks.3.resnets.0.norm2.bias", "up_blocks.3.resnets.0.conv2.weight", "up_blocks.3.resnets.0.conv2.bias", "up_blocks.3.resnets.0.conv_shortcut.weight", "up_blocks.3.resnets.0.conv_shortcut.bias", 
"up_blocks.3.resnets.1.norm1.weight", "up_blocks.3.resnets.1.norm1.bias", "up_blocks.3.resnets.1.conv1.weight", "up_blocks.3.resnets.1.conv1.bias", "up_blocks.3.resnets.1.time_emb_proj.weight", "up_blocks.3.resnets.1.time_emb_proj.bias", "up_blocks.3.resnets.1.norm2.weight", "up_blocks.3.resnets.1.norm2.bias", "up_blocks.3.resnets.1.conv2.weight", "up_blocks.3.resnets.1.conv2.bias", "up_blocks.3.resnets.1.conv_shortcut.weight", "up_blocks.3.resnets.1.conv_shortcut.bias", "up_blocks.3.resnets.2.norm1.weight", "up_blocks.3.resnets.2.norm1.bias", "up_blocks.3.resnets.2.conv1.weight", "up_blocks.3.resnets.2.conv1.bias", "up_blocks.3.resnets.2.time_emb_proj.weight", "up_blocks.3.resnets.2.time_emb_proj.bias", "up_blocks.3.resnets.2.norm2.weight", "up_blocks.3.resnets.2.norm2.bias", "up_blocks.3.resnets.2.conv2.weight", "up_blocks.3.resnets.2.conv2.bias", "up_blocks.3.resnets.2.conv_shortcut.weight", "up_blocks.3.resnets.2.conv_shortcut.bias".
        Unexpected key(s) in state_dict: "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", 
"down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", 
"down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm1.bias", 
"down_blocks.2.attentions.0.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", 
"down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm3.bias", 
"down_blocks.2.attentions.1.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", 
"down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", 
"down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.0.norm.bias", "up_blocks.0.attentions.0.norm.weight", "up_blocks.0.attentions.0.proj_in.bias", "up_blocks.0.attentions.0.proj_in.weight", "up_blocks.0.attentions.0.proj_out.bias", "up_blocks.0.attentions.0.proj_out.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", 
"up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_v.weight", 
"up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm3.bias", 
"up_blocks.0.attentions.0.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", 
"up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.1.norm.bias", "up_blocks.0.attentions.1.norm.weight", "up_blocks.0.attentions.1.proj_in.bias", "up_blocks.0.attentions.1.proj_in.weight", "up_blocks.0.attentions.1.proj_out.bias", "up_blocks.0.attentions.1.proj_out.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", 
"up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm1.bias", 
"up_blocks.0.attentions.1.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_q.weight", 
"up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", 
"up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.2.norm.bias", "up_blocks.0.attentions.2.norm.weight", "up_blocks.0.attentions.2.proj_in.bias", "up_blocks.0.attentions.2.proj_in.weight", "up_blocks.0.attentions.2.proj_out.bias", "up_blocks.0.attentions.2.proj_out.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.bias", 
"up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.bias", 
"up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_k.weight", 
"up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.weight", 
"up_blocks.0.attentions.2.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", 
"up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.1.norm1.bias", "mid_block.attentions.0.transformer_blocks.1.norm1.weight", "mid_block.attentions.0.transformer_blocks.1.norm2.bias", "mid_block.attentions.0.transformer_blocks.1.norm2.weight", "mid_block.attentions.0.transformer_blocks.1.norm3.bias", "mid_block.attentions.0.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.2.norm1.bias", "mid_block.attentions.0.transformer_blocks.2.norm1.weight", "mid_block.attentions.0.transformer_blocks.2.norm2.bias", "mid_block.attentions.0.transformer_blocks.2.norm2.weight", "mid_block.attentions.0.transformer_blocks.2.norm3.bias", "mid_block.attentions.0.transformer_blocks.2.norm3.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", 
"mid_block.attentions.0.transformer_blocks.3.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.3.norm1.bias", "mid_block.attentions.0.transformer_blocks.3.norm1.weight", "mid_block.attentions.0.transformer_blocks.3.norm2.bias", "mid_block.attentions.0.transformer_blocks.3.norm2.weight", "mid_block.attentions.0.transformer_blocks.3.norm3.bias", "mid_block.attentions.0.transformer_blocks.3.norm3.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.4.norm1.bias", "mid_block.attentions.0.transformer_blocks.4.norm1.weight", "mid_block.attentions.0.transformer_blocks.4.norm2.bias", "mid_block.attentions.0.transformer_blocks.4.norm2.weight", "mid_block.attentions.0.transformer_blocks.4.norm3.bias", "mid_block.attentions.0.transformer_blocks.4.norm3.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.5.norm1.bias", "mid_block.attentions.0.transformer_blocks.5.norm1.weight", "mid_block.attentions.0.transformer_blocks.5.norm2.bias", "mid_block.attentions.0.transformer_blocks.5.norm2.weight", 
"mid_block.attentions.0.transformer_blocks.5.norm3.bias", "mid_block.attentions.0.transformer_blocks.5.norm3.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.6.norm1.bias", "mid_block.attentions.0.transformer_blocks.6.norm1.weight", "mid_block.attentions.0.transformer_blocks.6.norm2.bias", "mid_block.attentions.0.transformer_blocks.6.norm2.weight", "mid_block.attentions.0.transformer_blocks.6.norm3.bias", "mid_block.attentions.0.transformer_blocks.6.norm3.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.7.norm1.bias", "mid_block.attentions.0.transformer_blocks.7.norm1.weight", "mid_block.attentions.0.transformer_blocks.7.norm2.bias", "mid_block.attentions.0.transformer_blocks.7.norm2.weight", "mid_block.attentions.0.transformer_blocks.7.norm3.bias", "mid_block.attentions.0.transformer_blocks.7.norm3.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.2.bias", 
"mid_block.attentions.0.transformer_blocks.8.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.8.norm1.bias", "mid_block.attentions.0.transformer_blocks.8.norm1.weight", "mid_block.attentions.0.transformer_blocks.8.norm2.bias", "mid_block.attentions.0.transformer_blocks.8.norm2.weight", "mid_block.attentions.0.transformer_blocks.8.norm3.bias", "mid_block.attentions.0.transformer_blocks.8.norm3.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.9.norm1.bias", "mid_block.attentions.0.transformer_blocks.9.norm1.weight", "mid_block.attentions.0.transformer_blocks.9.norm2.bias", "mid_block.attentions.0.transformer_blocks.9.norm2.weight", "mid_block.attentions.0.transformer_blocks.9.norm3.bias", "mid_block.attentions.0.transformer_blocks.9.norm3.weight".
        size mismatch for down_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.2.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for down_blocks.2.attentions.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.0.resnets.2.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.0.resnets.2.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.0.resnets.2.conv1.weight: copying a param with shape torch.Size([1280, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).
        size mismatch for up_blocks.0.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([1280, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).
        size mismatch for up_blocks.1.attentions.0.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.0.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.0.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for every up_blocks.1.attentions.1 parameter (norm, proj_in, proj_out, and all transformer_blocks.0 attn1/attn2, ff.net and norm1-3 weights and biases): identical pattern to the up_blocks.1.attentions.0 lines above, with checkpoint shapes built on 640 channels (attn2 context width 2048) where the current model expects 1280 (context width 768).
        size mismatch for up_blocks.1.attentions.2.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.2.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.2.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.0.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.0.conv1.weight: copying a param with shape torch.Size([640, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).
        size mismatch for up_blocks.1.resnets.0.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.resnets.0.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([640, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).
        size mismatch for up_blocks.1.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.norm1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.1.norm1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.1.conv1.weight: copying a param with shape torch.Size([640, 1280, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).
        size mismatch for up_blocks.1.resnets.1.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.resnets.1.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([640, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).
        size mismatch for up_blocks.1.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.1.resnets.2.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.1.resnets.2.conv1.weight: copying a param with shape torch.Size([640, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 3, 3]).
        size mismatch for up_blocks.1.resnets.2.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.resnets.2.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([640, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 1, 1]).
        size mismatch for up_blocks.1.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.upsamplers.0.conv.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.upsamplers.0.conv.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.2.resnets.0.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.2.resnets.0.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.2.resnets.0.conv1.weight: copying a param with shape torch.Size([320, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1920, 3, 3]).
        size mismatch for up_blocks.2.resnets.0.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).
        size mismatch for up_blocks.2.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).
        size mismatch for up_blocks.2.resnets.0.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([320, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1920, 1, 1]).
        size mismatch for up_blocks.2.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.2.resnets.1.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.2.resnets.1.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1280, 3, 3]).
        size mismatch for up_blocks.2.resnets.1.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).
        size mismatch for up_blocks.2.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).
        size mismatch for up_blocks.2.resnets.1.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1280, 1, 1]).
        size mismatch for up_blocks.2.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]).
        size mismatch for up_blocks.2.resnets.2.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]).
        size mismatch for up_blocks.2.resnets.2.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 960, 3, 3]).
        size mismatch for up_blocks.2.resnets.2.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).
        size mismatch for up_blocks.2.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).
        size mismatch for up_blocks.2.resnets.2.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 960, 1, 1]).
        size mismatch for up_blocks.2.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for mid_block.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for mid_block.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
Traceback (most recent call last):
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\Gustav\\AppData\\Local\\Programs\\Python\\Python310\\python.exe', 'J:/Stable/ComfyUI_windows_portable/ComfyUI/custom_nodes/Lora-Training-in-Comfy-main/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=J:\\Stable\\ComfyUI_windows_portable\\ComfyUI\\models\\checkpoints\\SDXL\\realismEngineSDXL_v30VAE.safetensors', '--train_data_dir=C:/Users/Gustav/Pictures/results/BO/rafael/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=rafy_face', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=rafy_face', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=6', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 14.07 seconds
```
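For what it's worth, the `size mismatch` lines are the giveaway: in an SD 1.x UNet, `attn2.to_k.weight` expects a 768-wide text-encoder context (that's the `torch.Size([1280, 768])` side of each mismatch), while the checkpoint carries 2048-wide weights, which is SDXL's context width. In other words, `realismEngineSDXL_v30VAE.safetensors` is an SDXL model being loaded into the SD 1.5 UNet that `train_network.py` builds. A quick way to check what a checkpoint actually is before launching a run, as a minimal sketch (it only assumes the `safetensors` package, which sd-scripts already depends on):

```python
# Minimal sketch: infer a Stable Diffusion checkpoint's base architecture
# from the width of a cross-attention key -- the same dimension the
# "size mismatch" errors above are complaining about.
from safetensors import safe_open

# Substitute your own checkpoint path here (this one is from the log above).
CKPT = r"J:\Stable\ComfyUI_windows_portable\ComfyUI\models\checkpoints\SDXL\realismEngineSDXL_v30VAE.safetensors"

# Text-encoder context width -> base architecture.
ARCH = {768: "SD 1.x", 1024: "SD 2.x", 2048: "SDXL"}

with safe_open(CKPT, framework="pt", device="cpu") as f:
    for key in f.keys():
        if key.endswith("attn2.to_k.weight"):
            dim = f.get_tensor(key).shape[-1]
            print(f"{key}: context dim {dim} -> {ARCH.get(dim, 'unknown')}")
            break
```

If that prints 2048, the node's default script cannot train on the checkpoint: either point `--pretrained_model_name_or_path` at an SD 1.5 model, or use an SDXL-aware trainer (recent sd-scripts releases ship `sdxl_train_network.py`; whether the copy bundled with this custom node includes it is another question).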

I think I'm having the same problem as you:

UNet2DConditionModel: 64, 8, 768, False, False
Traceback (most recent call last):
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\train_network.py", line 228, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\train_network.py", line 102, in load_target_model
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\library\train_util.py", line 3917, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\library\train_util.py", line 3860, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
  File "J:\Stable\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main\sd-scripts\library\model_util.py", line 1007, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
        Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight", "down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight", "down_blocks.0.attentions.0.proj_in.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.0.proj_out.weight", "down_blocks.0.attentions.0.proj_out.bias", "down_blocks.0.attentions.1.norm.weight", "down_blocks.0.attentions.1.norm.bias", "down_blocks.0.attentions.1.proj_in.weight", "down_blocks.0.attentions.1.proj_in.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.1.proj_out.weight", "down_blocks.0.attentions.1.proj_out.bias", "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias", "down_blocks.3.resnets.0.norm1.weight", "down_blocks.3.resnets.0.norm1.bias", "down_blocks.3.resnets.0.conv1.weight", "down_blocks.3.resnets.0.conv1.bias", 
"down_blocks.3.resnets.0.time_emb_proj.weight", "down_blocks.3.resnets.0.time_emb_proj.bias", "down_blocks.3.resnets.0.norm2.weight", "down_blocks.3.resnets.0.norm2.bias", "down_blocks.3.resnets.0.conv2.weight", "down_blocks.3.resnets.0.conv2.bias", "down_blocks.3.resnets.1.norm1.weight", "down_blocks.3.resnets.1.norm1.bias", "down_blocks.3.resnets.1.conv1.weight", "down_blocks.3.resnets.1.conv1.bias", "down_blocks.3.resnets.1.time_emb_proj.weight", "down_blocks.3.resnets.1.time_emb_proj.bias", "down_blocks.3.resnets.1.norm2.weight", "down_blocks.3.resnets.1.norm2.bias", "down_blocks.3.resnets.1.conv2.weight", "down_blocks.3.resnets.1.conv2.bias", "up_blocks.2.attentions.0.norm.weight", "up_blocks.2.attentions.0.norm.bias", "up_blocks.2.attentions.0.proj_in.weight", "up_blocks.2.attentions.0.proj_in.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.0.proj_out.weight", "up_blocks.2.attentions.0.proj_out.bias", "up_blocks.2.attentions.1.norm.weight", "up_blocks.2.attentions.1.norm.bias", "up_blocks.2.attentions.1.proj_in.weight", "up_blocks.2.attentions.1.proj_in.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight", 
"up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.1.proj_out.weight", "up_blocks.2.attentions.1.proj_out.bias", "up_blocks.2.attentions.2.norm.weight", "up_blocks.2.attentions.2.norm.bias", "up_blocks.2.attentions.2.proj_in.weight", "up_blocks.2.attentions.2.proj_in.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.2.proj_out.weight", "up_blocks.2.attentions.2.proj_out.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias", "up_blocks.3.attentions.0.norm.weight", "up_blocks.3.attentions.0.norm.bias", "up_blocks.3.attentions.0.proj_in.weight", "up_blocks.3.attentions.0.proj_in.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.0.proj_out.weight", "up_blocks.3.attentions.0.proj_out.bias", "up_blocks.3.attentions.1.norm.weight", "up_blocks.3.attentions.1.norm.bias", 
"up_blocks.3.attentions.1.proj_in.weight", "up_blocks.3.attentions.1.proj_in.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.1.proj_out.weight", "up_blocks.3.attentions.1.proj_out.bias", "up_blocks.3.attentions.2.norm.weight", "up_blocks.3.attentions.2.norm.bias", "up_blocks.3.attentions.2.proj_in.weight", "up_blocks.3.attentions.2.proj_in.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.2.proj_out.weight", "up_blocks.3.attentions.2.proj_out.bias", "up_blocks.3.resnets.0.norm1.weight", "up_blocks.3.resnets.0.norm1.bias", "up_blocks.3.resnets.0.conv1.weight", "up_blocks.3.resnets.0.conv1.bias", "up_blocks.3.resnets.0.time_emb_proj.weight", "up_blocks.3.resnets.0.time_emb_proj.bias", "up_blocks.3.resnets.0.norm2.weight", "up_blocks.3.resnets.0.norm2.bias", "up_blocks.3.resnets.0.conv2.weight", "up_blocks.3.resnets.0.conv2.bias", "up_blocks.3.resnets.0.conv_shortcut.weight", "up_blocks.3.resnets.0.conv_shortcut.bias", 
"up_blocks.3.resnets.1.norm1.weight", "up_blocks.3.resnets.1.norm1.bias", "up_blocks.3.resnets.1.conv1.weight", "up_blocks.3.resnets.1.conv1.bias", "up_blocks.3.resnets.1.time_emb_proj.weight", "up_blocks.3.resnets.1.time_emb_proj.bias", "up_blocks.3.resnets.1.norm2.weight", "up_blocks.3.resnets.1.norm2.bias", "up_blocks.3.resnets.1.conv2.weight", "up_blocks.3.resnets.1.conv2.bias", "up_blocks.3.resnets.1.conv_shortcut.weight", "up_blocks.3.resnets.1.conv_shortcut.bias", "up_blocks.3.resnets.2.norm1.weight", "up_blocks.3.resnets.2.norm1.bias", "up_blocks.3.resnets.2.conv1.weight", "up_blocks.3.resnets.2.conv1.bias", "up_blocks.3.resnets.2.time_emb_proj.weight", "up_blocks.3.resnets.2.time_emb_proj.bias", "up_blocks.3.resnets.2.norm2.weight", "up_blocks.3.resnets.2.norm2.bias", "up_blocks.3.resnets.2.conv2.weight", "up_blocks.3.resnets.2.conv2.bias", "up_blocks.3.resnets.2.conv_shortcut.weight", "up_blocks.3.resnets.2.conv_shortcut.bias".
        Unexpected key(s) in state_dict: "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", 
"down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", 
"down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm1.bias", 
"down_blocks.2.attentions.0.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", 
"down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm3.bias", 
"down_blocks.2.attentions.1.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", 
"down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", 
"down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.0.norm.bias", "up_blocks.0.attentions.0.norm.weight", "up_blocks.0.attentions.0.proj_in.bias", "up_blocks.0.attentions.0.proj_in.weight", "up_blocks.0.attentions.0.proj_out.bias", "up_blocks.0.attentions.0.proj_out.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", 
"up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_v.weight", 
"up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm3.bias", 
"up_blocks.0.attentions.0.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", 
"up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.1.norm.bias", "up_blocks.0.attentions.1.norm.weight", "up_blocks.0.attentions.1.proj_in.bias", "up_blocks.0.attentions.1.proj_in.weight", "up_blocks.0.attentions.1.proj_out.bias", "up_blocks.0.attentions.1.proj_out.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", 
"up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm1.bias", 
"up_blocks.0.attentions.1.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_q.weight", 
"up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", 
"up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.2.norm.bias", "up_blocks.0.attentions.2.norm.weight", "up_blocks.0.attentions.2.proj_in.bias", "up_blocks.0.attentions.2.proj_in.weight", "up_blocks.0.attentions.2.proj_out.bias", "up_blocks.0.attentions.2.proj_out.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.bias", 
"up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.bias", 
"up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_k.weight", 
"up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.weight", 
"up_blocks.0.attentions.2.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", 
"up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.1.norm1.bias", "mid_block.attentions.0.transformer_blocks.1.norm1.weight", "mid_block.attentions.0.transformer_blocks.1.norm2.bias", "mid_block.attentions.0.transformer_blocks.1.norm2.weight", "mid_block.attentions.0.transformer_blocks.1.norm3.bias", "mid_block.attentions.0.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.2.norm1.bias", "mid_block.attentions.0.transformer_blocks.2.norm1.weight", "mid_block.attentions.0.transformer_blocks.2.norm2.bias", "mid_block.attentions.0.transformer_blocks.2.norm2.weight", "mid_block.attentions.0.transformer_blocks.2.norm3.bias", "mid_block.attentions.0.transformer_blocks.2.norm3.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", 
"mid_block.attentions.0.transformer_blocks.3.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.3.norm1.bias", "mid_block.attentions.0.transformer_blocks.3.norm1.weight", "mid_block.attentions.0.transformer_blocks.3.norm2.bias", "mid_block.attentions.0.transformer_blocks.3.norm2.weight", "mid_block.attentions.0.transformer_blocks.3.norm3.bias", "mid_block.attentions.0.transformer_blocks.3.norm3.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.4.norm1.bias", "mid_block.attentions.0.transformer_blocks.4.norm1.weight", "mid_block.attentions.0.transformer_blocks.4.norm2.bias", "mid_block.attentions.0.transformer_blocks.4.norm2.weight", "mid_block.attentions.0.transformer_blocks.4.norm3.bias", "mid_block.attentions.0.transformer_blocks.4.norm3.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.5.norm1.bias", "mid_block.attentions.0.transformer_blocks.5.norm1.weight", "mid_block.attentions.0.transformer_blocks.5.norm2.bias", "mid_block.attentions.0.transformer_blocks.5.norm2.weight", 
"mid_block.attentions.0.transformer_blocks.5.norm3.bias", "mid_block.attentions.0.transformer_blocks.5.norm3.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.6.norm1.bias", "mid_block.attentions.0.transformer_blocks.6.norm1.weight", "mid_block.attentions.0.transformer_blocks.6.norm2.bias", "mid_block.attentions.0.transformer_blocks.6.norm2.weight", "mid_block.attentions.0.transformer_blocks.6.norm3.bias", "mid_block.attentions.0.transformer_blocks.6.norm3.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.7.norm1.bias", "mid_block.attentions.0.transformer_blocks.7.norm1.weight", "mid_block.attentions.0.transformer_blocks.7.norm2.bias", "mid_block.attentions.0.transformer_blocks.7.norm2.weight", "mid_block.attentions.0.transformer_blocks.7.norm3.bias", "mid_block.attentions.0.transformer_blocks.7.norm3.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.2.bias", 
"mid_block.attentions.0.transformer_blocks.8.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.8.norm1.bias", "mid_block.attentions.0.transformer_blocks.8.norm1.weight", "mid_block.attentions.0.transformer_blocks.8.norm2.bias", "mid_block.attentions.0.transformer_blocks.8.norm2.weight", "mid_block.attentions.0.transformer_blocks.8.norm3.bias", "mid_block.attentions.0.transformer_blocks.8.norm3.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.9.norm1.bias", "mid_block.attentions.0.transformer_blocks.9.norm1.weight", "mid_block.attentions.0.transformer_blocks.9.norm2.bias", "mid_block.attentions.0.transformer_blocks.9.norm2.weight", "mid_block.attentions.0.transformer_blocks.9.norm3.bias", "mid_block.attentions.0.transformer_blocks.9.norm3.weight".
        size mismatch for down_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for down_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for down_blocks.2.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for down_blocks.2.attentions.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for down_blocks.2.attentions.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.0.resnets.2.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.0.resnets.2.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.0.resnets.2.conv1.weight: copying a param with shape torch.Size([1280, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).
        size mismatch for up_blocks.0.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([1280, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).
        size mismatch for up_blocks.1.attentions.0.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.0.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.0.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.1.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.1.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.2.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.attentions.2.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for up_blocks.1.attentions.2.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.0.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.0.conv1.weight: copying a param with shape torch.Size([640, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).
        size mismatch for up_blocks.1.resnets.0.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.resnets.0.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([640, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).
        size mismatch for up_blocks.1.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.norm1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.1.norm1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]).
        size mismatch for up_blocks.1.resnets.1.conv1.weight: copying a param with shape torch.Size([640, 1280, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).
        size mismatch for up_blocks.1.resnets.1.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.resnets.1.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([640, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).
        size mismatch for up_blocks.1.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.1.resnets.2.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.1.resnets.2.conv1.weight: copying a param with shape torch.Size([640, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 3, 3]).
        size mismatch for up_blocks.1.resnets.2.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
        size mismatch for up_blocks.1.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.resnets.2.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([640, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 1, 1]).
        size mismatch for up_blocks.1.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.1.upsamplers.0.conv.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).
        size mismatch for up_blocks.1.upsamplers.0.conv.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.2.resnets.0.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.2.resnets.0.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).
        size mismatch for up_blocks.2.resnets.0.conv1.weight: copying a param with shape torch.Size([320, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1920, 3, 3]).
        size mismatch for up_blocks.2.resnets.0.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).
        size mismatch for up_blocks.2.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).
        size mismatch for up_blocks.2.resnets.0.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([320, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1920, 1, 1]).
        size mismatch for up_blocks.2.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.2.resnets.1.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).
        size mismatch for up_blocks.2.resnets.1.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1280, 3, 3]).
        size mismatch for up_blocks.2.resnets.1.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).
        size mismatch for up_blocks.2.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).
        size mismatch for up_blocks.2.resnets.1.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1280, 1, 1]).
        size mismatch for up_blocks.2.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]).
        size mismatch for up_blocks.2.resnets.2.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]).
        size mismatch for up_blocks.2.resnets.2.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 960, 3, 3]).
        size mismatch for up_blocks.2.resnets.2.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).
        size mismatch for up_blocks.2.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).
        size mismatch for up_blocks.2.resnets.2.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for up_blocks.2.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 960, 1, 1]).
        size mismatch for up_blocks.2.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for mid_block.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for mid_block.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
Traceback (most recent call last):
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\Gustav\\AppData\\Local\\Programs\\Python\\Python310\\python.exe', 'J:/Stable/ComfyUI_windows_portable/ComfyUI/custom_nodes/Lora-Training-in-Comfy-main/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=J:\\Stable\\ComfyUI_windows_portable\\ComfyUI\\models\\checkpoints\\SDXL\\realismEngineSDXL_v30VAE.safetensors', '--train_data_dir=C:/Users/Gustav/Pictures/results/BO/rafael/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=rafy_face', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=rafy_face', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=6', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 14.07 seconds
```
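Reading the log: every `attn2.to_k`/`to_v` mismatch is `[*, 2048]` in the checkpoint vs `[*, 768]` in the model, the missing `transformer_blocks.1`–`transformer_blocks.9` keys belong to SDXL's 10-block mid block, and the 2-D vs 4-D `proj_in`/`proj_out` shapes match SDXL's linear projections against SD1.x's 1×1 convs. That pattern suggests an SDXL checkpoint (`realismEngineSDXL_v30VAE.safetensors`) is being loaded into the SD1.x UNet that `train_network.py` builds; current kohya sd-scripts ship a separate `sdxl_train_network.py` for SDXL LoRA training. If you want to confirm which architecture a checkpoint is before launching, here is a minimal sketch (assuming only the `safetensors` package; the key prefixes are the usual ones in single-file checkpoints):

```python
# Rough architecture check for a single-file Stable Diffusion checkpoint.
# Sketch only: adjust CKPT to your own file.
from safetensors import safe_open

CKPT = r"J:\Stable\ComfyUI_windows_portable\ComfyUI\models\checkpoints\SDXL\realismEngineSDXL_v30VAE.safetensors"

with safe_open(CKPT, framework="pt", device="cpu") as f:
    keys = list(f.keys())

if any(k.startswith("conditioner.embedders.1.") for k in keys):
    # A second (OpenCLIP) text encoder only exists in SDXL checkpoints.
    print("SDXL checkpoint -> needs an SDXL training script (sdxl_train_network.py)")
elif any(k.startswith("cond_stage_model.") for k in keys):
    print("SD1.x/SD2.x checkpoint -> train_network.py should accept it")
else:
    print("Unrecognized key layout; inspect the keys manually")
```

If it reports SDXL, either switch to an SD1.x base model for this node or use a training path that targets SDXL.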

Clone sd-scripts from https://github.com/kohya-ss/sd-scripts/tree/main, then install its library into your Python environment: `cd sd-scripts/library && {venv-path}/Scripts/pip.exe install -e .` (Windows).
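If the editable install succeeds but the launch still fails the same way, it may be that the package landed in a different Python than the one accelerate spawns (the logs above show several interpreters in play). A hypothetical sanity check, not part of the node, run with the exact `python.exe` from the traceback:

```python
# Confirm the editable install of kohya's "library" package is visible
# to this interpreter; "library" is the package name assumed here.
import importlib.util
import sys

print("interpreter:", sys.executable)
spec = importlib.util.find_spec("library")
print("library package:", spec.origin if spec else "NOT FOUND")
```

If it prints `NOT FOUND`, repeat the `pip install -e .` using that interpreter's own pip.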