AUTOMATIC1111/stable-diffusion-webui-aesthetic-gradients

[Bug] RuntimeError: The size of tensor a (768) must match the size of tensor b (512) at non-singleton dimension 2

jtara1 opened this issue · 1 comment

I was able to train my aesthetic gradient embedding, but this bug occurs when I run txt2img with the embedding I just created. The training image was 512 x 512, and my txt2img target resolution is also 512 x 512.

End of stacktrace:

  File "C:\Users\James\projects\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "C:\Users\James\projects\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\James\projects\stable-diffusion-webui\modules\sd_hijack_clip.py", line 220, in forward
    z = self.process_tokens(tokens, multipliers)
  File "C:\Users\James\projects\stable-diffusion-webui\extensions\stable-diffusion-webui-aesthetic-gradients\aesthetic_clip.py", line 255, in __call__
    z = z * (1 - self.aesthetic_weight) + zn * self.aesthetic_weight
RuntimeError: The size of tensor a (768) must match the size of tensor b (512) at non-singleton dimension 2
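
For illustration, the failing line boils down to blending two tensors that have to broadcast. Here is a minimal repro with the shapes from my traceback (the weight value is just an example):

    import torch

    # z: conditioning from the webui's text encoder (SD 1.x uses 768-dim embeddings)
    z = torch.randn(1, 77, 768)
    # zn: output of the aesthetic CLIP model (clip-vit-base-patch32 projects to 512)
    zn = torch.randn(1, 77, 512)
    aesthetic_weight = 0.9

    # The blend from aesthetic_clip.py line 255; raises the RuntimeError above
    # because dimension 2 differs (768 vs 512) and cannot broadcast.
    z = z * (1 - aesthetic_weight) + zn * aesthetic_weight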

sd web ui version: commit 983167e621aa55431f6dc7e0a26f021a66a33cd0

aesthetic-gradients version: 2624e5d (HEAD -> master, origin/master, origin/HEAD) use the new callback for script unloaded to stop the script from having effect after it's unloaded

I've also applied the manual patch to extensions/stable-diffusion-webui-aesthetic-gradients/aesthetic_clip.py, changing line 97 to aesthetic_clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32"), as suggested in the fix for the other bug, #21.
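
For what it's worth, the 512 in the error comes from that patched model: clip-vit-base-patch32 projects text features to 512 dimensions, which can be confirmed like this (assuming transformers is installed):

    from transformers import CLIPModel

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    print(model.config.projection_dim)  # 512 -- the "tensor b (512)" in the error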

  • My "z" model shape was (1, 77, 768) and this was mismatching my CLIP model shape, zn, (1, 77, 512). I used a CLIP model with projection_dim of 768 to fix. Changed code in aethestic_clip.py to pull and use "openai/clip-vit-large-patch14".
  • At first it seemed the training image(s) had to have a resolution matching this dimension (768 x 768 in my case), but that is false. The follow-up error I hit, about failing to multiply two matrices with the @ operator, was likely because I was using an older aesthetic embedding trained with a different CLIP model and shape.
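
A sketch of the one-line change I made in aesthetic_clip.py, plus a quick sanity check (the variable name matches the extension's code; the assert is mine):

    from transformers import CLIPModel

    # SD 1.x conditions on 768-dim text embeddings, so the aesthetic CLIP model
    # must project to 768 as well. clip-vit-large-patch14 does;
    # clip-vit-base-patch32 (projection_dim=512) does not.
    aesthetic_clip_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    assert aesthetic_clip_model.config.projection_dim == 768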