bbc-mc/sdweb-clip-changer

Device confusion

Closed this issue · 1 comment

Baughn commented

I ran into some trouble using this with Counterfeit V3.0, which requires openai/clip-vit-large-patch14-336.

  CLIPTextModel was on cpu
  CLIPTextModel applied: openai/clip-vit-large-patch14-336
  CLIPTokenizer applied: openai/clip-vit-large-patch14-336
Applying scaled dot product cross attention optimization.
Model loaded in 3.8s (load weights from disk: 0.1s, create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.2s, scripts callbacks: 2.3s).
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
CLIP Changer.on_app_started done.
Startup time: 8.7s (import torch: 0.8s, import gradio: 0.6s, import ldm: 0.4s, other imports: 0.4s, load scripts: 0.5s, load SD checkpoint: 3.9s, scripts before_ui_callback: 0.1s, create ui: 1.9s).
Error completing request
Arguments: ('task(ll4q9oh1fby7wn2)', '(masterwork, best quality, ultrarealistic:0.5). Energetic girl on the train', 'EasyNegativeV2', [], 20, 15, False, False, 1, 1, 7, 3870221604.0, -1.0, 0, 0, 0, False, 512, 768, False, 0.55, 2, 'Lanczos', 20, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
  File "/home/svein/AI/sd-bot/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/home/svein/AI/sd-bot/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/home/svein/AI/sd-bot/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/home/svein/AI/sd-bot/modules/processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "/home/svein/AI/sd-bot/modules/processing.py", line 642, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "/home/svein/AI/sd-bot/modules/processing.py", line 587, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/svein/AI/sd-bot/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/svein/AI/sd-bot/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/svein/AI/sd-bot/modules/sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/svein/AI/sd-bot/modules/sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/svein/AI/sd-bot/modules/sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 816, in forward
    return self.text_model(
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 712, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 227, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/svein/AI/sd-bot/modules/sd_hijack.py", line 234, in forward
    inputs_embeds = self.wrapped(input_ids)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/svein/AI/sd-bot/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

...as you can see, the text model needs to be moved to the GPU.
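Below is a minimal sketch of that fix in plain PyTorch. It assumes the swapped-in text model is stored on an attribute named `transformer`, which is the name printed in the next paragraph. `swap_text_model` and `module_device` are hypothetical helpers. They are not functions from this extension or from A1111. An `nn.Embedding` stands in for `CLIPTextModel` so the sketch runs without downloading weights.

```python
import torch
import torch.nn as nn


def module_device(module: nn.Module) -> torch.device:
    """Device of the module's first parameter (a plain nn.Module has no .device)."""
    return next(module.parameters()).device


def swap_text_model(holder: nn.Module, new_model: nn.Module,
                    device: torch.device) -> nn.Module:
    """Hypothetical helper: replace holder.transformer and move the new model
    to `device`, so token ids and embedding weights end up on the same device."""
    holder.transformer = new_model.to(device)  # Module.to returns the module itself
    return holder.transformer


if __name__ == "__main__":
    target = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    holder = nn.Module()
    holder.transformer = nn.Embedding(49408, 8).to(target)
    # Stand-in for a freshly loaded text model, which starts out on the CPU.
    new_model = nn.Embedding(49408, 8)
    swap_text_model(holder, new_model, target)
    tokens = torch.tensor([[49406, 320, 49407]], device=target)
    print(holder.transformer(tokens).shape, module_device(holder.transformer))
```

The important part is the explicit `.to(device)` at swap time. A model created by a loader does not inherit the placement of the module it replaces.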

However, as you can probably guess, I modified your code to print sd_model.cond_stage_model.transformer.device before swapping the text model, and it printed "cpu". Nevertheless, changing the swap to call .to('cuda') fixed the problem. I'm really not sure how that can happen. Maybe there's some code elsewhere in A1111 that moves it, and swapping the model breaks that somehow?

bbc-mc commented

Hi, thanks for the report.

I added support for the lowvram/medvram options, which use an additional function to manage which device the model is placed on.
I think this function causes the "Expected all tensors to be on the same device" problem.

Please try it.
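For readers hitting the same symptom, here is a simplified sketch of how the transformer can report `cpu` before the swap and still work. It assumes, as the reply suggests, that the lowvram/medvram path leaves modules parked on the CPU. Under that assumption, a forward pre-hook moves each module to the GPU just before it runs. `park_until_needed` is a hypothetical stand-in, not A1111's actual function, and the real helper may differ. A module swapped in afterwards never receives the hook, so it stays wherever it was loaded.

```python
import torch
import torch.nn as nn


def park_until_needed(module: nn.Module, device: torch.device, calls: list) -> None:
    """Hypothetical lowvram-style helper: leave `module` where it is and move it
    to `device` from a forward pre-hook, right before each forward call."""
    def move_before_forward(mod: nn.Module, args):
        calls.append(type(mod).__name__)  # record that the hook fired
        mod.to(device)  # returning None keeps the original inputs
    module.register_forward_pre_hook(move_before_forward)


if __name__ == "__main__":
    gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    calls: list = []
    holder = nn.Module()
    holder.transformer = nn.Embedding(10, 4)  # parked on the CPU
    park_until_needed(holder.transformer, gpu, calls)
    tokens = torch.tensor([1, 2], device=gpu)
    holder.transformer(tokens)  # hook fires and moves the module first
    holder.transformer = nn.Embedding(10, 4)  # swapped in: no hook attached
    if gpu.type == "cuda":
        try:
            holder.transformer(tokens)  # CPU weights, CUDA token ids
        except RuntimeError as err:
            print("device mismatch:", err)
    print("hook calls:", calls)
```

Under this assumption, there are two possible remedies. One is to move the new model to the GPU explicitly, which is what fixed it for the reporter. The other is to give the new model the same on-demand placement as the module it replaces.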