StableDiffusionPipelineExplainer enable_attention_slicing() and limit token attribution
TomPham97 opened this issue · 4 comments
Version 0.3.0 of 🤗 Diffusers introduces enable_attention_slicing, and I wonder if there's a way to use it with the explainer. Below is the code I used, and it ran out of CUDA memory:
# Import pipeline
import torch
from diffusers import StableDiffusionPipeline

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
    revision="fp16" if torch_device != "cpu" else None,
    torch_dtype=torch.float16 if torch_device != "cpu" else None,
)
pipe.to(torch_device)
pipe.enable_attention_slicing()  # attention optimization for less memory usage
# Pass pipeline to the explainer class
from diffusers_interpret import StableDiffusionPipelineExplainer

explainer = StableDiffusionPipelineExplainer(pipe)

prompt = "photograph, piggy, corn salad"

with torch.autocast(torch_device):
    output = explainer(
        prompt,
        guidance_scale=7.5,
        num_inference_steps=17,
    )
output.image
Attention slicing is fully compatible with the explainer :)
However, it is expected that you will run out of GPU memory because the attributions calculation is pretty expensive.
I recommend you add the arguments I used in this notebook. Enabling gradient_checkpointing helps:
explainer = StableDiffusionPipelineExplainer(
    pipe,
    # We pass `True` here to be able to use a higher `n_last_diffusion_steps_to_consider_for_attributions` in the cell below
    gradient_checkpointing=True
)
Also, reducing n_last_diffusion_steps_to_consider_for_attributions, height and width decreases GPU usage:
prompt = "A cute corgi with the Eiffel Tower in the background"
generator = torch.Generator(device).manual_seed(2023)
with torch.autocast('cuda') if device == 'cuda' else nullcontext():
output = explainer(
prompt,
num_inference_steps=50,
generator=generator,
height=448,
width=448,
# for this model, the GPU VRAM usage will raise drastically if we increase this argument. feel free to experiment with it
# if you are not interested in checking the token attributions, you can pass 0 in here
n_last_diffusion_steps_to_consider_for_attributions=5
)
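If you do compute attributions, they can then be inspected from the same output object. A minimal sketch; the attribute names below are an assumption based on the diffusers-interpret README and may differ between versions:

output.image  # the generated image, as in the snippet above

# Per-token attribution scores for the prompt (names assumed from the README)
print(output.token_attributions)
print(output.normalized_token_attributions)  # same scores, normalized to percentages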
Thank you for your reply
Once I passed the argument gradient_checkpointing = True, all 51 inference steps went through without memory constraints. Not surprisingly, CUDA ran out of memory soon after the "Calculating token attributions..." prompt appeared.
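(For anyone debugging the same issue, the peak VRAM reached around the attribution step can be checked with PyTorch's built-in CUDA memory statistics; a minimal sketch wrapping the same explainer call as above:)

torch.cuda.reset_peak_memory_stats()
try:
    output = explainer(prompt, guidance_scale=7.5, num_inference_steps=50)
finally:
    # report the highest amount of CUDA memory allocated during the call
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"Peak CUDA memory allocated: {peak_gib:.2f} GiB")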
I then passed the argument n_last_diffusion_steps_to_consider_for_attributions = 0, but this error was raised from explainer.py line 194:

NotImplementedError: Only `attribution_method='grad_x_input'` is implemented for now

On the other hand, passing the argument as 1 or None yielded an out-of-CUDA-memory error after all the inference steps.
> I then passed the argument n_last_diffusion_steps_to_consider_for_attributions = 0, but this error was raised from explainer.py line 194: NotImplementedError: Only `attribution_method='grad_x_input'` is implemented for now
Yes, that's my bad... I added a new patch version to fix that.
> On the other hand, passing the argument as 1 or None yielded an out-of-CUDA-memory error after all the inference steps.
To clarify, n_last_diffusion_steps_to_consider_for_attributions=None will calculate attributions for the entire diffusion process, while n_last_diffusion_steps_to_consider_for_attributions=0 will not compute attributions at all. I need to write documentation on this!
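In other words, a minimal sketch of the two extremes (same explainer call as above):

# None: attribute over the entire diffusion process (most expensive in GPU memory)
output = explainer(prompt, n_last_diffusion_steps_to_consider_for_attributions=None)

# 0: skip token attributions entirely and only generate the image (cheapest)
output = explainer(prompt, n_last_diffusion_steps_to_consider_for_attributions=0)

# Anything in between: attribute only over the last k diffusion steps, e.g. k=5
output = explainer(prompt, n_last_diffusion_steps_to_consider_for_attributions=5)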
If you are still running out of memory with n_last_diffusion_steps_to_consider_for_attributions=1 (and with gradient checkpointing and attention slicing enabled), that means your GPU is just not big enough to compute the gradients for a (512, 512) image 😞
I suggest using cpu instead or reducing the image size. On Colab it works with a (448, 448) image.
This Stable Diffusion model is pretty big; I'm still looking for more ways to require less GPU VRAM.
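For reference, a CPU fallback could look like the sketch below, assembled from the snippets earlier in this thread (the fp16 revision is dropped because half precision is poorly supported on CPU; treat this as a sketch, not the notebook's exact setup):

import torch
from diffusers import StableDiffusionPipeline
from diffusers_interpret import StableDiffusionPipelineExplainer

# Full-precision weights on CPU: slow, but not limited by GPU VRAM
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
)
pipe.to("cpu")

explainer = StableDiffusionPipelineExplainer(pipe, gradient_checkpointing=True)
output = explainer(
    "photograph, piggy, corn salad",
    num_inference_steps=17,
    height=448,  # a smaller image also lowers memory usage
    width=448,
    n_last_diffusion_steps_to_consider_for_attributions=1,
)
output.image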
Thank you very much, and this feature is working as intended!