Possible GPU memory leak?
kshieh1 opened this issue · 11 comments
Hi,
Found a GPU out-of-memory (OOM) error when using compel in my project. I made a shorter test program out of your compel-demo.py:
```python
import torch
from compel import Compel
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from torch import Generator

device = "cuda"
pipeline = StableDiffusionPipeline.from_pretrained("dreamlike-art/dreamlike-photoreal-2.0",
                                                   torch_dtype=torch.float16).to(device)
# dpm++
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config,
                                                              algorithm_type="dpmsolver++")

COMPEL = True
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

i = 0
while True:
    prompts = ["a cat playing with a ball++ in the forest", "a cat playing with a ball in the forest"]
    if COMPEL:
        prompt_embeds = torch.cat([compel.build_conditioning_tensor(prompt) for prompt in prompts])
        images = pipeline(prompt_embeds=prompt_embeds, num_inference_steps=10, width=256, height=256).images
        # del prompt_embeds  # not helping
    else:
        images = pipeline(prompt=prompts, num_inference_steps=10, width=256, height=256).images
    i += 1
    print(i, images)
    images[0].save('img0.jpg')
    images[1].save('img1.jpg')
```
Tested on an Nvidia RTX 3050 Ti Mobile GPU with 4 GB VRAM; an OOM exception occurs after 10~20 iterations. No OOM with COMPEL = False.
hmm, compel is basically stateless, there isn't much that could leak that i have much control over. torch is sometimes poor at cleaning up its caches properly, you might want to try calling torch.cuda.empty_cache() occasionally.
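e.g. something like this at the bottom of the loop (untested sketch of the suggestion above, using the variables from your test script):

```python
# inside the while loop, after the pipeline call
if i % 10 == 0:  # "occasionally", e.g. every 10 iterations
    # release cached, no-longer-referenced blocks back to the driver so
    # allocator fragmentation doesn't push a 4 GB card into OOM
    torch.cuda.empty_cache()
```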
Thanks. I think I have pushed VRAM usage to the edge -- maybe torch needs some extra room to maneuver...
(Updated Apr. 17) OOM occurs even if the prompt embeddings are just built repeatedly without running inference (i.e., images = pipeline(...) is commented out). torch.cuda.empty_cache() does not help.
urgh. idk. i also don't have a local gpu to readily debug this. have you tried tearing down the compel instance and making a new one for each prompt?
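i.e. something along these lines (just a sketch of what i mean, not a verified fix):

```python
if COMPEL:
    # build a throwaway Compel instance each iteration so nothing it might
    # hold onto can accumulate across iterations
    compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
    prompt_embeds = torch.cat([compel.build_conditioning_tensor(p) for p in prompts])
    del compel
```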
Interesting. I ran the same test on Google Colab (GPU with 12 GB VRAM) and no OOM issue occurred. Then I updated my local environment with exactly the same package versions (torch, diffusers, compel, etc.) as the Colab, but the OOM issue still occurs. Local tests were on Nvidia GPUs with 4 GB and 8 GB, btw.
Initializing and deleting the compel instance inside the loop doesn't help, fyi.
@kshieh1 Did you ever figure out a solution to this? I'm also hitting my 6GB limit as soon as I use the compel embeddings
No luck so far
I think I have come up with a solution. After image generation, explicitly de-reference the tensor object (i.e., prompt_embeds = None) and call gc.collect().
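In the test program above, the COMPEL branch of the loop becomes something like this (sketch of the workaround just described, same variable names as the original script):

```python
import gc

# inside the while loop
prompt_embeds = torch.cat([compel.build_conditioning_tensor(p) for p in prompts])
images = pipeline(prompt_embeds=prompt_embeds, num_inference_steps=10, width=256, height=256).images
prompt_embeds = None  # drop the last reference to the embedding tensor
gc.collect()          # force collection so the backing CUDA memory is actually freed
```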
ahh nice. i'll add a note on the readme for the next version. thanks for sharing your solution!
The readme has been updated.
@kshieh1 we encountered a possibly related (possibly the same?) problem in InvokeAI, which was resolved by doing the calls to Compel inside a with torch.no_grad(): block. did you try this?
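i.e. something like this (minimal sketch; no_grad stops autograd from keeping the text encoder's graph alive between iterations):

```python
with torch.no_grad():
    prompt_embeds = torch.cat([compel.build_conditioning_tensor(p) for p in prompts])
```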
Yeah, I just did a quick test and found that the amount of CUDA memory allocated stays stable -- I think I can get rid of those costly gc.collect() operations in my code.
Thanks for sharing.