OOM error with Hunyuan Video GGUF

Question

JorgeR81 opened this issue 3 months ago · 5 comments

I get an OOM error when trying to run Hunyuan Video GGUF Q6_K:

torch.cuda.OutOfMemoryError: Allocation on device

Got an OOM, unloading all loaded models.

I tried the ComfyUI example workflow, with the llava FP8 encoder.
I just replaced the native loader with the GGUF loader.

I haven't tried the regular Hunyuan Video model.

I have an old system, but I can run LTX-Video and Flux, even the full size models.

Answer 1 · 2024-12-22T15:40:05.000Z

What resolution/frame count? It might just be flat-out OOMing on runtime costs alone, considering it says it has 7GB free (so barely any part of the model is loaded, and most of it is probably in lowvram mode).

You could try playing with the reserved VRAM amount (launch flag), or close VRAM-intensive stuff/switch the desktop to the iGPU, though idk what res/frames you can get on an 8GB card. It's borderline unusable for me on a 10GB one lol

Answer 2 · 2024-12-22T16:17:16.000Z

I tried the default settings in the example workflow.
Reserve VRAM didn't help.

I'm downloading the Q4_K_M versions of Hunyuan Video and the llava encoder, to see if that works.
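
For a rough sense of what switching quants buys, here is a back-of-the-envelope sketch of the weight size at different GGUF quant levels. The bits-per-weight values come from the ggml block layouts; the ~13B parameter count for Hunyuan Video is an assumption, and real files differ somewhat because some tensors are stored at higher precision (Q4_K_M in particular mixes in higher-precision blocks, so it lands a bit above the plain Q4_K figure).

```python
# Approximate bits per weight for a few GGUF block formats.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,     # 34 bytes per block of 32 weights
    "Q6_K": 6.5625,  # 210 bytes per block of 256 weights
    "Q4_K": 4.5,     # 144 bytes per block of 256 weights
}

# Assumption: roughly 13 billion parameters in the Hunyuan Video model.
ASSUMED_PARAMS = 13e9


def approx_weight_gib(n_params: float, quant: str) -> float:
    """Estimate the size of the quantized weights in GiB (weights only)."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 2**30


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        size = approx_weight_gib(ASSUMED_PARAMS, quant)
        print(f"{quant}: ~{size:.1f} GiB of weights")
```

This only counts weights. The text encoder, VAE, and the activations during sampling all come on top, so a quant that fits on paper can still OOM.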

LTX-Video worked so well, that I thought I could give this a shot :)

Answer 3 · 2024-12-22T17:10:56.000Z

Pushed a commit that might fix it. Keyword might lol.

Answer 4 · 2024-12-22T18:35:05.000Z

Did some more testing:

Hunyuan Q4_K_M
llava Q4_K_M

It OOMs right away while loading the llava model, in the CLIPTextEncode stage, even before loading the Hunyuan model.
This happened before and after the latest updates.

I also tried:

Hunyuan Q4_K_M
llava FP8

It loads all the models, and it starts the generation process.
But it stalls in the first step for a while, until it OOMs.
Reserve VRAM also didn't help.

So maybe it's just too much for my old GPU.

But there may also be a separate issue with the GGUF llava encoder.

Answer 5 · 2025-01-17T14:35:53.000Z

I tried the default settings in the example workflow.

I am using the GGUF Q8 model, 640x480 resolution, frame count 65, tile_size 128, overlap 32, fps 16.
I also added the FastVideo LoRA and sage attention.
I also have only 8GB VRAM, so I turned on the system memory fallback.
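
To see why resolution and frame count matter so much for runtime memory, here is a minimal sketch that estimates how many tokens the video transformer has to process. The compression factors are assumptions about Hunyuan Video (8x spatial and 4x temporal VAE compression, then 2x2 spatial patching), and `approx_video_tokens` is a hypothetical helper, not part of ComfyUI.

```python
def approx_video_tokens(
    width: int,
    height: int,
    frames: int,
    spatial_factor: int = 8,   # assumed VAE spatial compression
    temporal_factor: int = 4,  # assumed VAE temporal compression
    patch: int = 2,            # assumed spatial patch size in the transformer
) -> int:
    """Rough token count for one video at the given size and length."""
    # The first frame is kept, the rest are compressed in groups.
    latent_frames = (frames - 1) // temporal_factor + 1
    tokens_per_frame = (width // spatial_factor // patch) * (
        height // spatial_factor // patch
    )
    return latent_frames * tokens_per_frame


if __name__ == "__main__":
    # Settings from this comment: 640x480, 65 frames.
    print(approx_video_tokens(640, 480, 65))
    # A larger, longer clip for comparison.
    print(approx_video_tokens(1280, 720, 129))
```

Under these assumptions, activation memory grows with the token count and attention compute grows with its square, so dropping resolution or frames helps much more than it might seem.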