No performance difference between original BF16 and Q4_0 quantized GGUF models
The original 23.8 GB BF16 flux1-dev model runs at around the same speed as the 6.8 GB Q4_0 quant, which should fit completely into my 12 GB of VRAM.
This is my workflow:
workflow.json
My GPU is an RX 6700 XT and I'm using ROCm on Ubuntu, so far without any problems.
I hope someone can help me or at least explain why this is happening.
"should fit completely into my 12 GB of vram."
Well, is it actually fitting into your VRAM? Have you checked with a task manager while generating?
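If you don't have a GUI task manager handy, something like this works from a second terminal while the generation runs. A minimal sketch, assuming `rocm-smi` is on your PATH (it ships with ROCm):

```python
# Poll rocm-smi's VRAM report once per second until Ctrl+C.
# Run this in a separate terminal while ComfyUI is generating.
import subprocess
import time

def poll_vram(interval_s: float = 1.0) -> None:
    try:
        while True:
            out = subprocess.run(
                ["rocm-smi", "--showmeminfo", "vram"],
                capture_output=True, text=True, check=True,
            ).stdout
            print(out, flush=True)
            time.sleep(interval_s)
    except KeyboardInterrupt:
        pass

if __name__ == "__main__":
    poll_vram()
```

If the used-VRAM figure sits near 12 GB during generation, the model is spilling and you won't see the speedup you expect.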
A Q4_0 GGUF model is still over 11 GB. You will need to force the T5 text encoder to run on the CPU to save VRAM, and also avoid keeping lots of browser tabs etc. open, to fit it all into VRAM.
The Q4_0 file is 6.8 GB, as I stated in my post. The text encoder is already on the CPU using the force CLIP device option.
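For what it's worth, here is one way to double-check from Python where the loaded weights actually live. A rough sketch, assuming you can reach the loaded torch modules (e.g. from a debug session or a custom node); `unet` and `text_encoder` below are placeholder names, not ComfyUI API:

```python
# Count parameter bytes per device to confirm weight placement.
import collections
import torch

def device_report(module: torch.nn.Module) -> dict:
    bytes_per_device = collections.Counter()
    for p in module.parameters():
        bytes_per_device[str(p.device)] += p.numel() * p.element_size()
    return {dev: f"{n / 2**30:.2f} GiB" for dev, n in bytes_per_device.items()}

# device_report(text_encoder) should show only "cpu" if the force CLIP
# device setting is working; device_report(unet) should show the GPU
# (ROCm builds of PyTorch expose it as a "cuda" device).
```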