GGUF loader fails with quantized version of SD1.5 model
Hi,
I made a GGUF version of an SD1.5 checkpoint I like (extracting the UNet, then converting it to FP16 GGUF).
The result is very good: the GGUF loader loads the FP16 GGUF checkpoint, and I can't spot any difference in the images produced compared to the original SD1.5 safetensors checkpoint.
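For reference, the extraction step is roughly this (a minimal sketch, assuming a standard SD1.5 safetensors layout where UNet tensors are prefixed with `model.diffusion_model.`; the prefix and file names are my assumptions, not from this thread):

```python
# Sketch: pull the UNet out of a full SD1.5 checkpoint and save it
# as FP16, ready for GGUF conversion. Paths and the key prefix are
# assumptions for illustration.
from safetensors.torch import load_file, save_file

PREFIX = "model.diffusion_model."

state = load_file("sd15-checkpoint.safetensors")
unet = {
    k[len(PREFIX):]: v.half()  # strip the prefix, cast to FP16
    for k, v in state.items()
    if k.startswith(PREFIX)
}
save_file(unet, "sd15-unet-fp16.safetensors")
```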
Then I used your method to make a quantized version of this FP16 GGUF: Q4_K_S.gguf.
The GGUF loader loads this quantized GGUF, but the result is very bad: the images are just black.
Do you know a solution? I'm experimenting, so I'd like to understand.
Could you check if setting this part to true makes it work? I might've accidentally set it to false during one of the rewrites for 1.5:
Line 50 in 8e898fa
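For context, the kind of switch involved (a hypothetical sketch, not the actual code at that line; every name here is invented): convert scripts for image models usually gate quantization per tensor, keeping numerically sensitive weights in higher precision. If that gate is off for SD1.5, those tensors get quantized too and sampling can collapse to black images.

```python
# Hypothetical illustration only; KEEP_SENSITIVE_HIGH_PREC,
# should_quantize and the arch string are invented names, not the
# repo's actual identifiers.
KEEP_SENSITIVE_HIGH_PREC = True  # the kind of flag reported as the fix here

def should_quantize(name: str, arch: str) -> bool:
    # Norm/bias-style tensors tolerate low-bit quantization poorly;
    # leaving them in FP16/FP32 avoids degenerate (all-black) outputs.
    sensitive = name.endswith(".bias") or "norm" in name
    if arch == "sd1.5" and KEEP_SENSITIVE_HIGH_PREC and sensitive:
        return False
    return True
```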
Yes, it works like a charm, thank you very much.
I changed the value to True and restarted the process from the beginning.
The GGUF version of the SD1.5 checkpoint is less than half the size of the safetensors file. There are a few differences in the images produced, but the quality is equivalent.
This was mostly curiosity, since SD 1.5's quality is well below that of current checkpoints.
I'd also be curious whether it's possible to make GGUF versions of LoRAs. The question was asked last year, but there is very little information: ggerganov/llama.cpp#3489
Cool, looks like I missed that value when I originally rewrote that convert script. Changed it to true in the repo now, so closing this.
As for LoRAs: technically possible, but it doesn't seem worth it since they're fairly small. With default checkpoints they're baked into the main checkpoint (so it would only matter for storage, not VRAM usage), while with GGUF checkpoints they'd have to be dequantized and then applied on the fly (slowing them down further).
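Roughly, the difference looks like this (a sketch assuming the usual LoRA update W' = W + alpha * (B @ A); `dequantize` is a stand-in for whatever the loader would use, not a real API):

```python
import torch

def merge_lora_plain(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, alpha: float) -> torch.Tensor:
    # Plain checkpoint: merge once ahead of time, zero runtime cost.
    return W + alpha * (B @ A)

def merge_lora_quantized(W_q, A, B, alpha, dequantize):
    # GGUF checkpoint: the quantized weight has to be expanded to a
    # dense tensor first, so the merge happens on the fly...
    W = dequantize(W_q)
    # ...and the merged result is dense again, losing the memory
    # savings for that tensor unless it's re-quantized afterwards.
    return W + alpha * (B @ A)
```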