city96/ComfyUI-GGUF

flux Q*_K models don't work on MacBooks (green noise)

vmirnv opened this issue · 8 comments

[Image: green_noise]

First of all, thank you for the node. It is really helpful for small and personal projects.

I've tried multiple flux Q*_K models from different authors, and none of them work.
Meanwhile, Q4_1 and similar quantisations work totally fine.

There are some Reddit posts about this issue, including mine, with no solution yet:
https://www.reddit.com/r/StableDiffusion/comments/1heupt9/flux_q_k_models_dont_work_on_macbooks/
https://www.reddit.com/r/comfyui/comments/1glk3lc/help_flux_schnell_gguf_is_generating_green_images/
https://www.reddit.com/r/FluxAI/comments/1gc98wh/flux_gguf_with_comfyui_on_macbook_m3_not/

I don't have any Apple devices to test on, but it seems like a strange failure considering both Q4_1 and Q2_K_S use pretty similar functions (bitwise AND and right shift).
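
For reference, here's a minimal sketch of the shared pattern, assuming simplified names (the actual dequant code is more involved): both quant types recover sub-byte values from packed bytes with the same bitwise AND / right-shift operations.

```python
import torch

# Hypothetical, simplified illustration: Q4-style formats pack two 4-bit
# values per byte, recovered with a bitwise AND (low nibble) and a right
# shift (high nibble). Q2_K uses the same pattern with 2-bit values:
# (qs >> shift) & 0x03 for shift in (0, 2, 4, 6).
def unpack_nibbles(qs: torch.Tensor) -> torch.Tensor:
    lo = qs & 0x0F         # low 4 bits
    hi = (qs >> 4) & 0x0F  # high 4 bits
    return torch.stack((lo, hi), dim=-1).flatten(start_dim=-2)
```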

Reading that thread you linked, and considering the T5 encoder does work fine, makes it even stranger (I assume Q3_K_S also fails with flux?).

Could you maybe test SD3.5 medium to see if the issue is isolated to flux or if it happens across all models? It may also be worth testing PyTorch stable vs. nightly to see if there are any fixes upstream.
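
For reference, a quick way to check which PyTorch build is installed (the version strings below are illustrative examples):

```python
import torch

# Stable builds look like "2.5.1"; nightlies carry a dev suffix,
# e.g. "2.6.0.dev20241219" (example values only).
print(torch.__version__)
# Confirm the MPS (Apple GPU) backend is built and usable:
print(torch.backends.mps.is_built(), torch.backends.mps.is_available())
```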

Test: both sd3.5_medium-Q3_K_S.gguf and sd3.5_medium-Q4_0.gguf work fine.

Right now I'm using ComfyUI Desktop, so it's a bit harder to control the PyTorch version; I will test that later.
You can see from the Reddit posts that this has been a common issue for at least a few months, across different Macs (M1/M3).

Interesting how the SD3.5 preview still has green spots in it. Wonder if it's a precision issue, but IIRC someone did try the --force-fp32 launch flag without much luck before.

Another thing that might be worth testing is using the Unet Loader (GGUF/Advanced) node and setting dequant_dtype to float32.
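
Roughly, that setting controls the precision used while reconstructing weights from the quantized blocks; here's a minimal sketch, assuming simplified names (the real dequant path also handles per-block scales and layouts per format):

```python
import torch

# Hypothetical, simplified: dequant_dtype is the dtype the reconstruction
# math runs in, before any later cast to the compute dtype. Using float32
# here rules out precision problems in the dequantize step itself.
def dequantize_blocks(q, scale, offset, dequant_dtype=torch.float32):
    q = q.to(dequant_dtype)
    return q * scale.to(dequant_dtype) + offset.to(dequant_dtype)
```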

is using the Unet Loader (GGUF/Advanced) node and setting dequant_dtype to float32.

Unfortunately, no luck — same issue :(

I've tried your hunyuan-video-t2v-720p-Q3_K_S.gguf with this result:
[Image: Screenshot 2024-12-19 at 13 50 16]
Again, hunyuan-video-t2v-720p-Q4_1.gguf works fine.

My hypothesis is this: maybe there was some update in the GGUF conversion tool that broke some Q*_K models for Macs.
We need someone with the relevant knowledge and an Apple device to test this.
I can use Q4_1, but I've noticed it causes a lot of confusion among new users.

Thank you for your work, and I’m looking forward to FastVideo/FastHunyuan GGUF :)
Yesterday, I tried making my own but, unfortunately, failed.

@vmirnv are you able to share the workflow you used to successfully generate a video with the hunyuan-video-t2v-720p-Q4_1.gguf model? Also, if possible, can you share the version of torch you had installed when running the workflow? I have tried and tried, to no avail, to get my workflow (using the same GGUF model) to produce a video that doesn't look all pixelated (like in the example image you showed).

[Image]

@austinbrown34 you can check my simple workflow here: https://civitai.com/models/1048570/simple-gguf-hunyuan-text2video-workflow

Reading that thread you linked and considering the T5 encoder does work fine makes it even stranger

@city96 I think I found the reason why GGUFs with Q*_K work fine with the CLIP/T5 text encoders: ComfyUI simply runs them on the Mac's CPU, while the KSampler runs on the GPU (MPS).
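
If that's the case, one hedged way to narrow it down would be to run the same bitwise unpacking on both devices and compare results (an illustrative sketch, not the repo's actual code):

```python
import torch

# Compare identical bitwise ops on CPU vs. MPS; a mismatch would point at
# an MPS kernel bug rather than at the GGUF files themselves.
qs = torch.randint(0, 256, (4096,), dtype=torch.uint8)
unpack = lambda t: torch.stack(((t & 0x0F), (t >> 4) & 0x0F), dim=-1).flatten()

cpu_out = unpack(qs)
if torch.backends.mps.is_available():
    mps_out = unpack(qs.to("mps")).cpu()
    print("CPU/MPS match:", torch.equal(cpu_out, mps_out))
```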

UPDATE: I finally determined what the issue was. I was running my ComfyUI server with --force-fp16, which inevitably produces the pixelated output I was seeing. Once I either removed the flag or switched it to --force-fp32, the videos generated correctly.
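
For anyone hitting the same thing, a plausible explanation (an assumption, not confirmed for this model): fp16 has a far smaller range than fp32, so forcing it can overflow intermediate values into inf/NaN and garble the output.

```python
import torch

# fp16's maximum finite value is 65504; anything larger overflows to inf.
x = torch.tensor([70000.0])
print(x.half())   # tensor([inf], dtype=torch.float16)
print(x.float())  # tensor([70000.]) -- fine in fp32
```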