chengzeyi/stable-fast

FP8 support in stable fast

jkrauss82 opened this issue · 6 comments

Is it planned?

Currently getting this error when trying to run ComfyUI in fp8 (flags --fp8_e4m3fn-text-enc --fp8_e4m3fn-unet):

RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'


I'm quite sure stable-fast has its own quantization support, but IIRC it's not implemented in the ComfyUI node.
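For comparison, the quantization that stable-fast-style pipelines typically expose is PyTorch's built-in dynamic int8 quantization of linear layers, which is a separate mechanism from fp8 storage. A minimal sketch using only stock PyTorch APIs (the toy model is illustrative):

```python
import torch
import torch.nn as nn

# Toy model standing in for a text encoder / UNet submodule.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU())

# Dynamically quantize Linear weights to int8; activations stay float
# and weights are dequantized on the fly inside the kernel.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(2, 8))
print(out.shape)
```

Whether that path is actually wired up in the ComfyUI node is a separate question from FP8 kernel support.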

@jkrauss82 Sorry, FP8 kernels aren't implemented and I guess I lack the time to support them now.

Thanks for the reply, understood. It would be nice if it could be supported eventually.

@jkrauss82 I have created a new project which supports FP8 inference with diffusers. However, it has not been open-sourced yet. I hope it can be made public soon...


A new project supporting FP8 inference may be published soon as a successor to stable-fast. I hope everyone will enjoy it.

That would be very welcome. I have seen FP8 support gaining traction recently in the vLLM project; it would be nice to have it in diffusers/image generation as well. I will stay tuned. Thanks for the update!