qwopqwop200/GPTQ-for-LLaMa

The inference speed of the GPTQ 4-bit quantized model

pineking opened this issue · 2 comments

Has anyone compared the inference speed of the 4-bit quantized model against the original FP16 model?
Is it faster than the original FP16 model?
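
A common pitfall when comparing the two models is timing without warmup or GPU synchronization, since CUDA kernel launches are asynchronous and the first passes pay one-time setup costs. Below is a minimal timing sketch, assuming PyTorch on a CUDA device; the helper `benchmark_forward` and the `fp16_model`/`int4_model` names are hypothetical, and the models themselves would be loaded per this repo's README:

```python
import time
import torch

@torch.no_grad()
def benchmark_forward(model, input_ids, warmup=3, iters=10):
    """Time repeated forward passes; returns mean seconds per pass."""
    model.eval()
    for _ in range(warmup):          # warm up CUDA kernels and caches
        model(input_ids)
    torch.cuda.synchronize()         # make sure warmup work has finished
    start = time.perf_counter()
    for _ in range(iters):
        model(input_ids)
    torch.cuda.synchronize()         # wait for all queued GPU work
    return (time.perf_counter() - start) / iters

# Hypothetical usage: load both models first (see the README), then
# compare them on identical inputs.
# input_ids = tokenizer("Hello", return_tensors="pt").input_ids.cuda()
# print("fp16:", benchmark_forward(fp16_model, input_ids))
# print("int4:", benchmark_forward(int4_model, input_ids))
```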

I tested it, but int4 takes about 2x as long as FP16. Is anything wrong?

Same here. Do you know why?