qwopqwop200/GPTQ-for-LLaMa

The inference speed of the GPTQ 4-bit quantized model

pineking opened this issue · 2 comments

Has anyone compared the inference speed of the 4-bit quantized model against the original FP16 model?
Is it faster than the original FP16 model?
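
A common pitfall when comparing the two models is timing without warmup or GPU synchronization, since CUDA kernel launches are asynchronous and the first passes pay one-time setup costs. Below is a minimal timing sketch, assuming PyTorch on a CUDA device; the helper `benchmark_forward` and the `fp16_model`/`int4_model` names are hypothetical, and the models themselves would be loaded per this repo's README:

```python
import time
import torch

@torch.no_grad()
def benchmark_forward(model, input_ids, warmup=3, iters=10):
    """Time repeated forward passes; returns mean seconds per pass."""
    model.eval()
    for _ in range(warmup):          # warm up CUDA kernels and caches
        model(input_ids)
    torch.cuda.synchronize()         # make sure warmup work has finished
    start = time.perf_counter()
    for _ in range(iters):
        model(input_ids)
    torch.cuda.synchronize()         # wait for all queued GPU work
    return (time.perf_counter() - start) / iters

# Hypothetical usage: load both models first (see the README), then
# compare them on identical inputs.
# input_ids = tokenizer("Hello", return_tensors="pt").input_ids.cuda()
# print("fp16:", benchmark_forward(fp16_model, input_ids))
# print("int4:", benchmark_forward(int4_model, input_ids))
```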

I tested it, but int4 takes about 2x as long as FP16. Is anything wrong?

Same here. Do you know why?