How to dequantize an EETQ model?
mxjmtxrm opened this issue · 4 comments
Hi, is there any function to dequantize an int8 weight back to fp16? Or is there a way to convert an EetqLinear back to a regular Linear?
No, I want a model with fp16 precision so I can run it on my own framework and test the accuracy of EETQ quantization. So I need to dequantize the int8 weights back to fp16.
I noticed that there are unprocessed_quantized_weight, processed_quantized_weight, and scale in (
What is the difference between these two weights?
Is the dequantized weight == EetqLinear.weight * EetqLinear.weight_scales?
EETQ changes the weight layout via preprocess_weights_for_mixed_gemm to accelerate memory access, so the two kinds of weight are different. For the unprocessed layout, unprocessed_quantized_weight==EetqLinear.weight * EetqLinear.weight_scales. You can refer to the backward pass in https://github.com/NetEase-FuXi/EETQ/pull/15/files#diff-ca179e954c684327ef4fba983db3ed3965f5406ef9f922b12e55f897829823ecR83 for how to dequantize the weight.
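To illustrate the relation above, here is a minimal sketch of per-channel symmetric dequantization (int8 weight times per-channel fp16 scale). The names `w_int8` and `scales` are hypothetical stand-ins; for a real EetqLinear the stored weight is in the GEMM-optimized layout, so you would first have to undo preprocess_weights_for_mixed_gemm (as in the linked backward pass) before this simple multiply applies. numpy is used here instead of torch purely to keep the sketch self-contained:

```python
import numpy as np

def dequantize_int8(w_int8: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover fp16 weights from int8 weights and per-channel scales.

    Sketch of: dequant_weight == quantized_weight * weight_scales
    (hypothetical names; assumes scales broadcast over the channel axis).
    """
    return w_int8.astype(np.float16) * scales.astype(np.float16)

# Toy example: 2x2 int8 weight, one scale per channel.
w = np.array([[ 64, -32],
              [127,   0]], dtype=np.int8)
s = np.array([0.01, 0.02], dtype=np.float16)
w_fp16 = dequantize_int8(w, s)
```

The accuracy you measure on the dequantized fp16 model then reflects exactly the rounding introduced by quantization, which is the comparison the question above is after.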
Got it. Thanks a lot.