NetEase-FuXi/EETQ

How to dequantize an EETQ model?

mxjmtxrm opened this issue · 4 comments

Hi, is there any function to dequantize an int8 weight back to fp16? Or is there a way to convert an EetqLinear back to a regular Linear?

Hi @mxjmtxrm. I think you probably want a backward pass for EetqLinear. We have implemented a backward function and it is under testing in #15. Could you try it and give me feedback?

No, I want an fp16 model that I can run on my own framework to test the accuracy of EETQ quantization, so I need to dequantize the int8 weights back to fp16.
I noticed that there are unprocessed_quantized_weight, processed_quantized_weight and scales in the quantization code:

return std::vector<torch::Tensor>{processed_quantized_weight, scales};

What is the difference between these two weights?
Is the dequantized weight == EetqLinear.weight * EetqLinear.weight_scales?
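
Concretely, what I have in mind is plain per-channel dequantization, something like the rough PyTorch sketch below (the [in_features, out_features] weight shape and one-scale-per-output-channel layout are my guesses, not something I have checked against EETQ):

import torch

def naive_dequant(int8_weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Guessed layout: int8_weight is [in_features, out_features] and scales is
    # [out_features], i.e. one fp16 scale per output channel.
    return int8_weight.to(torch.float16) * scales.to(torch.float16)

Is that the right mental model, or does the processed layout make this invalid?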

EETQ changes the weight layout via preprocess_weights_for_mixed_gemm to accelerate memory access, so the two kinds of weight are different. unprocessed_quantized_weight is the weight before that layout change; multiplying it by EetqLinear.weight_scales gives the dequantized fp16 weight. EetqLinear.weight stores the processed layout, so your formula does not apply to it directly. You can refer to the backward pass in https://github.com/NetEase-FuXi/EETQ/pull/15/files#diff-ca179e954c684327ef4fba983db3ed3965f5406ef9f922b12e55f897829823ecR83 for how to dequantize the weight.
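
If you only need the fp16 weights for an accuracy comparison, one shortcut is to push an fp16 identity matrix through the same W8A16 GEMM that EetqLinear.forward uses, so you never have to undo the layout by hand. Below is a rough sketch; it is not the code from #15, and the EETQ import path, the w8_a16_gemm(x, weight, scales) semantics, and the [in_features, out_features] weight shape are assumptions taken from my reading of EetqLinear, so please double-check them.

import torch
from EETQ import w8_a16_gemm  # assumed import, mirroring eetq's EetqLinear

@torch.no_grad()
def dequantize_eetq_weight(int8_weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # int8_weight: the processed (layout-shuffled) weight stored on EetqLinear,
    # assumed shape [in_features, out_features]; scales: fp16, [out_features].
    in_features = int8_weight.shape[0]
    eye = torch.eye(in_features, dtype=torch.float16, device=int8_weight.device)
    # If w8_a16_gemm(x, w, s) computes x @ W_dq, multiplying by the identity
    # returns the dequantized weight of shape [in_features, out_features].
    w_dq = w8_a16_gemm(eye, int8_weight, scales)
    # Transpose to match nn.Linear.weight layout ([out_features, in_features]).
    return w_dq.t().contiguous()

Because the kernel consumes the processed layout internally, the identity multiplication gives you the mathematically dequantized weight without reversing preprocess_weights_for_mixed_gemm yourself.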

Got it. Thanks a lot.