NetEase-FuXi/EETQ

How to dequantize an EETQ model?

mxjmtxrm opened this issue · 4 comments

Hi, is there any function to dequantize an int8 weight back to fp16? Or is there a way to convert an EetqLinear back to a regular Linear?

Hi @mxjmtxrm. I think you probably want a backward pass for EetqLinear. We have implemented a backward function and it is under testing in #15. Could you try it and give me feedback?

No, I want an fp16 model that I can run on my own framework to test the accuracy of EETQ quantization, so I need to dequantize the int8 weights back to fp16.
I noticed that there are unprocessed_quantized_weight, processed_quantized_weight and scales in the quantization code:

return std::vector<torch::Tensor>{processed_quantized_weight, scales};

What is the difference between these two weights?
Is the dequantized weight == EetqLinear.weight * EetqLinear.weight_scales?
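
Concretely, what I have in mind is plain per-channel dequantization, something like the rough PyTorch sketch below (the [in_features, out_features] weight shape and one-scale-per-output-channel layout are my guesses, not something I have checked against EETQ):

import torch

def naive_dequant(int8_weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Guessed layout: int8_weight is [in_features, out_features] and scales is
    # [out_features], i.e. one fp16 scale per output channel.
    return int8_weight.to(torch.float16) * scales.to(torch.float16)

Is that the right mental model, or does the processed layout make this invalid?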

EETQ changes the weight layout via preprocess_weights_for_mixed_gemm to accelerate memory access, so the two kinds of weight are different. unprocessed_quantized_weight is the weight before that layout change; multiplying it by EetqLinear.weight_scales gives the dequantized fp16 weight. EetqLinear.weight stores the processed layout, so your formula does not apply to it directly. You can refer to the backward pass in https://github.com/NetEase-FuXi/EETQ/pull/15/files#diff-ca179e954c684327ef4fba983db3ed3965f5406ef9f922b12e55f897829823ecR83 for how to dequantize the weight.
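
If you only need the fp16 weights for an accuracy comparison, one shortcut is to push an fp16 identity matrix through the same W8A16 GEMM that EetqLinear.forward uses, so you never have to undo the layout by hand. Below is a rough sketch; it is not the code from #15, and the EETQ import path, the w8_a16_gemm(x, weight, scales) semantics, and the [in_features, out_features] weight shape are assumptions taken from my reading of EetqLinear, so please double-check them.

import torch
from EETQ import w8_a16_gemm  # assumed import, mirroring eetq's EetqLinear

@torch.no_grad()
def dequantize_eetq_weight(int8_weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # int8_weight: the processed (layout-shuffled) weight stored on EetqLinear,
    # assumed shape [in_features, out_features]; scales: fp16, [out_features].
    in_features = int8_weight.shape[0]
    eye = torch.eye(in_features, dtype=torch.float16, device=int8_weight.device)
    # If w8_a16_gemm(x, w, s) computes x @ W_dq, multiplying by the identity
    # returns the dequantized weight of shape [in_features, out_features].
    w_dq = w8_a16_gemm(eye, int8_weight, scales)
    # Transpose to match nn.Linear.weight layout ([out_features, in_features]).
    return w_dq.t().contiguous()

Because the kernel consumes the processed layout internally, the identity multiplication gives you the mathematically dequantized weight without reversing preprocess_weights_for_mixed_gemm yourself.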

Got it. Thanks a lot.