OpenGVLab/EfficientQAT

Data arrangement

yancaoweidaode opened this issue · 4 comments

I see the model at https://hf-mirror.com/ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w4g128/tree/main. I have a question: how can I de-quantize this safetensors model from int4 (group size 128) to fp16 or fp32, and store the data sequentially after de-quantization?

You can refer to #4 (comment) to achieve this goal.

I know the de-quantization formula is W = (W_int - Z) * S, but I don't know the data arrangement in the model file.

You can refer to #4 (comment) to achieve this goal.
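For reference, here is a minimal sketch of the group-wise de-quantization W = (W_int - Z) * S with group size 128. The function name, tensor shapes, and key names are assumptions for illustration, not the actual layout of the checkpoint; the real file packs the int4 codes (e.g. into int32 tensors), so check the linked comment and the safetensors keys before adapting this.

```python
import torch

def dequantize_group(w_int: torch.Tensor,
                     zeros: torch.Tensor,
                     scales: torch.Tensor,
                     group_size: int = 128) -> torch.Tensor:
    """Apply W = (W_int - Z) * S group-wise.

    Assumed (hypothetical) shapes:
      w_int : (out_features, in_features), integer codes in [0, 15]
      zeros : (out_features, in_features // group_size)
      scales: (out_features, in_features // group_size)
    """
    # Broadcast each group's zero point and scale across its group_size columns.
    z = zeros.repeat_interleave(group_size, dim=1).to(torch.float32)
    s = scales.repeat_interleave(group_size, dim=1).to(torch.float32)
    return (w_int.to(torch.float32) - z) * s

# Hypothetical usage (tensor key names are placeholders, not the real ones):
# from safetensors.torch import load_file
# state = load_file("model.safetensors")
# w_fp16 = dequantize_group(w_int, zeros, scales).to(torch.float16)
```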

What does fake quantization mean? Is it fp16 or fp32?

Fake quantization means fp16 in our code. You can also change it to fp32.
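In other words, fake quantization rounds the weight to int4 codes and immediately maps it back to floating point, so the stored tensor is fp16 (or fp32) but only takes 2^4 distinct values per group. A rough sketch, assuming uniform asymmetric quantization with per-group scale and zero point (the function below is illustrative, not the repository's implementation):

```python
import torch

def fake_quantize(w: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor,
                  n_bits: int = 4) -> torch.Tensor:
    # Quantize to integer codes, then immediately de-quantize back to float.
    qmax = 2 ** n_bits - 1
    w_int = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    return ((w_int - zero) * scale).to(torch.float16)  # or .to(torch.float32)
```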

Additionally, what do you mean by data arrangement? The fake-quantized model can be loaded seamlessly by transformers.
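Since the fake-quantized checkpoint is just fp16 weights, loading looks like any regular transformers model. A hedged example; the path below is a placeholder, not a real repo id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/EfficientQAT-fake-quant-checkpoint"  # placeholder path
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```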