OpenGVLab/EfficientQAT

Data arrangement

yancaoweidaode opened this issue · 4 comments

I see the model at https://hf-mirror.com/ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w4g128/tree/main. I have a question: how can I de-quantize this safetensors model from int4 (group size 128) to fp16 or fp32, and store the data sequentially after de-quantization?

You can refer to #4 (comment) to achieve this goal.

I know the de-quantization formula is W = (W_int - Z) * S, but I don't know the data arrangement in the model file.

You can refer to #4 (comment) to achieve this goal.
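For reference, here is a minimal sketch of the group-wise de-quantization W = (W_int - Z) * S with group size 128. The function name, tensor shapes, and key names are assumptions for illustration, not the actual layout of the checkpoint; the real file packs the int4 codes (e.g. into int32 tensors), so check the linked comment and the safetensors keys before adapting this.

```python
import torch

def dequantize_group(w_int: torch.Tensor,
                     zeros: torch.Tensor,
                     scales: torch.Tensor,
                     group_size: int = 128) -> torch.Tensor:
    """Apply W = (W_int - Z) * S group-wise.

    Assumed (hypothetical) shapes:
      w_int : (out_features, in_features), integer codes in [0, 15]
      zeros : (out_features, in_features // group_size)
      scales: (out_features, in_features // group_size)
    """
    # Broadcast each group's zero point and scale across its group_size columns.
    z = zeros.repeat_interleave(group_size, dim=1).to(torch.float32)
    s = scales.repeat_interleave(group_size, dim=1).to(torch.float32)
    return (w_int.to(torch.float32) - z) * s

# Hypothetical usage (tensor key names are placeholders, not the real ones):
# from safetensors.torch import load_file
# state = load_file("model.safetensors")
# w_fp16 = dequantize_group(w_int, zeros, scales).to(torch.float16)
```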

What does fake quantization mean? Is it fp16 or fp32?

Fake quantization means fp16 in our code. You can also change it to fp32.
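In other words, fake quantization rounds the weight to int4 codes and immediately maps it back to floating point, so the stored tensor is fp16 (or fp32) but only takes 2^4 distinct values per group. A rough sketch, assuming uniform asymmetric quantization with per-group scale and zero point (the function below is illustrative, not the repository's implementation):

```python
import torch

def fake_quantize(w: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor,
                  n_bits: int = 4) -> torch.Tensor:
    # Quantize to integer codes, then immediately de-quantize back to float.
    qmax = 2 ** n_bits - 1
    w_int = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    return ((w_int - zero) * scale).to(torch.float16)  # or .to(torch.float32)
```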

Additionally, what do you mean by data arrangement? The fake-quantized model can be loaded seamlessly by transformers.
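Since the fake-quantized checkpoint is just fp16 weights, loading looks like any regular transformers model. A hedged example; the path below is a placeholder, not a real repo id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/EfficientQAT-fake-quant-checkpoint"  # placeholder path
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```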