quic/ai-hub-models

How to quantize LLM to INT4?

YixinSong-e opened this issue · 5 comments

I want to quantize my fine-tuned Llama model to INT4 and deploy it on my Snapdragon 8 Gen 3 device, but I don't know how to do it. Will there be a tutorial?
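While waiting on an official recipe, it may help to see what INT4 weight quantization does conceptually. Below is a minimal sketch of per-group symmetric INT4 quantization in NumPy; this is only an illustration of the math, not the AI Hub Models workflow, and the function names and group size are my own choices.

```python
import numpy as np

def quantize_int4_symmetric(w, group_size=32):
    """Map float weights to integers in [-8, 7] (signed INT4), one scale per group.

    This is a conceptual sketch, not the qai-hub-models implementation.
    """
    flat = w.reshape(-1, group_size)
    # One scale per group: largest magnitude maps to the INT4 extreme 7.
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT4 codes and per-group scales."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s).reshape(w.shape)
max_err = np.abs(w - w_hat).max()
```

In practice, toolchains layer calibration, per-channel or per-group scale selection, and sometimes weight-update methods on top of this basic round-to-nearest scheme, which is why a vendor-provided recipe is worth waiting for.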

@bhushan23 mentioned this in Slack. We are actively working on providing sample recipes and looking at INT4. When we have a tutorial, we'll post it in Slack.

@mestrona-3 I hope to get it soon. When do you plan to release the tutorial?

Hi @Junhyuk, it is on our roadmap for the next 4-6 weeks. I'll circle back here and on Slack when it is ready!

Hi @mestrona-3
Thanks for your update.
I will keep watching for this update.

Hi @Junhyuk @YixinSong-e
The Llama 2 export scripts are out now: https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized

Please give it a try and let us know how it goes.