quic/ai-hub-models

How to quantize LLM to INT4?

YixinSong-e opened this issue · 5 comments

I want to quantize my fine-tuned Llama model to INT4 and deploy it on my Snapdragon 8 Gen 3 device, but I don't know how to do it. Will there be a tutorial?
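While waiting on an official recipe, it may help to see what INT4 weight quantization does conceptually. Below is a minimal sketch of per-group symmetric INT4 quantization in NumPy; this is only an illustration of the math, not the AI Hub Models workflow, and the function names and group size are my own choices.

```python
import numpy as np

def quantize_int4_symmetric(w, group_size=32):
    """Map float weights to integers in [-8, 7] (signed INT4), one scale per group.

    This is a conceptual sketch, not the qai-hub-models implementation.
    """
    flat = w.reshape(-1, group_size)
    # One scale per group: largest magnitude maps to the INT4 extreme 7.
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT4 codes and per-group scales."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s).reshape(w.shape)
max_err = np.abs(w - w_hat).max()
```

In practice, toolchains layer calibration, per-channel or per-group scale selection, and sometimes weight-update methods on top of this basic round-to-nearest scheme, which is why a vendor-provided recipe is worth waiting for.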

@bhushan23 mentioned this in Slack. We are actively working on providing sample recipes and looking at INT4. When we have a tutorial, we'll post it in Slack.

@mestrona-3 I hope to get it soon. When do you plan to release the tutorial?

Hi @Junhyuk, it is on our roadmap for the next 4-6 weeks. I'll circle back here and on Slack when it is ready!

Hi @mestrona-3
Thanks for your update.
I will keep watching for this update.

Hi @Junhyuk @YixinSong-e
The Llama 2 export scripts are out now: https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized

Please give it a try and let us know how it goes.