How to convert a fine-tuned LLM into GGUF format
dibyendubiswas1998 opened this issue · 2 comments
dibyendubiswas1998 commented
Hi, I fine-tuned the mistral-7b model for my question-answering task (after quantizing it to 4-bit, using LoRA/QLoRA).
Now I want to convert the fine-tuned model into GGUF format for CPU inference.
yentur commented
You can use convert.py.
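For example, from the llama.cpp repo root you could invoke it on the merged model directory. A minimal sketch; the input directory, output filename, and chosen output type are placeholders for your own setup, and the exact flags may differ between llama.cpp versions (check --help):

```python
# Minimal sketch: invoke llama.cpp's convert.py on a merged HF model directory.
# Paths and the output name are placeholders; run from the llama.cpp repo root.
import subprocess

subprocess.run(
    [
        "python",
        "convert.py",                 # llama.cpp conversion script
        "./merged-model",             # directory with the merged .safetensors files
        "--outfile", "model-f16.gguf",
        "--outtype", "f16",           # keep fp16; quantize afterwards if needed
    ],
    check=True,
)
```

The resulting fp16 GGUF can then be quantized down (e.g. to 4-bit) with llama.cpp's quantize tool for CPU inference.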
ngxson commented
You can first merge the QLoRA adapter into the base model (that will produce a new set of .safetensors files).
Then use either convert.py or convert-hf-to-gguf.py to convert the safetensors model into GGUF.
P/s: convert-lora-to-ggml.py was removed a while ago, so the only way to run a QLoRA currently is to merge & convert.
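For the merge step, here is a minimal sketch using PEFT's merge_and_unload. It assumes the adapter was trained with PEFT; the model ID and directory paths are placeholders for your own:

```python
# Minimal sketch: merge a LoRA/QLoRA adapter back into the base model with PEFT,
# then save a plain .safetensors checkpoint that the convert scripts can read.
# Model IDs and paths below are placeholders; adjust to your own setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"  # base model that was fine-tuned
adapter_dir = "./qlora-adapter"        # trained LoRA/QLoRA adapter directory

# Load the base model in fp16 (not 4-bit), so the merged weights are full precision.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

# Fold the LoRA deltas into the base weights and drop the adapter modules.
merged = model.merge_and_unload()

merged.save_pretrained("./merged-model", safe_serialization=True)
AutoTokenizer.from_pretrained(base_id).save_pretrained("./merged-model")
```

After that, point convert.py (or convert-hf-to-gguf.py) at ./merged-model as shown above to get the GGUF file.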