How to convert a fine-tuned LLM into GGUF format
dibyendubiswas1998 opened this issue · 2 comments
dibyendubiswas1998 commented
Hi, I fine-tuned the mistral-7b model for my question-answering task (after quantizing it to 4-bit, using LoRA/QLoRA).
Now I want to convert the fine-tuned model into GGUF format for CPU inference.
yentur commented
You can use convert.py.
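For example, from the llama.cpp repo root you could invoke it on the merged model directory. A minimal sketch; the input directory, output filename, and chosen output type are placeholders for your own setup, and the exact flags may differ between llama.cpp versions (check --help):

```python
# Minimal sketch: invoke llama.cpp's convert.py on a merged HF model directory.
# Paths and the output name are placeholders; run from the llama.cpp repo root.
import subprocess

subprocess.run(
    [
        "python",
        "convert.py",                 # llama.cpp conversion script
        "./merged-model",             # directory with the merged .safetensors files
        "--outfile", "model-f16.gguf",
        "--outtype", "f16",           # keep fp16; quantize afterwards if needed
    ],
    check=True,
)
```

The resulting fp16 GGUF can then be quantized down (e.g. to 4-bit) with llama.cpp's quantize tool for CPU inference.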
ngxson commented
You can first merge the QLoRA adapter into the base model (that will produce a new set of .safetensors files).
Then use either convert.py or convert-hf-to-gguf.py to convert the safetensors model into GGUF.
P/s: convert-lora-to-ggml.py was removed a while ago, so the only way to run a QLoRA currently is to merge & convert.
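For the merge step, here is a minimal sketch using PEFT's merge_and_unload. It assumes the adapter was trained with PEFT; the model ID and directory paths are placeholders for your own:

```python
# Minimal sketch: merge a LoRA/QLoRA adapter back into the base model with PEFT,
# then save a plain .safetensors checkpoint that the convert scripts can read.
# Model IDs and paths below are placeholders; adjust to your own setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"  # base model that was fine-tuned
adapter_dir = "./qlora-adapter"        # trained LoRA/QLoRA adapter directory

# Load the base model in fp16 (not 4-bit), so the merged weights are full precision.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

# Fold the LoRA deltas into the base weights and drop the adapter modules.
merged = model.merge_and_unload()

merged.save_pretrained("./merged-model", safe_serialization=True)
AutoTokenizer.from_pretrained(base_id).save_pretrained("./merged-model")
```

After that, point convert.py (or convert-hf-to-gguf.py) at ./merged-model as shown above to get the GGUF file.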