ggerganov/llama.cpp

How to quantize a fine-tuned LLM into GGUF format

dibyendubiswas1998 opened this issue · 2 comments

Hi, I fine-tuned the mistral-7b model for my question-answering task (after quantizing it in 4-bit, using LoRA/QLoRA).
Now I want to convert the fine-tuned model into GGUF format for CPU inference.

You can use convert.py.

You can first merge the QLoRA adapter into the base model (that will produce a new set of .safetensors files), roughly as in the sketch below.
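A minimal sketch of the merge step, assuming a standard transformers + peft setup; the model ID, adapter path, and output directory below are placeholders, not taken from this thread:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"   # assumed base checkpoint
adapter_dir = "./qlora-adapter"               # your LoRA/QLoRA training output
merged_dir = "./mistral-7b-qa-merged"         # where the merged weights will go

# Load the base model in a regular dtype (fp16 here) so the merged weights
# can be written out as ordinary .safetensors files.
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

# Attach the adapter and fold its low-rank deltas into the base weights.
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()

# Save the merged model (and tokenizer) for the convert script to pick up.
merged.save_pretrained(merged_dir, safe_serialization=True)
AutoTokenizer.from_pretrained(base_model_id).save_pretrained(merged_dir)
```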

Then use either convert.py or convert-hf-to-gguf.py to convert the merged safetensors model into GGUF.
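For example (the directory and file names here are illustrative), something like `python convert-hf-to-gguf.py ./mistral-7b-qa-merged --outfile mistral-7b-qa-f16.gguf --outtype f16` should produce an f16 GGUF file, which you can then shrink further with llama.cpp's quantize tool for CPU inference.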

P/S: convert-lora-to-ggml.py was removed a while ago, so currently the only way to run a QLoRA fine-tune is to merge & convert.