Custom GGML outside LlamaCpp scope
su77ungr opened this issue · 6 comments
For MosaicML: haven't tried it yet, feel free to create another issue so that we don't forget after closing this one.
Update: mpt-7b-q4_0.bin doesn't work out of the box; it fails with `what(): unexpectedly reached end of file` and a runtime error.
Originally posted by @hippalectryon-0 in #33 (comment)
Outsource a curated list of supported models; add it to README.md later.
Maybe create a setup.py that fetches models directly from HF.
Edit: this does counteract the air-gapped idea
```python
from huggingface_hub import hf_hub_download

# Download the model into the current directory
hf_hub_download(repo_id="LLukas22/gpt4all-lora-quantized-ggjt", filename="ggjt-model.bin", local_dir=".")
```
Edit: implemented with #61
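For reference, a sketch of what such a fetch helper could look like; the model registry, the model name, and the `models/` target directory are illustrative assumptions, not the actual shape of #61. Skipping the download when the file already exists keeps the air-gapped workflow possible, since users can drop the file in manually instead:

```python
# Hypothetical fetch helper; repo/filename entries and the models/ directory
# are illustrative, not project conventions.
from pathlib import Path

from huggingface_hub import hf_hub_download

MODELS = {
    "gpt4all-lora-quantized-ggjt": (
        "LLukas22/gpt4all-lora-quantized-ggjt",
        "ggjt-model.bin",
    ),
}


def fetch_model(name: str, target_dir: str = "models") -> Path:
    """Download a known model into `target_dir`, unless it's already there."""
    repo_id, filename = MODELS[name]
    path = Path(target_dir) / filename
    if path.exists():
        # Already present (e.g. copied in manually on an air-gapped machine).
        return path
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)
    return path
```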
Also @hippalectryon-0, did you test the 4-bit or the 16-bit version from Mosaic?
Only `mpt-7b-q4_0.bin` from https://huggingface.co/LLukas22/mpt-7b-ggml
I feel like this mpt-7b is faster than the existing model here.
You got it running? We should add benchmark runs so everyone can plot and share results.
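As a starting point, a minimal benchmark sketch (a hypothetical harness, not part of this repo) that times a prompt against any callable model wrapper and reports a rough tokens/sec figure, so results can be shared in a comparable format:

```python
# Minimal benchmark sketch; token count is approximated by whitespace
# splitting, since tokenizers differ per model.
import time


def benchmark(generate, prompt: str, runs: int = 3) -> float:
    """Average tokens/sec of `generate(prompt)` over several runs.

    `generate` is assumed to take a prompt string and return generated text.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        output = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(output.split()) / elapsed)
    return sum(rates) / len(rates)


# Example wiring with llama-cpp-python (assumed installed; the model path
# is illustrative):
#   from llama_cpp import Llama
#   llm = Llama(model_path="./models/ggjt-model.bin")
#   print(benchmark(lambda p: llm(p)["choices"][0]["text"], "Hello, world"))
```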