su77ungr/CASALIOY

Custom GGML outside LlamaCpp scope

su77ungr opened this issue · 6 comments

For the MosaicML model: haven't tried it yet; feel free to create another issue so that we don't forget after closing this one.
Update: mpt-7b-q4_0.bin doesn't work "out of the box"; it yields what(): unexpectedly reached end of file and a runtime error.

Originally posted by @hippalectryon-0 in #33 (comment)
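That error fits the issue title: MPT is not a LLaMA-architecture model, so llama.cpp's loader doesn't know its layout. As a quick sanity check (a minimal sketch, not CASALIOY code; the magic constants are taken from my reading of llama.cpp and should be treated as assumptions), one can at least verify whether a .bin is in a GGML container llama.cpp recognizes:

import struct

# GGML container magics, as assumed from the llama.cpp sources
MAGICS = {
    0x67676D6C: "ggml (unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-able)",
}

def sniff_magic(path: str) -> str:
    # The magic is stored as a little-endian uint32 at offset 0
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown magic 0x{magic:08x}")

print(sniff_magic("mpt-7b-q4_0.bin"))

Note that a recognized magic still doesn't guarantee loading: a file can carry a valid GGML header while its tensor layout differs from what llama.cpp expects.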

Outsource a curated list of supported models; add it to README.md later.
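As a sketch of what such a curated list could look like (the structure is an assumption, and only the ggjt repo/filename appear in this thread; the MPT repo is a placeholder):

# Hypothetical registry of known-good / known-bad GGML models
SUPPORTED_MODELS = {
    "gpt4all-lora-ggjt": {
        "repo_id": "LLukas22/gpt4all-lora-quantized-ggjt",
        "filename": "ggjt-model.bin",
        "works_with_llamacpp": True,
    },
    "mpt-7b-q4_0": {
        "repo_id": "<MosaicML repo, TBD>",  # placeholder, not resolved in this thread
        "filename": "mpt-7b-q4_0.bin",
        "works_with_llamacpp": False,  # fails with "unexpectedly reached end of file"
    },
}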

Maybe create a setup.py that fetches directly from HF.

Edit: this does run counter to the air-gapped idea, though.

from huggingface_hub import hf_hub_download

# Download the quantized model into the current directory
hf_hub_download(
    repo_id="LLukas22/gpt4all-lora-quantized-ggjt",
    filename="ggjt-model.bin",
    local_dir=".",
)
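To reconcile this with the air-gapped setup, one option (a sketch, assuming huggingface_hub's local_files_only flag and its LocalEntryNotFoundError) is to resolve from the local cache first and only touch the network when the user opts in:

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import LocalEntryNotFoundError

def fetch_model(repo_id: str, filename: str, allow_network: bool = False) -> str:
    # Try the local cache only first (air-gapped friendly)
    try:
        return hf_hub_download(repo_id=repo_id, filename=filename, local_files_only=True)
    except LocalEntryNotFoundError:
        if not allow_network:
            raise RuntimeError(f"{filename} is not cached locally and network fetches are disabled")
        # Fall back to a real download when explicitly allowed
        return hf_hub_download(repo_id=repo_id, filename=filename)

path = fetch_model("LLukas22/gpt4all-lora-quantized-ggjt", "ggjt-model.bin", allow_network=True)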

Edit: implemented with #61
Also, @hippalectryon-0, did you test the 4-bit or the 16-bit quantization from Mosaic?

This mpt-7B feels faster to me than the model we currently ship here.

You got it running? We should add benchmark runs so everyone can plot and share results.
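A minimal benchmark sketch for that (assuming the llama-cpp-python bindings and their completion dict with a usage field; treat both as assumptions, and the model path comes from the download step above):

import time
from llama_cpp import Llama  # llama-cpp-python bindings

llm = Llama(model_path="ggjt-model.bin")

start = time.perf_counter()
out = llm("Explain what a vector store is.", max_tokens=128)
elapsed = time.perf_counter() - start

# The completion dict reports how many tokens were generated
generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.2f} tok/s")

Everyone could then share the tok/s figure together with their hardware, which would make the "feels faster" comparisons concrete.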