persimmon-ai-labs/adept-inference

Llama.cpp Support

Closed · 2 comments

Exploring whether the model can be supported in GGML / GGUF format so that it can run with Llama.cpp.

The model is missing some keys and cannot be converted to GGUF format:

'rms_norm_eps'
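A minimal sketch of how one might check a model config for the key a converter expects before attempting a conversion. Only 'rms_norm_eps' comes from the report above; the helper name `missing_keys`, the `REQUIRED_KEYS` list, and the example config values are assumptions made for illustration, not part of any conversion script.

```python
import json

# Only 'rms_norm_eps' is taken from the report; a real converter may
# require more keys than this (hypothetical list).
REQUIRED_KEYS = ["rms_norm_eps"]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from a config dict."""
    return [key for key in required if key not in config]


# Inline example config with illustrative values, standing in for a
# real config.json file.
example_config = json.loads('{"hidden_size": 1024, "num_layers": 4}')
print(missing_keys(example_config))
```

If the returned list is non-empty, the conversion would likely fail with a missing-key error like the one quoted above.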

A full set of Llama.cpp compatible .gguf files is available at
https://huggingface.co/maddes8cht/adept-persimmon-8b-base-gguf
and
https://huggingface.co/maddes8cht/adept-persimmon-8b-chat-gguf
For the moment, CUDA acceleration does not seem to work, so you need to use -ngl 0 with the cuBLAS versions.
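As a rough sketch of the workaround, the snippet below builds a Llama.cpp command line with GPU offloading disabled via -ngl 0. The binary path `./main`, the model file name, and the helper `build_llama_cmd` are assumptions for illustration; substitute the actual executable and the .gguf file you downloaded.

```python
import shlex


def build_llama_cmd(binary, model_path, prompt, gpu_layers=0):
    """Build an argument list for a Llama.cpp run.

    gpu_layers=0 maps to '-ngl 0', which keeps all layers on the CPU.
    """
    return [binary, "-m", model_path, "-ngl", str(gpu_layers), "-p", prompt]


# Hypothetical binary and model file names.
cmd = build_llama_cmd("./main", "persimmon-8b-chat.gguf", "Hello")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` once the paths point at real files.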