How does applying a model from URL work?
braheezy opened this issue · 1 comment
Hello! I am an absolute LLM noob so I apologize if these are rather basic questions. I am loving LocalAI so far and it's been incredibly easy to get running with models from the gallery.
I wanted to try a model whose definition does not contain a URL, like Vicuna or Koala. The instructions indicate that a POST request should be sent, using the koala.yaml configuration file from this repository, and that URI(s) to the actual model files should be supplied, probably from HuggingFace:
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "name": "koala",
    "overrides": { "parameters": {"model": "koala.bin" } },
    "files": [
        {
            "uri": "https://huggingface.co/xxxx",
            "sha256": "xxx",
            "filename": "koala.bin"
        }
    ]
}'
So I went to HuggingFace, searched for koala, and reviewed one of the top results. It appears to have the model split into multiple files:
pytorch_model-00001-of-000002.bin
pytorch_model-00002-of-000002.bin
Presumably both of these files are needed. I couldn't find examples of how to handle model .bin files that are split across multiple files. Additionally, some light research indicates I can't just cat the model files together.
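If I understand correctly, the shards are tied together by an index file (pytorch_model.bin.index.json) that maps each tensor to its shard, which would explain why concatenation can't work. Something like this shows the mapping (the repo path is a placeholder, not a real link):
# <repo> is a placeholder for the HuggingFace repository path.
curl -sL "https://huggingface.co/<repo>/resolve/main/pytorch_model.bin.index.json" | jq '.weight_map'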
I found this repository that seems to host a single koala model file. So I tried that:
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "name": "koala",
    "overrides": { "parameters": {"model": "koala.bin" } },
    "files": [
        {
            "uri": "https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors",
            "sha256": "${SHA}",
            "filename": "koala.bin"
        }
    ]
}'
(I downloaded the file first to calculate the SHA256; when I then ran this command, LocalAI downloaded the model as well. Is that right?)
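For reference, this is roughly how I computed the checksum (assuming the file was saved locally as koala.bin):
# Extract just the hash from sha256sum's output.
SHA=$(sha256sum koala.bin | awk '{print $1}')
echo "$SHA"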
After the job finished processing, I was able to see the new model defined:
$ curl -q $LOCALAI/v1/models | jq '.'
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "koala.bin",
      "object": "model"
    }
  ]
}
I proceeded to place prompt-templates/koala.tmpl into the models/ directory.
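At this point I expect the models/ directory to contain roughly the following (the exact set of files the gallery job generates is a guess on my part):
$ ls models/ | grep koala
koala.bin
koala.tmpl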
I then tried to call the model and got a 500 error:
$ curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "koala.bin",
    "messages": [{"role": "user", "content": "How are you?"}],
    "temperature": 0.9
}'
{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\n","type":""}}
I am sure I took a wrong turn at some point. Any advice? Thanks!
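For what it's worth, I can re-run with debug logging if more detail would help; assuming the Docker setup from the README, that would be something like:
# DEBUG=true turns on verbose backend output; adjust paths/tag as needed.
docker run -p 8080:8080 -e DEBUG=true -v $PWD/models:/models \
    quay.io/go-skynet/local-ai:latest --models-path /models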
Hey!
The files you picked are for PyTorch - you should pick ggml files instead. A tip: I usually search Hugging Face for "ggml".
The author (TheBloke) in the Hugging Face link you referred to has uploaded quite a bunch of them!
Edit: I had to update your comment and remove the links (as the license of those models is unclear).
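For example, once you find a ggml file, the apply request would look like this (the URI and checksum below are placeholders - substitute the actual ggml file you pick):
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "name": "koala",
    "overrides": { "parameters": {"model": "koala.bin" } },
    "files": [
        {
            "uri": "https://huggingface.co/<user>/<repo>/resolve/main/<ggml-model>.bin",
            "sha256": "<sha256 of that file>",
            "filename": "koala.bin"
        }
    ]
}'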