How does applying a model from URL work?
braheezy opened this issue · 1 comment
Hello! I am an absolute LLM noob so I apologize if these are rather basic questions. I am loving LocalAI so far and it's been incredibly easy to get running with models from the gallery.
I wanted to try a model whose definition does not contain a URL, like Vicuna or Koala. The instructions indicate that a POST request should be sent, using the koala.yaml configuration file from this repository, and that URI(s) to the actual model files should be supplied, probably from HuggingFace:
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "name": "koala",
    "overrides": { "parameters": {"model": "koala.bin" } },
    "files": [
        {
            "uri": "https://huggingface.co/xxxx",
            "sha256": "xxx",
            "filename": "koala.bin"
        }
    ]
}'
So I went to HuggingFace, searched for koala, and reviewed one of the top results. It appears to have the model split into multiple files:
pytorch_model-00001-of-000002.bin
pytorch_model-00002-of-000002.bin
Presumably both of these files are needed. I couldn't find examples of how to handle model .bin files that are split across multiple files. Additionally, some light research indicates I can't just cat the model files together.
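If I understand correctly, the shards are tied together by an index file (pytorch_model.bin.index.json) that maps each tensor to its shard, which would explain why concatenation can't work. Something like this shows the mapping (the repo path is a placeholder, not a real link):
# <repo> is a placeholder for the HuggingFace repository path.
curl -sL "https://huggingface.co/<repo>/resolve/main/pytorch_model.bin.index.json" | jq '.weight_map'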
I found this repository that seems to host a single koala model file. So I tried that:
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "name": "koala",
    "overrides": { "parameters": {"model": "koala.bin" } },
    "files": [
        {
            "uri": "https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors",
            "sha256": "${SHA}",
            "filename": "koala.bin"
        }
    ]
}'
(I downloaded the file first to calculate the SHA256; when I then ran this command, LocalAI downloaded the model as well. Is that right?)
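For reference, this is roughly how I computed the checksum (assuming the file was saved locally as koala.bin):
# Extract just the hash from sha256sum's output.
SHA=$(sha256sum koala.bin | awk '{print $1}')
echo "$SHA"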
After the job finished processing, I was able to see the new model defined:
$ curl -q $LOCALAI/v1/models | jq '.'
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "koala.bin",
      "object": "model"
    }
  ]
}
I proceeded to place prompt-templates/koala.tmpl into the models/ directory.
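At this point I expect the models/ directory to contain roughly the following (the exact set of files the gallery job generates is a guess on my part):
$ ls models/ | grep koala
koala.bin
koala.tmpl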
I then tried to call the model and got a 500 error:
$ curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "koala.bin",
    "messages": [{"role": "user", "content": "How are you?"}],
    "temperature": 0.9
}'
{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\n","type":""}}
I am sure I took a wrong turn at some point. Any advice? Thanks!
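For what it's worth, I can re-run with debug logging if more detail would help; assuming the Docker setup from the README, that would be something like:
# DEBUG=true turns on verbose backend output; adjust paths/tag as needed.
docker run -p 8080:8080 -e DEBUG=true -v $PWD/models:/models \
    quay.io/go-skynet/local-ai:latest --models-path /models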
Hey!
The files you picked are for PyTorch - you should pick ggml files instead. A tip: I usually search Hugging Face for "ggml".
The author (TheBloke) in the Hugging Face link you referred to has uploaded quite a bunch of them!
Edit: I had to update your comment and remove the links (as the license of those models is unclear).
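For example, once you find a ggml file, the apply request would look like this (the URI and checksum below are placeholders - substitute the actual ggml file you pick):
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
    "url": "github:go-skynet/model-gallery/koala.yaml",
    "name": "koala",
    "overrides": { "parameters": {"model": "koala.bin" } },
    "files": [
        {
            "uri": "https://huggingface.co/<user>/<repo>/resolve/main/<ggml-model>.bin",
            "sha256": "<sha256 of that file>",
            "filename": "koala.bin"
        }
    ]
}'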