Using the /v1/chat/completions API with system messages, llama-box crashes
Closed this issue · 1 comment
squallliu commented
method: POST
url: http://127.0.0.1:8080/v1/chat/completions
headers: [Accept: text/event-stream], [Authorization: Bearer e4nnF...XU]
body: {
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are Square's personal AI assistant."
    },
    {
      "role": "user",
      "content": "who are you?"
    }
  ],
  "temperature": 0.7,
  "stream": true
}
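For reference, the failing request can be reconstructed with a short Python sketch. The bearer token is truncated in the report, so a placeholder is used; the request object is only built here, not sent, since it requires a running llama-box server:

```python
import json
import urllib.request

# Payload copied from the issue report: a system message followed by a
# user message, with streaming enabled.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are Square's personal AI assistant."},
        {"role": "user", "content": "who are you?"},
    ],
    "temperature": 0.7,
    "stream": True,
}

# Build (but do not send) the POST request that triggers the crash.
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
        "Authorization": "Bearer <token>",  # placeholder, not the real token
    },
    method="POST",
)
```

Sending the same body without the system message reportedly does not crash, which narrows the bug to system-message handling in the chat template path.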
I used Docker to run the oneAPI 2024.2 image; the Dockerfile can be found in the attachment.
docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 -p 8080:8080 -e ZES_ENABLE_SYSMAN=1 --ulimit memlock=-1 llama-box-intel -m "/app/models/Qwen2-7B-Instruct/ggml-model-Q4_K_M.gguf" -fa -t 4 -np 2 --mlock --host 0.0.0.0
github-actions commented
This issue was closed because it has been inactive for 14 days since being marked as stale.