Using the /v1/chat/completions API with system messages, llama-box crashes
Closed this issue · 1 comment
squallliu commented
method: POST
url: http://127.0.0.1:8080/v1/chat/completions
headers: [Accept: text/event-stream], [Authorization: Bearer e4nnF...XU]
body: {
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are Square's personal AI assistant."
    },
    {
      "role": "user",
      "content": "who are you?"
    }
  ],
  "temperature": 0.7,
  "stream": true
}
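For reference, the failing request can be reconstructed with a short Python sketch. The bearer token is truncated in the report, so a placeholder is used; the request object is only built here, not sent, since it requires a running llama-box server:

```python
import json
import urllib.request

# Payload copied from the issue report: a system message followed by a
# user message, with streaming enabled.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are Square's personal AI assistant."},
        {"role": "user", "content": "who are you?"},
    ],
    "temperature": 0.7,
    "stream": True,
}

# Build (but do not send) the POST request that triggers the crash.
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
        "Authorization": "Bearer <token>",  # placeholder, not the real token
    },
    method="POST",
)
```

Sending the same body without the system message reportedly does not crash, which narrows the bug to system-message handling in the chat template path.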
I used Docker to run the oneAPI 2024.2 image; the Dockerfile can be found in the attachment.
docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 -p 8080:8080 -e ZES_ENABLE_SYSMAN=1 --ulimit memlock=-1 llama-box-intel -m "/app/models/Qwen2-7B-Instruct/ggml-model-Q4_K_M.gguf" -fa -t 4 -np 2 --mlock --host 0.0.0.0
github-actions commented
This issue was closed because it has been inactive for 14 days since being marked as stale.