feat: [support return multiple choices]
Closed this issue · 3 comments
nguyenhoangthuan99 commented
Problem
- Support params:
n (integer or null, optional)
- Defaults to 1
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
-> Need to check whether llama.cpp supports this option.
reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-n
related issue: https://github.com/janhq/internal/issues/160
nguyenhoangthuan99 commented
According to this comment, llama.cpp does not support it yet.
This issue needs to be transferred and handled at the cortex.cpp layer.
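As a rough illustration of what handling n above the engine could look like, the sketch below calls a single-choice generator n times and merges the results into one OpenAI-style response. This is not the cortex.cpp implementation: complete_n and generate_one are hypothetical names, the shape of the value returned by generate_one is assumed, and counting the prompt tokens only once is also an assumption.

```python
from typing import Callable, Dict, List, Optional


def complete_n(
    generate_one: Callable[[List[dict]], Dict],
    messages: List[dict],
    n: Optional[int] = None,
) -> Dict:
    """Run a single-choice generator n times and merge the results.

    generate_one is a hypothetical stand-in for one engine call. It is assumed
    to return {"content": str, "prompt_tokens": int, "completion_tokens": int}.
    """
    n = 1 if n is None else n  # the OpenAI reference defaults n to 1
    if n < 1:
        raise ValueError("n must be a positive integer")

    choices = []
    prompt_tokens = 0
    completion_tokens = 0
    for index in range(n):
        result = generate_one(messages)
        choices.append(
            {
                "index": index,
                "message": {"role": "assistant", "content": result["content"]},
                "finish_reason": result.get("finish_reason"),
            }
        )
        # Assumption: the prompt is identical for every choice, so count it once.
        prompt_tokens = result["prompt_tokens"]
        # Completion tokens add up across all choices, so n > 1 costs more.
        completion_tokens += result["completion_tokens"]

    return {
        "object": "chat.completion",
        "choices": choices,
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Running the generator sequentially is the simplest option; a real server would more likely batch or parallelize the n generations.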
nguyenhoangthuan99 commented
Now we can get multiple choices from a single request by adding the n parameter to the input:
curl http://localhost:3928/v1/chat/completions -H "Content-Type: application/json" -d '{
"engine":"cortex.llamacpp",
"model": "meta-llama3.1-8b-instruct",
"n_probs":1,
"stream":false,
"top_k":20,
"n":3,
"messages": [
{
"role": "user",
"content": "Who won the world series in 2020?"
}
]
}'
Response:
{
"choices" :
[
{
"finish_reason" : null,
"index" : 0,
"message" :
{
"content" : "The Los Angeles Dodgers won the World Series in 2020. They defeated the Tampa Bay Rays in the series, winning four games to two. The final game was played on October 27, 2020.<|eot_id|>",
"role" : "assistant"
}
},
{
"finish_reason" : null,
"index" : 1,
"message" :
{
"content" : "The Los Angeles Dodgers won the World Series in 2020. They defeated the Tampa Bay Rays in 6 games, winning the final game on October 27, 2020. This was their first championship since 1988.<|eot_id|>",
"role" : "assistant"
}
},
{
"finish_reason" : null,
"index" : 2,
"message" :
{
"content" : "The Los Angeles Dodgers won the World Series in 2020.<|eot_id|>",
"role" : "assistant"
}
}
],
"created" : 1730345128,
"id" : "kPlhopLJhYAQ0hQtCRVD",
"model" : "_",
"object" : "chat.completion",
"system_fingerprint" : "_",
"usage" :
{
"completion_tokens" : 43,
"prompt_tokens" : 21,
"total_tokens" : 64
}
}
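For anyone consuming this response from a client, here is a minimal sketch of reading the choices back in index order. The function name extract_choices is hypothetical, and stripping the trailing <|eot_id|> marker on the client side is an assumption based on the sample output above, not documented behavior.

```python
import json
from typing import List

# End-of-turn marker seen at the end of each content string in the sample response.
EOT_MARKER = "<|eot_id|>"


def extract_choices(response_text: str) -> List[str]:
    """Return the content of every choice, ordered by its index field."""
    data = json.loads(response_text)
    ordered = sorted(data["choices"], key=lambda choice: choice["index"])
    contents = []
    for choice in ordered:
        content = choice["message"]["content"]
        # Drop the marker only when it is the literal suffix of the content.
        if content.endswith(EOT_MARKER):
            content = content[: -len(EOT_MARKER)]
        contents.append(content.strip())
    return contents
```

With the response above, this would return a list of three strings, one per choice.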
gabrielle-ong commented
✅ QA API - thank you @nguyenhoangthuan99!

