janhq/cortex.llamacpp

feat: [support return multiple choices]

Closed this issue · 3 comments

Problem

  • Support the request parameter n (integer or null)
  • Optional
  • Defaults to 1
    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
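
The parameter rules above (optional, integer or null, defaults to 1) could be normalized on the server side roughly as follows. This is only a sketch: `resolve_n` is a hypothetical helper name, not something in cortex.llamacpp, and rejecting values below 1 is an assumption rather than documented behavior.

```python
from typing import Optional


def resolve_n(n: Optional[int]) -> int:
    """Return how many choices to generate for one request.

    Hypothetical helper: a missing or null `n` falls back to the
    documented default of 1.
    """
    if n is None:
        return 1
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("n must be an integer or null")
    # Assumption: fewer than one choice is not a meaningful request.
    if n < 1:
        raise ValueError("n must be at least 1")
    return n
```

A request handler would call `resolve_n(body.get("n"))` once and then run that many generations for the same prompt.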

-> Need to check whether llama.cpp supports this option.

reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-n

related issue: https://github.com/janhq/internal/issues/160

According to this comment, llama.cpp does not support it yet.


This issue needs to be transferred so it can be handled at the cortex.cpp layer.

Now we can get multiple choices from a single request by adding the n parameter to the request body:

curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "engine": "cortex.llamacpp",
    "model": "meta-llama3.1-8b-instruct",
    "n_probs": 1,
    "stream": false,
    "top_k": 20,
    "n": 3,
    "messages": [
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      }
    ]
  }'

Response:

{
        "choices" : 
        [
                {
                        "finish_reason" : null,
                        "index" : 0,
                        "message" : 
                        {
                                "content" : "The Los Angeles Dodgers won the World Series in 2020. They defeated the Tampa Bay Rays in the series, winning four games to two. The final game was played on October 27, 2020.<|eot_id|>",
                                "role" : "assistant"
                        }
                },
                {
                        "finish_reason" : null,
                        "index" : 1,
                        "message" : 
                        {
                                "content" : "The Los Angeles Dodgers won the World Series in 2020. They defeated the Tampa Bay Rays in 6 games, winning the final game on October 27, 2020. This was their first championship since 1988.<|eot_id|>",
                                "role" : "assistant"
                        }
                },
                {
                        "finish_reason" : null,
                        "index" : 2,
                        "message" : 
                        {
                                "content" : "The Los Angeles Dodgers won the World Series in 2020.<|eot_id|>",
                                "role" : "assistant"
                        }
                }
        ],
        "created" : 1730345128,
        "id" : "kPlhopLJhYAQ0hQtCRVD",
        "model" : "_",
        "object" : "chat.completion",
        "system_fingerprint" : "_",
        "usage" : 
        {
                "completion_tokens" : 43,
                "prompt_tokens" : 21,
                "total_tokens" : 64
        }
}
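
For anyone consuming this from a client, here is a minimal sketch of reading the choices out of a response shaped like the one above. `extract_choices` is a hypothetical helper name, and stripping the trailing `<|eot_id|>` marker on the client is an assumption based only on the sample output, not documented cortex behavior.

```python
from typing import Any, Dict, List

# End-of-turn marker that appears at the end of each "content" in the sample.
EOT_MARKER = "<|eot_id|>"


def extract_choices(response: Dict[str, Any], expected_n: int) -> List[str]:
    """Return the message text of each choice, ordered by its "index" field.

    Raises ValueError when the number of choices differs from the `n`
    that was sent in the request.
    """
    choices = sorted(response["choices"], key=lambda c: c["index"])
    if len(choices) != expected_n:
        raise ValueError(
            f"expected {expected_n} choices, got {len(choices)}"
        )
    texts = []
    for choice in choices:
        content = choice["message"]["content"]
        # Assumption: drop the marker only when it is the literal suffix.
        if content.endswith(EOT_MARKER):
            content = content[: -len(EOT_MARKER)]
        texts.append(content.strip())
    return texts
```

With the response above parsed into a dict, `extract_choices(response, expected_n=3)` returns three strings in index order, which is also a quick way to check the "n = 3, expect 3 choices" case.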

✅ QA API - thank you @nguyenhoangthuan99!

  • Requires upgrading the cortex.llama-cpp engine to 0.1.37-01.11.24
  • cortex-nightly engines install llama-cpp -v v0.1.37-01.11.24
  • n = 3, expect 3 choices returned