[Bug] Max output tokens from an LLM should be configurable.
mkbhanda opened this issue · 4 comments
Priority
P2-High
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
- Pull docker images from hub.docker.com
- Build docker images from source
Deploy method
- Docker compose
- Docker
- Kubernetes
- Helm
Running nodes
Single Node
What's the version?
Development branch, post V1.0.
Description
The ChatQnA example appears to use the max_tokens parameter to control the number of LLM output tokens, but the value is not passed along when the re-ranker component is removed from the pipeline.
Perhaps we have a bug in Mega, or in the parameter name being used. The OpenAI API uses max_completion_tokens, and we may have migrated to it incompletely. We may also need to check GenAIComps.
This was noticed by @leslieluyu.
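For illustration only, here is a minimal sketch of the kind of parameter normalization a gateway could apply when forwarding an OpenAI-style payload to the LLM backend. The function and field handling below are hypothetical and are not the actual Mega/GenAIComps code; they just show the suspected gap where a client-supplied max_tokens could be silently dropped if only max_completion_tokens is forwarded.

```python
def normalize_llm_params(request: dict) -> dict:
    """Hypothetical sketch: map the legacy OpenAI field to the newer one so neither is lost."""
    params = dict(request)
    # OpenAI's chat completions API accepts max_completion_tokens; older
    # clients (and possibly the ChatQnA UI) may still send max_tokens.
    if "max_tokens" in params and "max_completion_tokens" not in params:
        params["max_completion_tokens"] = params.pop("max_tokens")
    return params


if __name__ == "__main__":
    print(normalize_llm_params({"messages": "What is OPEA?", "max_tokens": 32}))
    # {'messages': 'What is OPEA?', 'max_completion_tokens': 32}
```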
Reproduce steps
Run ChatQnA without the re-ranker and try to control the maximum number of output tokens by passing in a value (see the sketch below).
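A minimal request sketch for the step above, assuming the megaservice is exposed at the default http://localhost:8888/v1/chatqna used by the ChatQnA docker compose files; adjust host, port, and path to your deployment:

```python
import requests

# Ask for a short completion; with the re-ranker removed, the response length
# reportedly ignores this limit.
payload = {"messages": "What is OPEA?", "max_tokens": 16}
resp = requests.post("http://localhost:8888/v1/chatqna", json=payload, timeout=300)
resp.raise_for_status()
print(resp.text)  # inspect whether the answer respects the 16-token cap
```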
Raw log
No response
@yao531441,
Please help to check this issue. The maximum output tokens setting should not be related to reranking.
@mkbhanda @leslieluyu Can you provide more detailed steps to reproduce? Our tests that start ChatQnA with Docker without reranking work normally.
Thank you!!!