[Bug] Max output tokens from an LLM should be configurable.
mkbhanda opened this issue · 4 comments
Priority
P2-High
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
- Pull docker images from hub.docker.com
- Build docker images from source
Deploy method
- Docker compose
- Docker
- Kubernetes
- Helm
Running nodes
Single Node
What's the version?
Development branch, post V1.0.
Description
The ChatQnA example appears to use the max_tokens parameter to control the number of LLM output tokens, but the value is not passed along when the re-ranker component is removed from the pipeline.
Perhaps we have a bug in Mega, or in the parameter name being used. The OpenAI API uses max_completion_tokens, and we may have migrated to it incompletely. We may also need to check GenAIComps.
This was noticed by @leslieluyu.
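For illustration only, here is a minimal sketch of the kind of parameter normalization a gateway could apply when forwarding an OpenAI-style payload to the LLM backend. The function and field handling below are hypothetical and are not the actual Mega/GenAIComps code; they just show the suspected gap where a client-supplied max_tokens could be silently dropped if only max_completion_tokens is forwarded.

```python
def normalize_llm_params(request: dict) -> dict:
    """Hypothetical sketch: map the legacy OpenAI field to the newer one so neither is lost."""
    params = dict(request)
    # OpenAI's chat completions API accepts max_completion_tokens; older
    # clients (and possibly the ChatQnA UI) may still send max_tokens.
    if "max_tokens" in params and "max_completion_tokens" not in params:
        params["max_completion_tokens"] = params.pop("max_tokens")
    return params


if __name__ == "__main__":
    print(normalize_llm_params({"messages": "What is OPEA?", "max_tokens": 32}))
    # {'messages': 'What is OPEA?', 'max_completion_tokens': 32}
```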
Reproduce steps
Run ChatQnA without the re-ranker and try to control the maximum number of output tokens by passing in a value (see the sketch below).
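A minimal request sketch for the step above, assuming the megaservice is exposed at the default http://localhost:8888/v1/chatqna used by the ChatQnA docker compose files; adjust host, port, and path to your deployment:

```python
import requests

# Ask for a short completion; with the re-ranker removed, the response length
# reportedly ignores this limit.
payload = {"messages": "What is OPEA?", "max_tokens": 16}
resp = requests.post("http://localhost:8888/v1/chatqna", json=payload, timeout=300)
resp.raise_for_status()
print(resp.text)  # inspect whether the answer respects the 16-token cap
```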
Raw log
No response
@yao531441,
Please help to check this issue. The maximum output tokens setting should not be related to reranking.
@mkbhanda @leslieluyu Can you provide more detailed steps to reproduce? Our tests that start ChatQnA with Docker without reranking work normally.
Thank you!!!