kvcache-ai/ktransformers

Specify MAX_NEW_TOKENS for ktransformers server

arthurv opened this issue · 2 comments

max_new_tokens defaults to 1000. It can be set for ktransformers.local_chat via the --max_new_tokens flag, but the server exposes no equivalent option.

Please add a --max_new_tokens option to the ktransformers server so we can request longer outputs, and consider exposing more generation options (such as input context length). A rough sketch of the requested flag follows.
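
A minimal sketch of the kind of flag being requested, assuming the server parses its launch arguments with argparse; the parser setup and attribute names here are hypothetical illustrations, not ktransformers' actual code:

```python
import argparse

# Hypothetical sketch of the requested server flag; ktransformers' real
# argument handling may differ. Only the 1000 default comes from this issue.
parser = argparse.ArgumentParser(description="ktransformers server (sketch)")
parser.add_argument(
    "--max_new_tokens",
    type=int,
    default=1000,  # current default reported in this issue
    help="Maximum number of tokens to generate per request",
)
args = parser.parse_args()
print(f"Serving with max_new_tokens={args.max_new_tokens}")
```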

Apologies for the inconvenience. If you’re building from source, you can modify the max_new_tokens parameter in ktransformers/server/backend/args.py. We will include this update in the next Docker release.
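
For anyone building from source, the workaround would look roughly like the excerpt below. Only the max_new_tokens name, its 1000 default, and the file path ktransformers/server/backend/args.py come from this thread; the surrounding class structure is an assumption for illustration:

```python
# ktransformers/server/backend/args.py (illustrative excerpt only; the real
# file's structure may differ)
from dataclasses import dataclass

@dataclass
class ConfigArgs:
    # Raise this default before launching the server if you need longer
    # outputs, e.g. change 1000 to 4096.
    max_new_tokens: int = 1000
```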

I just ran into this limitation as well. It would be even better if the REST API honored per-request settings for the maximum context length and the maximum number of generated tokens.
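
To illustrate what "honoring it per request" would mean, here is a hedged example client call. The endpoint path, port, and payload fields assume an OpenAI-style chat completions API and are not confirmed by this thread:

```python
import requests

# Hypothetical request: ideally the max_tokens field below would be respected
# up to the server's configured --max_new_tokens cap, rather than the output
# being silently truncated at the hard-coded default.
resp = requests.post(
    "http://localhost:10002/v1/chat/completions",  # host/port are assumptions
    json={
        "model": "ktransformers",
        "messages": [{"role": "user", "content": "Summarize this repository."}],
        "max_tokens": 4096,  # per-request generation limit
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```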