kvcache-ai/ktransformers

Specify MAX_NEW_TOKENS for ktransformers server

arthurv opened this issue · 2 comments

max_new_tokens defaults to 1000. It can be set for ktransformers.local_chat via the --max_new_tokens flag, but the server exposes no equivalent option.

Please add a --max_new_tokens option to the ktransformers server so we can request longer outputs, and consider exposing more generation options (such as input context length). A rough sketch of the requested flag follows.
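
A minimal sketch of the kind of flag being requested, assuming the server parses its launch arguments with argparse; the parser setup and attribute names here are hypothetical illustrations, not ktransformers' actual code:

```python
import argparse

# Hypothetical sketch of the requested server flag; ktransformers' real
# argument handling may differ. Only the 1000 default comes from this issue.
parser = argparse.ArgumentParser(description="ktransformers server (sketch)")
parser.add_argument(
    "--max_new_tokens",
    type=int,
    default=1000,  # current default reported in this issue
    help="Maximum number of tokens to generate per request",
)
args = parser.parse_args()
print(f"Serving with max_new_tokens={args.max_new_tokens}")
```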

Apologies for the inconvenience. If you’re building from source, you can modify the max_new_tokens parameter in ktransformers/server/backend/args.py. We will include this update in the next Docker release.
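
For anyone building from source, the workaround would look roughly like the excerpt below. Only the max_new_tokens name, its 1000 default, and the file path ktransformers/server/backend/args.py come from this thread; the surrounding class structure is an assumption for illustration:

```python
# ktransformers/server/backend/args.py (illustrative excerpt only; the real
# file's structure may differ)
from dataclasses import dataclass

@dataclass
class ConfigArgs:
    # Raise this default before launching the server if you need longer
    # outputs, e.g. change 1000 to 4096.
    max_new_tokens: int = 1000
```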

I just ran into this limitation as well. It would be even better if the REST API honored per-request settings for the maximum context length and the maximum number of generated tokens.
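
To illustrate what "honoring it per request" would mean, here is a hedged example client call. The endpoint path, port, and payload fields assume an OpenAI-style chat completions API and are not confirmed by this thread:

```python
import requests

# Hypothetical request: ideally the max_tokens field below would be respected
# up to the server's configured --max_new_tokens cap, rather than the output
# being silently truncated at the hard-coded default.
resp = requests.post(
    "http://localhost:10002/v1/chat/completions",  # host/port are assumptions
    json={
        "model": "ktransformers",
        "messages": [{"role": "user", "content": "Summarize this repository."}],
        "max_tokens": 4096,  # per-request generation limit
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```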