Include usage key in create_completion when streaming
Is your feature request related to a problem? Please describe.
Since `create_completion` may yield text chunks composed of multiple tokens per yield (e.g. in the case of multi-byte Unicode characters), counting the number of yields may not equal the number of tokens actually generated by the model. To accurately get the usage statistics of a streamed completion, one currently has to run the final text through the tokenizer again, even though `create_completion` already tracks the number of tokens the model generated.
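For illustration, this is roughly what the re-tokenization workaround looks like today. It is a minimal sketch: the model path and prompt are placeholders, and the chunk/tokenizer signatures are written from memory of the llama-cpp-python API, so details may differ slightly.

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf")  # placeholder path

# Stream a completion and accumulate the generated text.
text = ""
for chunk in llm.create_completion("Q: Name the planets. A:", max_tokens=64, stream=True):
    text += chunk["choices"][0]["text"]

# Workaround: re-tokenize the full text just to count completion tokens,
# even though the model already produced (and counted) these tokens once.
completion_tokens = len(llm.tokenize(text.encode("utf-8"), add_bos=False))
print(completion_tokens)
```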
Describe the solution you'd like
When `stream=True` in `create_completion`, the final chunk yielded should include the usage statistics under the `usage` key.
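Something along these lines, where the shape is hypothetical and mirrors the non-streaming response's `usage` object (field values are illustrative only):

```python
# Hypothetical shape of the final yielded chunk when stream=True.
# Field names mirror the non-streaming response; values are illustrative.
final_chunk = {
    "id": "cmpl-xxxxxxxx",
    "object": "text_completion",
    "created": 1700000000,
    "model": "./model.gguf",
    "choices": [
        {"text": "", "index": 0, "logprobs": None, "finish_reason": "stop"}
    ],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 34,
        "total_tokens": 46,
    },
}

# A caller could then read usage directly from the last chunk,
# with no second pass through the tokenizer:
print(final_chunk["usage"]["completion_tokens"])
```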
Describe alternatives you've considered
- Saving the full generated text and running it through the tokenizer again (seems wasteful)
- Counting the number of yields and hoping we don't have any multi-byte characters (hacky and fragile)
Additional context
The OpenAI API recently added similar support to its streaming API via the `stream_options` key: https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options
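For reference, this is roughly how that looks with the OpenAI Python client (the model name and prompt are placeholders); with `include_usage` set, the final chunk carries the token counts:

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model; name is just an example
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:
        # The final chunk has an empty choices list and a populated usage object.
        print("\ncompletion_tokens:", chunk.usage.completion_tokens)
```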