abetlen/llama-cpp-python

Include usage key in create_completion when streaming

zhudotexe opened this issue · 0 comments

Is your feature request related to a problem? Please describe.
Since create_completion may yield text chunks composed of multiple tokens per yield (e.g. when a multi-byte Unicode character spans several tokens), counting yields may not give the number of tokens the model actually generated. To get accurate usage statistics for a streamed completion, one currently has to run the final text through the tokenizer again, even though create_completion already tracks how many tokens it generated.

Describe the solution you'd like
When stream=True is passed to create_completion, the final chunk yielded should include the usage statistics under the 'usage' key.
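
For concreteness, a minimal sketch of how a caller might consume this, assuming the final chunk simply gains a 'usage' key alongside the existing OpenAI-style fields (the model path, prompt, and token counts are illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="models/llama-7b.gguf")  # illustrative path

usage = None
for chunk in llm.create_completion("Q: Name the planets. A:", stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
    # Under this proposal, only the final chunk would carry a "usage" key:
    usage = chunk.get("usage", usage)

# e.g. {"prompt_tokens": 10, "completion_tokens": 32, "total_tokens": 42}
print(usage)
```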

Describe alternatives you've considered

  • Saving the full generated text and running it through the tokenizer again (wasteful, since the library has already counted the tokens; see the sketch after this list)
  • Counting the number of yields and hoping no multi-byte characters cause a chunk to contain more than one token (hacky and fragile)
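
A sketch of the first workaround, assuming llm is an existing Llama instance and prompt is defined; the re-tokenization step duplicates work the library already did during generation:

```python
text_parts = []
for chunk in llm.create_completion(prompt, stream=True):
    text_parts.append(chunk["choices"][0]["text"])

# Re-tokenize the full completion just to count tokens (the wasteful step):
completion_text = "".join(text_parts)
completion_tokens = len(llm.tokenize(completion_text.encode("utf-8"), add_bos=False))
```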

Additional context
The OpenAI API recently added similar support to its streaming API via the stream_options key: https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options
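
For comparison, this is how the OpenAI Python client exposes it: when include_usage is set, one extra final chunk carries a usage object and an empty choices list (the model name below is illustrative):

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.usage is not None:
        print(chunk.usage)  # prompt_tokens, completion_tokens, total_tokens
```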