Prompt token consumption grows until /reset
k3it opened this issue · 8 comments
Hi.
Thank you for the updates and support of the turbo model!
I noticed that each subsequent query within a conversation uses up more prompt tokens. this continues until I reset the session. Does this sound like the correct behavior?
here is an example of the same prompt, with each iteration increasing token usage:
Hi @k3it, good catch. Yeah, I think it's showing the sum of the prompt tokens used within a conversation.
I would say that's the expected behavior, since I'm not doing any calculations myself, just printing the usage
object that is returned by the API. Anyway I will keep this issue open in case anyone has more knowledge on this!
I did some more checking and it looks like "Token used" is an individual counter for each prompt/response transaction. it is not a cumulative for the current conversation. So each new question within the same conversation becomes more expensive to ask.
After a long session without a /reset it may be cheaper to buy and weight the banana yourself instead of asking the bot about it :)
edit: i see now that message query and answer history is added to each completion. that explains the growth I think
chatgpt-telegram-bot/openai_helper.py
Lines 32 to 37 in 71209d6
chatgpt-telegram-bot/openai_helper.py
Line 56 in 71209d6
edit2: here is what the bot thinks about this (might not be accurate?)
The GPT completion API can remember the context of the conversation by itself using its internal memory, without needing to send the full message history back each time a new message is sent.
When you create a new completion request using the GPT API, you can include some context that the endpoint can use to better understand the request. This context can come from the last few messages in the conversation, as well as any additional information you provide. The GPT model can then use this context to generate a more accurate and relevant response.
Indeed it looks like sending history also cosumes tokens.
Here's what the docs say:
Including the conversation history helps when user instructions refer to prior messages. [...] Because the models have no memory of past requests, all relevant information must be supplied via the conversation. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way.
So I agree that history needs to be truncated or summarized somehow. I commented on your PR for possible solutions
Hello and thanks a lot for the updates and fixes
I've seen this application of embeddings in another bot (link below) to solve the token issue. Would it be a good idea to implement to further reduce on the token consumption?
Hi @em108 I'm not familiar with embeddings or how they can be implemented in python. What are the advantages?
From what I've gathered from multiple sources including the article below, embeddings can aid in long term memory / a way to store conversation data. Based on the cost and application, they can cost up to 5 - ~9 times less than sending chat history.
Article:
https://towardsdatascience.com/generative-question-answering-with-long-term-memory-c280e237b144
Also an example of a notebook utilizing embeddings:
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb