Gemini 1.5 Pro charges six times more tokens than expected on text prompts.

Question

Opened this issue a month ago · 2 comments

Description of the bug:

First, I calculate the ratio between characters and tokens from the prompt. This ratio depends on the language, so it matters whether the model is used in English or, as in my case, in Bulgarian (Cyrillic).

val response: GenerateContentResponse = generativeModel.generateContent(content)
// usageMetadata is nullable, so the token count may be missing
val promptTokenCount = response.usageMetadata?.promptTokenCount
// Characters per token, measured on the prompt
val ratio = promptText.length.toDouble() / promptTokenCount!!
...

Although I have limited the candidates to one, I count the characters across all candidates, as shown in the image below.
I calculated allCandidateCharsCount by adding the characters from the text to those from the functionCalls.args.values.

val responseTextLength = response.text?.length ?: 0
// Sum the lengths of all non-null argument values across the returned function calls
val responseArgsSum = response.functionCalls.sumOf { call -> call.args.values.mapNotNull { value -> value?.length }.sum() }
val expectCandidatesTokenCount = (responseTextLength + responseArgsSum) / ratio

promptCharsCount  = 3729
promptTokenCount = 1925

ratio [chars:tokens]= 1.94:1

allCandidateCharsCount = 1463 (totalCandidates = 1, totalTextChars = 801, totalArgsValuesChars = 662)

expectCandidatesTokenCount = 755.24 (1463 / 1.937, using the unrounded ratio 3729 / 1925)
actualCandidatesTokenCount = 4597
errorCandidates = 608.68%

expectTotalTokenCount = 2680.24
actualTotalTokenCount = 6522
errorTotal = 243.34%
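
The figures above can be reproduced with a short calculation. The sketch below is a minimal, hypothetical helper (the names `TokenEstimate` and `estimateTokens` are not part of the SDK). It assumes, as the post does, that the characters-per-token ratio measured on the prompt also applies to the output, and it reports the "error" values as actual divided by expected, expressed as a percentage.

```kotlin
// Hypothetical helper, not part of the Gemini SDK: reproduces the estimate from the post.
data class TokenEstimate(
    val ratio: Double,                    // characters per token, measured on the prompt
    val expectedCandidatesTokens: Double, // candidate characters divided by the ratio
    val expectedTotalTokens: Double,      // prompt tokens plus expected candidate tokens
    val candidatesRatioPercent: Double,   // actual / expected candidate tokens, in percent
    val totalRatioPercent: Double,        // actual / expected total tokens, in percent
)

fun estimateTokens(
    promptChars: Int,
    promptTokens: Int,
    candidateChars: Int,
    actualCandidatesTokens: Int,
): TokenEstimate {
    require(promptTokens > 0) { "promptTokens must be positive" }
    require(candidateChars > 0) { "candidateChars must be positive" }

    // Assumption: output text tokenizes at the same characters-per-token ratio as the prompt.
    val ratio = promptChars.toDouble() / promptTokens
    val expectedCandidates = candidateChars / ratio
    val expectedTotal = promptTokens + expectedCandidates
    val actualTotal = promptTokens + actualCandidatesTokens

    return TokenEstimate(
        ratio = ratio,
        expectedCandidatesTokens = expectedCandidates,
        expectedTotalTokens = expectedTotal,
        candidatesRatioPercent = 100.0 * actualCandidatesTokens / expectedCandidates,
        totalRatioPercent = 100.0 * actualTotal / expectedTotal,
    )
}
```

Calling `estimateTokens(3729, 1925, 1463, 4597)` gives a ratio of about 1.94, about 755.24 expected candidate tokens, and ratios of about 608.68% and 243.34%, which matches the numbers listed above.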

I have used the following model configuration:

Actual vs expected behavior:

Using large language models is quite an expensive process, where costs must be carefully optimized.

Regardless of solutions like Context Caching, if the token accounting is not correct, it can be a serious waste of money!

In our case, if you expect to pay $100 at the end of the month, a token miscalculation can leave you paying $600 for the same usage.

Any other information you'd like to share?

It would be a good idea to offer a credit, as some other products do, so that users can see the real costs. On the free plan, the actual token consumption is not visible, and it is not reported anywhere in billing.

Here is an example of how a similar product solved the problem. Please consider a similar option in this case.

Build with Google AI Forum
https://discuss.ai.google.dev/t/gemini-1-5-pro-charges-x6-more-tokens-than-expected-on-text-prompts

Answer 1 · 2024-06-21T14:46:53.000Z

generative-ai-android v0.8.0 (as of June 17, 2024)

This is analogous to 1000 characters in the prompt counting as 500 tokens, while 1000 generated characters count as 4000 tokens, and those output tokens are also three times more expensive.

In other words, a choice has to be made: generating 1000 characters should either count as 500 tokens at a higher price, or as 4000 tokens at the same price, but not both at the same time.
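
To make the compounding concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, costs are expressed in relative units of "one input token", and the default multiplier of 3.0 is taken from the statement above rather than from a price list.

```kotlin
// Relative cost of a token count, in units of "one input token".
fun relativeCost(tokens: Int, priceMultiplier: Double): Double = tokens * priceMultiplier

// How many times more the output costs than the input for the same amount of text.
// Assumption: outputPriceMultiplier = 3.0 reflects "three times more expensive" from the post.
fun outputToInputCostRatio(
    inputTokens: Int,
    outputTokens: Int,
    outputPriceMultiplier: Double = 3.0,
): Double {
    require(inputTokens > 0) { "inputTokens must be positive" }
    return relativeCost(outputTokens, outputPriceMultiplier) / relativeCost(inputTokens, 1.0)
}
```

With the example figures, `outputToInputCostRatio(500, 4000)` returns 24.0: eight times the tokens at three times the price, for the same 1000 characters.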

Billing report:

Answer 2 · 2024-06-28T11:55:56.000Z

I conducted an experiment with another generative AI using the same example, and I identified the source of the discrepancy.

!!! When function_call is called, incoming tokens are counted as outgoing !!!

In this specific case, it is easy to count the outgoing tokens and see that the reported number is significantly higher than expected!
However, if the tokens required to invoke the function are registered as incoming instead of outgoing, the calculation comes out correct.

Also, I think the function is called more than once since there is no correct answer in the initial prompt (my example). However, to determine this, it is necessary to provide a logger instead of relying solely on the Android Studio Inspection tool for verification.
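
One way to check whether more than one request is made is to record the usage reported for every response. The sketch below is only an illustration: `Usage` and `UsageLog` are hypothetical types, not part of the SDK, and the values would have to be copied from each response's usageMetadata by the calling code.

```kotlin
// Hypothetical stand-in for the usage figures reported with each response.
data class Usage(val promptTokens: Int, val candidatesTokens: Int)

// Collects one entry per generateContent call so repeated calls become visible.
class UsageLog {
    private val entries = mutableListOf<Usage>()

    fun record(usage: Usage) {
        entries += usage
        println(
            "call #${entries.size}: prompt=${usage.promptTokens}, " +
                "candidates=${usage.candidatesTokens}"
        )
    }

    val callCount: Int get() = entries.size
    val totalPromptTokens: Int get() = entries.sumOf { it.promptTokens }
    val totalCandidatesTokens: Int get() = entries.sumOf { it.candidatesTokens }
}
```

If `callCount` ends up greater than one for a single user prompt, the function-calling round trips are being billed as separate requests, and the per-call lines show how the tokens are split between incoming and outgoing.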