dqbd/tiktoken

It doesn't support new model "o1-mini" and "o1-preview"

Closed this issue · 8 comments

Hi openai devs,

how can I count tokens for o1-preview and o1-mini?

Thanks in advance!

Hi,
I’m using the tiktoken library to count tokens for the gpt-4o-mini model. However, I’ve noticed a discrepancy between my token counts and the counts returned by the OpenAI API. It seems that tiktoken doesn’t fully support this new model yet, so the tokenization may differ slightly. Is there a plan to officially support gpt-4o-mini in tiktoken?

Thanks in advance!

Here’s my example code:

import { encoding_for_model, type TiktokenModel } from "tiktoken";

type ChatMessage = { role: string; content: string };

const countTokens = (messages: ChatMessage[], model: TiktokenModel): number => {
  const enc = encoding_for_model(model); // Tokenizer for the model
  let tokenCount = 0;

  try {
    // Iterate over each message and count tokens for 'role' and 'content'
    messages.forEach((message) => {
      tokenCount += enc.encode(message.role).length; // Count role tokens
      tokenCount += enc.encode(message.content).length; // Count content tokens
    });
  } finally {
    enc.free(); // Release the WASM-backed encoder when done
  }

  return tokenCount;
};

// `instructions` and `userContent` are strings defined elsewhere
const messages: ChatMessage[] = [
  { role: 'system', content: instructions },
  { role: 'user', content: userContent },
];

const model: TiktokenModel = "gpt-4o-mini";
const tokenCountInput = countTokens(messages, model);

dqbd commented

Hello! Will keep monitoring openai#337 to see if there are any changes w.r.t. the underlying token map.

dqbd commented

@tmlxrd Just counting role and content is not necessarily enough. You need to also include the tokens which are used to separate the messages: see dqbd/tiktokenizer
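
A rough sketch of what counting the separator tokens could look like. The overhead constants (3 tokens per message, 1 extra when a `name` is set, 3 to prime the reply) are an assumption taken from the heuristic in OpenAI's cookbook for recent chat models, not something confirmed in this thread for gpt-4o-mini or the o1 models. The `Encoder` and `ChatMessage` types and the `estimateChatTokens` name are hypothetical, introduced only for illustration.

```typescript
// Minimal tokenizer shape: anything whose encode() returns a list of token ids.
// The object returned by encoding_for_model() in the `tiktoken` package should fit.
interface Encoder {
  encode(text: string): ArrayLike<number>;
}

interface ChatMessage {
  role: string;
  content: string;
  name?: string;
}

// ASSUMPTION: values from the OpenAI cookbook heuristic; may differ per model.
const TOKENS_PER_MESSAGE = 3; // wrapper tokens around each message
const TOKENS_PER_NAME = 1; // extra token when a message carries a name
const TOKENS_FOR_REPLY_PRIMING = 3; // the reply is primed with an assistant header

function estimateChatTokens(messages: ChatMessage[], enc: Encoder): number {
  let total = TOKENS_FOR_REPLY_PRIMING;
  for (const message of messages) {
    total += TOKENS_PER_MESSAGE;
    total += enc.encode(message.role).length;
    total += enc.encode(message.content).length;
    if (message.name !== undefined) {
      total += enc.encode(message.name).length + TOKENS_PER_NAME;
    }
  }
  return total;
}
```

With the real library, this would be called as `estimateChatTokens(messages, enc)` where `enc` comes from `encoding_for_model(model)`, followed by `enc.free()`. The result is an estimate; the `usage` field in the API response remains the authoritative count.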

Thank you for your answer!
I did this because I get fewer tokens than OpenAI returns in the API response.

For a large text I counted 1708 input tokens, while the response from OpenAI reported 1717. It's a small difference, but I don't understand where it comes from, so I added the two roles to the count.

UPD: Thank you for the link. It works better now, but there are still discrepancies with the count OpenAI returns.

Do 'o1-mini' and 'o1-preview' still use the cl100k_base vocabulary?

Hi. Unfortunately, I don't know. Please share the answer if you find out.

dqbd commented

Got clarification with the latest tiktoken@0.8.0 release, updating here as well