niieani/gpt-tokenizer

How can I get prompt_tokens?

magelikescoke opened this issue · 3 comments

What parameters should I pass to get the same prompt_tokens value that OpenAI returns?

{
  "messages": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "Hello"
    },
    {
      "role": "assistant",
      "content": "Hello! How can I help you?"
    },
    {
      "role": "user",
      "content": "Introduce yourself"
    }
  ]
}

This is my prompt. How should I stringify it and pass it to encode?

Same question as above.

I've tried running encode on each of the "content" values and summing the results, as well as running encode on JSON.stringify(entireMessagesArray).

The first method (summing the per-message counts) came in about 200 tokens under what the OpenAI API actually returned for "prompt_tokens", and the second overshot by about 200. For reference, this was on a request with 2994 prompt_tokens.

I'm using the "gpt-3.5-turbo" model and importing the default encode from "gpt-tokenizer" (which, according to the docs, should align with gpt-3.5-turbo).
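For what it's worth, the gap in both directions is consistent with the chat format itself: the API wraps every message in role and separator tokens that summing the raw contents misses, while JSON.stringify overshoots because the braces, quotes, and key names get tokenized too. Below is a rough manual-counting sketch based on the scheme in OpenAI's cookbook for gpt-3.5-turbo; the overhead constants (4 per message, 3 for reply priming) come from the cookbook and may differ between model snapshots, so treat them as assumptions rather than guarantees.

```ts
import { encode } from 'gpt-tokenizer'

interface ChatMessage {
  role: string
  content: string
}

// Rough manual count following OpenAI's cookbook scheme for gpt-3.5-turbo:
// each message carries a fixed overhead for its role/separator tokens,
// plus a fixed priming overhead for the assistant's reply.
function countChatTokens(messages: ChatMessage[]): number {
  const TOKENS_PER_MESSAGE = 4 // per-message wrapper tokens (cookbook value)
  const REPLY_PRIMING = 3 // every reply is primed with the assistant role

  let total = REPLY_PRIMING
  for (const message of messages) {
    total += TOKENS_PER_MESSAGE
    total += encode(message.role).length
    total += encode(message.content).length
  }
  return total
}
```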

This should be fixed in the new version: there's a new API called encodeChat. See the updated README for details.
Let me know if you still have issues.
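For anyone landing here later, a minimal sketch of what that looks like with the chat from the original question (passing the model name explicitly; depending on the import path, the library may also infer a default model):

```ts
import { encodeChat } from 'gpt-tokenizer'

const chat = [
  { role: 'system', content: '' },
  { role: 'user', content: 'Hello' },
  { role: 'assistant', content: 'Hello! How can I help you?' },
  { role: 'user', content: 'Introduce yourself' },
] as const

// encodeChat accounts for the per-message overhead tokens, so the
// length of the result should match the API's prompt_tokens.
const tokens = encodeChat(chat, 'gpt-3.5-turbo')
console.log(tokens.length)
```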

🎉 This issue has been resolved in version 2.1.0 🎉

Your semantic-release bot 📦🚀