When working with OpenAI GPT models, you may need to know how many tokens your code uses, for example to estimate costs or to improve results.
The GPT3Tokenizer C# class can help you count the tokens in your prompts and in the responses you receive.
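```csharp
using System;
using System.Collections.Generic;
using AI.Dev.OpenAI.GPT;
string text = "January 1st, 2000";
// 5 tokens => [21339, 352, 301, 11, 4751]
List<int> tokens = GPT3Tokenizer.Encode(text);
Console.WriteLine($"{tokens.Count} tokens: [{string.Join(", ", tokens)}]");
```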
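Since usage is billed by token count, the `Encode` call above can feed a rough cost estimate. The following is a sketch under assumptions: the `TokenCost` class and its methods are hypothetical and not part of AI.Dev.OpenAI.GPT, the tokenizer is passed in as a delegate so the helper can be tested without the package, and `pricePer1KTokens` is a placeholder you should take from OpenAI's current pricing. It also assumes prompt and completion tokens share one price, which is not the case for every model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of AI.Dev.OpenAI.GPT).
// The tokenizer is injected as a delegate; in real code pass
// t => GPT3Tokenizer.Encode(t).
public static class TokenCost
{
    // Number of tokens the given tokenizer produces for the text.
    public static int CountTokens(Func<string, List<int>> encode, string text)
        => encode(text).Count;

    // Rough cost of one request: (prompt + completion tokens) / 1000 * price.
    // Assumption: a single price per 1K tokens for both prompt and completion.
    public static decimal EstimateCost(
        Func<string, List<int>> encode,
        string prompt,
        string completion,
        decimal pricePer1KTokens)
    {
        int totalTokens = CountTokens(encode, prompt) + CountTokens(encode, completion);
        return totalTokens / 1000m * pricePer1KTokens;
    }
}
```

For example, `decimal cost = TokenCost.EstimateCost(t => GPT3Tokenizer.Encode(t), prompt, completion, pricePer1KTokens);`, where `prompt`, `completion` and `pricePer1KTokens` are your own values. Using `decimal` avoids binary floating-point rounding in currency amounts.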
The tokenizer uses a byte-pair encoding (BPE) algorithm, which splits words into subwords by applying merge rules learned from how frequently pairs of symbols occur. It can handle out-of-vocabulary words, punctuation, and special tokens.
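To make the merge step concrete, here is a toy sketch of a BPE merge loop. It is not this library's implementation: the `ToyBpe` class and its four merge rules are invented for illustration, and it works on characters rather than on the UTF-8 bytes and roughly 50k learned merges that the GPT-3 tokenizer uses. At each step it merges the adjacent pair with the lowest rank (the rule learned earliest) and stops when no adjacent pair has a rule.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy BPE merge loop. The merge table is hypothetical and tiny.
public static class ToyBpe
{
    // Lower rank = merge learned earlier (from a more frequent pair).
    private static readonly Dictionary<(string, string), int> MergeRanks = new()
    {
        { ("l", "o"), 0 },
        { ("lo", "w"), 1 },
        { ("e", "r"), 2 },
        { ("low", "er"), 3 },
    };

    // Start from single characters, then repeatedly merge the
    // lowest-ranked adjacent pair until no known pair remains.
    public static List<string> Split(string word)
    {
        var parts = word.Select(c => c.ToString()).ToList();
        while (parts.Count > 1)
        {
            int bestIndex = -1;
            int bestRank = int.MaxValue;
            for (int i = 0; i < parts.Count - 1; i++)
            {
                if (MergeRanks.TryGetValue((parts[i], parts[i + 1]), out int rank) && rank < bestRank)
                {
                    bestRank = rank;
                    bestIndex = i;
                }
            }
            if (bestIndex < 0) break; // no mergeable pair left
            parts[bestIndex] += parts[bestIndex + 1];
            parts.RemoveAt(bestIndex + 1);
        }
        return parts;
    }
}
```

With this table, `ToyBpe.Split("lower")` merges all the way to `["lower"]`, while `ToyBpe.Split("lowest")` stops at `["low", "e", "s", "t"]` because no rule covers the remaining pairs. This is how BPE copes with words it has never seen: they fall back to smaller known pieces instead of failing.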
The output of this library is compatible with the OpenAI GPT tokenizer, which you can also try on OpenAI's online Tokenizer page.
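Install the AI.Dev.OpenAI.GPT NuGet package from nuget.org, for example with the .NET CLI:
```shell
dotnet add package AI.Dev.OpenAI.GPT --version 1.0.2
```
or from the Visual Studio Package Manager Console:
```powershell
NuGet\Install-Package AI.Dev.OpenAI.GPT -Version 1.0.2
```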
If you are looking for an equivalent solution in other languages:
This library is licensed under CC0, which places it in the public domain. You can use it in any application, modify the code, and redistribute any part of it.
I am not affiliated with OpenAI and this library is not endorsed by them. I just work with several AI solutions and I share this code hoping to make technology more accessible and easier to work with.