Tokenator is a C/C++ library for counting tokens for GPT-3 and GPT-4.
Using the provided library functions, you can calculate how many tokens your request to the OpenAI API takes up.
The library provides a C function:

```c
size_t tokenator_count(const char* data, size_t len);
```

or the following C++ functions:

```cpp
namespace tokenator {
    size_t count(const std::string& text) noexcept;
    size_t count(const char* data, size_t len) noexcept;
}
```
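The two C++ overloads do the same thing; the pointer-and-length form is handy when the text lives in a buffer you do not want to copy into a `std::string` first. A minimal sketch (the substring length here is just for illustration):

```cpp
#include <iostream>
#include <string>
#include "tokenator.hpp"

int main() {
    std::string prompt{"Summarize the following article."};
    // Count tokens in the whole string...
    std::cout << tokenator::count(prompt) << '\n';
    // ...or in just the first word ("Summarize", 9 bytes), without copying.
    std::cout << tokenator::count(prompt.data(), 9) << '\n';
    return 0;
}
```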
The source code is located in the src directory.

To compile the library, install the dependency:
```sh
sudo apt-get install libicu-dev
```

and run:
```sh
cd src
make
```

Compilation takes a long time, so for your convenience I have added compressed precompiled libraries in the libs directory:
`libtokenator.a`, `libtokenator_cpp.a`
Header files are located in the include directory.
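To build a program against the precompiled libraries, something along these lines should work. This is a sketch, not the project's documented build line: the ICU link flags (`-licuuc`, `-licui18n`) and whether `libtokenator_cpp.a` also requires `libtokenator.a` are assumptions that may need adjusting for your setup, and the compressed archives must be unpacked first.

```sh
# Assumed layout: headers in include/, unpacked static libraries in libs/.
gcc -I include example.c   -L libs -ltokenator                 -licuuc -licui18n -o example_c
g++ -I include example.cpp -L libs -ltokenator_cpp -ltokenator -licuuc -licui18n -o example_cpp
```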
Here is an example of using the C++ library:

```cpp
#include <iostream>
#include <string>
#include <cassert>
#include "tokenator.hpp"
int main() {
    std::string str{"This is a test, and it's working!"};
    // Count the GPT tokens in the phrase.
    size_t count = tokenator::count(str);
    std::cout << "phrase (" << count << "):\t" << str << std::endl;
    assert(count == 10);
    return 0;
}
```
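If the assertion holds (the library counts this phrase as 10 tokens), running the program prints:

```
phrase (10):	This is a test, and it's working!
```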
#include "tokenator.h"
#include <stdio.h>
#include <string.h>
#include <assert.h>
int main(void) {
    const char* data = "This is a test, and it's working!";
    // The C API takes an explicit pointer and byte length.
    size_t count = tokenator_count(data, strlen(data));
    printf("phrase (%zu):\t%s\n", count, data);
    assert(count == 10);
    return 0;
}
```

In the test directory, you can find examples of using the library for different languages: