
GPT-Tokenator

Tokenator is a C/C++ library for counting tokens for GPT-3 and GPT-4.

Using the provided library functions, you can calculate how many tokens a request to the OpenAI API consumes.

The library provides a C function:

size_t tokenator_count(const char* data, size_t len);

Or C++ functions:

namespace tokenator {
    size_t count(const std::string& text) noexcept;
    size_t count(const char* data, size_t len) noexcept;
}

The source code is located in the src directory.

To compile the library, first install the dependency:

sudo apt-get install libicu-dev

Then run:

cd src
make

Compilation takes a long time, so for convenience I have added compressed precompiled libraries in the libs directory:

libtokenator.a, libtokenator_cpp.a

Header files are located in the include directory.
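For reference, linking an application against the precompiled static libraries might look like the following. The paths assume the repository layout described above, and the ICU link flag is a guess based on the libicu-dev dependency; adjust both for your setup.

```shell
# Build against the C++ static library (headers assumed in include/,
# library assumed in libs/). -licuuc is an assumption from the ICU dependency.
g++ -std=c++17 -Iinclude example.cpp libs/libtokenator_cpp.a -licuuc -o example

# For the C library, link libtokenator.a instead; the C++ runtime may still
# be needed since the library itself is implemented in C++.
gcc -Iinclude example.c libs/libtokenator.a -lstdc++ -licuuc -o example_c
```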

Here is an example of using the C++ library:

#include <iostream>
#include <string>
#include <cassert>
#include "tokenator.hpp"

int main() {
    std::string str{"This is a test, and it's working!"};
    size_t count = tokenator::count(str);
    std::cout << "phrase (" << count << "):\t" << str << std::endl;
    assert(count == 10);

    return 0;
}

And here is an example of using the C library:

#include "tokenator.h"
#include <stdio.h>
#include <string.h>
#include <assert.h>

int main() {
    const char* data = "This is a test, and it's working!";
    size_t count = tokenator_count(data, strlen(data));
    printf("phrase (%zu):\t%s\n", count, data);
    assert(count == 10);

    return 0;
}

In the test directory, you can find examples of using the library for different languages:

Source code

https://github.com/valmat/gpt-tokenator

License

The MIT License