
GPTTokenCounter

A plain script to compare and count tokens for OpenAI models

Introduction

GPTTokenCounter is a simple Python script that shows how various languages are tokenized compared to English.

Dependencies

To run the script, you need the following Python libraries installed:

  • tiktoken
  • rich
  • tabulate

To install these prerequisites, execute:

pip install tiktoken rich tabulate

How to Execute GPTTokenCounter

  1. Clone the Repository

  2. Execute the Script:

    python GPTTokencounter.py
  3. Language Selection:

    • The program kicks off with a brief overview of its purpose.

    • You'll then be prompted to either:

      • [a] Default (English and Bangla)
      • [b] Custom

      Choose the default for a comparison between English and Bangla, or opt for a custom language pair.

  4. Provide Required Inputs:

    • For the custom option, you'll need to specify the ISO language code, an English word or sentence, and its counterpart in the chosen language.
  5. Examine the Token Comparison:

    • The script then displays a table contrasting the tokenization of the English phrase with that of the selected language, using the gpt-3.5-turbo model (see the sketch after this list).
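
The core of that comparison can be sketched in a few lines. The snippet below is a minimal illustration assuming tiktoken and tabulate are installed; compare_tokens and its parameters are illustrative names, not the actual functions in GPTTokencounter.py.

    import tiktoken
    from tabulate import tabulate

    def compare_tokens(english_text, other_text, other_language,
                       model_name="gpt-3.5-turbo"):
        # Pick the tokenizer that matches the chosen OpenAI model.
        encoding = tiktoken.encoding_for_model(model_name)
        rows = [
            ["English", english_text, len(encoding.encode(english_text))],
            [other_language, other_text, len(encoding.encode(other_text))],
        ]
        # Render the two token counts side by side.
        return tabulate(rows, headers=["Language", "Text", "Token count"],
                        tablefmt="grid")

    print(compare_tokens("I speak Bengali", "আমি বাংলায় কথা কই", "Bangla"))

Run with the sample input below, this prints a small grid table with the two token counts side by side.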

Input

Language    English            Bangla
Sentence    I speak Bengali    আমি বাংলায় কথা কই

Output

[Screenshot: token count comparison table produced by the script]

Points to note

  • The tiktoken library does the heavy lifting, computing token counts according to the chosen model.

  • The default model in use is gpt-3.5-turbo. Should you wish to experiment with others, adjust the model_name variable within the main() function.

  • The display_word_parameters() function applies word wrapping so that longer inputs remain legible (see the sketch below).
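
As an illustration of that wrapping behaviour, here is a small sketch using Python's textwrap module together with tabulate; wrap_for_table is a hypothetical helper and not the actual display_word_parameters() implementation.

    import textwrap
    from tabulate import tabulate

    def wrap_for_table(text, width=40):
        # Break a long sentence into fixed-width lines so table cells stay readable.
        return "\n".join(textwrap.wrap(text, width=width))

    sample = "I speak Bengali and I want to see how a long sentence wraps inside the table."
    rows = [["English", wrap_for_table(sample)]]
    print(tabulate(rows, headers=["Language", "Text"], tablefmt="grid"))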

Contributions and Feedback

We welcome feedback! If you come across any hiccups, do reach out through GitHub.