This script estimates the number of tokens in a given text chunk and calculates the potential cost based on OpenAI's GPT-4 pricing model.
The script uses the GPT-2 tokenizer from the HuggingFace Transformers library, which is a close approximation of the GPT-3/GPT-4 tokenizers. It then estimates the cost of the provided text for both prompt and sampled tokens, for the 8k and 32k context length models.
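The core logic looks roughly like the sketch below. This is a minimal approximation for illustration only, not the exact contents of `tokens.py`; the tokenizer class, function name, and output format are assumptions.

```python
from transformers import GPT2TokenizerFast

# GPT-4 pricing in USD per 1k tokens, for the 8k and 32k context models
PRICING = {
    "8k": {"prompt": 0.03, "sampled": 0.06},
    "32k": {"prompt": 0.06, "sampled": 0.12},
}

def estimate(text: str) -> None:
    # GPT-2 tokenizer as an approximation of the GPT-3/4 tokenizer
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    n_tokens = len(tokenizer.encode(text))
    print(f"Estimated tokens: {n_tokens}")
    for model, rates in PRICING.items():
        for kind, rate_per_1k in rates.items():
            cost = n_tokens / 1000 * rate_per_1k
            print(f"{model} context, {kind} tokens: ${cost:.4f}")

if __name__ == "__main__":
    with open("input.txt", encoding="utf-8") as f:
        estimate(f.read())
```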
- Python 3
- HuggingFace Transformers library (install via `pip install transformers`)
- Clone or download the repository.
- Ensure you have the required Python packages installed.
- Replace the content of `input.txt` with the text you want to estimate the token count and cost for.
- Run the script using `python3 tokens.py`.
- The script will display the estimated token count and the associated cost based on the GPT-4 pricing model.
For 8k context length models:
- $0.03/1k prompt tokens
- $0.06/1k sampled tokens
For 32k context length models:
- $0.06/1k prompt tokens
- $0.12/1k sampled tokens
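As a quick sanity check on these rates: a chunk of 2,500 prompt tokens sent to an 8k context model costs about 2.5 × $0.03 = $0.075, while the same 2,500 tokens as sampled output from a 32k context model costs 2.5 × $0.12 = $0.30.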
(Note: Always refer to OpenAI's official documentation for up-to-date pricing.)
This script is provided under the MIT License. Feel free to use, modify, and distribute as needed.