Tokens are slightly different from OpenAI Tokenizer
Closed · 1 comment
nilsreichardt commented
Using the sentence "Welcome to gpt-tokenizer. Replace this with your text to see how tokenization works.", I get 20 tokens from the OpenAI Tokenizer, but only 19 from https://gpt-tokenizer.dev/.
Another example:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Facilisis gravida neque convallis a cras semper auctor neque vitae. Nunc mattis enim ut tellus elementum sagittis vitae et leo. Tellus rutrum tellus pellentesque eu tincidunt tortor aliquam nulla facilisi. Volutpat lacus laoreet non curabitur gravida arcu ac. Diam phasellus vestibulum lorem sed risus ultricies tristique nulla aliquet.
Using gpt-tokenizer.dev: 119 tokens
Using OpenAI Tokenizer: 158 tokens
nilsreichardt commented
Ah, found this reply: latitudegames/GPT-3-Encoder#40 (comment)
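
A mismatch like this is what you would typically see when two tools tokenize the same text with different BPE vocabularies. That is only an assumption about this case; the linked reply is the place to check the actual cause. The sketch below is a toy illustration, not the real GPT vocabularies and not the API of gpt-tokenizer or the OpenAI Tokenizer. The `tokenize` function and both merge tables are hypothetical and only show how a larger merge table yields fewer tokens for the same input.

```typescript
// Toy BPE-style tokenizer (simplified, hypothetical). A "vocabulary" here is
// just an ordered list of merge rules, applied in priority order.
type Merge = [string, string];

function tokenize(text: string, merges: Merge[]): string[] {
  // Start from single characters.
  let tokens: string[] = Array.from(text);
  // Apply each merge rule in order, scanning left to right.
  for (const [left, right] of merges) {
    const merged: string[] = [];
    let i = 0;
    while (i < tokens.length) {
      if (i + 1 < tokens.length && tokens[i] === left && tokens[i + 1] === right) {
        merged.push(left + right); // fuse the adjacent pair into one token
        i += 2;
      } else {
        merged.push(tokens[i]);
        i += 1;
      }
    }
    tokens = merged;
  }
  return tokens;
}

// Two made-up merge tables; the second one has learned longer pieces.
const smallVocab: Merge[] = [["l", "o"], ["lo", "r"]];
const largeVocab: Merge[] = [...smallVocab, ["e", "m"], ["lor", "em"]];

const text = "lorem lorem";
const countSmall = tokenize(text, smallVocab).length; // lor, e, m, " ", lor, e, m
const countLarge = tokenize(text, largeVocab).length; // lorem, " ", lorem
console.log(countSmall, countLarge); // prints: 7 3
```

When comparing counts across tools, it helps to first confirm that both are set to the same model or encoding.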