token count is inconsistent with OpenAI tokenizer
GorvGoyl opened this issue · 1 comment
GorvGoyl commented
As shown below, the token count for the following text is inconsistent with the count from OpenAI's tokenizer.
text:
```
<|im_start|>dd<|im_sep|>OpenAI's large language models (sometimes referred to as GPT's) process text using tokens, which are common sequences of characters found in a set of text. The models learn to understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.<|im_end|><|im_start|>assistant<|im_sep|><|im_end|><|im_start|>assistant<|im_sep|>
```
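One plausible source of the mismatch is whether markers like `<|im_start|>`, `<|im_sep|>`, and `<|im_end|>` are encoded as single special tokens or split into ordinary tokens. A minimal sketch of that difference, assuming the Python `tiktoken` package and its `cl100k_base` encoding (the extension pattern follows tiktoken's README; the ids assigned to the `im_*` markers here are illustrative, not confirmed from this repository):

```python
import tiktoken

# The sample from this issue, truncated for brevity.
text = "<|im_start|>dd<|im_sep|>OpenAI's large language models process text using tokens.<|im_end|>"

base = tiktoken.get_encoding("cl100k_base")

# cl100k_base does not register <|im_start|>/<|im_sep|>/<|im_end|> as
# special tokens, so they are split into several ordinary tokens.
plain_count = len(base.encode(text))

# An encoding that treats the markers as single special tokens, following
# the extension pattern in tiktoken's README. The im_* ids below are
# illustrative assumptions.
chat = tiktoken.Encoding(
    name="cl100k_im",
    pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks,
    special_tokens={
        **base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
        "<|im_sep|>": 100266,
    },
)
special_count = len(chat.encode(text, allowed_special="all"))

# The counts differ: each marker costs one token in the second case
# but several in the first.
print(plain_count, special_count)
```

If the two tools being compared disagree on this point, their counts will diverge by a few tokens per marker.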
syntaxtrash commented
Any update on this? The counts come out fine without the special characters.
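A minimal sketch of that workaround, again assuming the Python `tiktoken` package: stripping the markers before encoding sidesteps the special-token question entirely. The regex only covers the three markers that appear in this issue's sample.

```python
import re

import tiktoken

base = tiktoken.get_encoding("cl100k_base")
text = "<|im_start|>dd<|im_sep|>OpenAI's large language models process text using tokens.<|im_end|>"

# Drop the ChatML-style markers, then count the remaining plain text.
stripped = re.sub(r"<\|im_(?:start|sep|end)\|>", "", text)
print(len(base.encode(stripped)))
```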