
Command line tool to display NLP token count

Primary language: Python. License: MIT.

Tokenizer

A simple tool that uses Hugging Face's tokenizer to count the NLP tokens in a file. It optionally accepts a second file path, to which the tokenized output is written.
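The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names (`load_hf_tokenize`, `count_tokens`, `main`), the `--model` flag, the `gpt2` default, and the one-token-per-line output format are all assumptions made for the example. It relies on `AutoTokenizer` from the `transformers` package, which is imported lazily so the counting logic can be exercised with any tokenizer callable.

```python
import argparse
from pathlib import Path
from typing import Callable, List, Optional


def load_hf_tokenize(model_name: str = "gpt2") -> Callable[[str], List[str]]:
    """Return a text -> tokens function backed by a Hugging Face tokenizer.

    The model name is an assumption; the real tool may use a different one.
    """
    # Imported here so the rest of the module works without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer.tokenize


def count_tokens(
    in_path: str,
    tokenize: Callable[[str], List[str]],
    out_path: Optional[str] = None,
) -> int:
    """Count the tokens in in_path; optionally write them to out_path."""
    text = Path(in_path).read_text(encoding="utf-8")
    tokens = tokenize(text)
    if out_path is not None:
        # Hypothetical output format: one token per line.
        Path(out_path).write_text("\n".join(tokens), encoding="utf-8")
    return len(tokens)


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Display NLP token count")
    parser.add_argument("input", help="file to tokenize")
    parser.add_argument("output", nargs="?", help="optional file for the tokens")
    parser.add_argument("--model", default="gpt2", help="tokenizer to load (assumed flag)")
    args = parser.parse_args(argv)
    print(count_tokens(args.input, load_hf_tokenize(args.model), args.output))
```

With these assumptions, calling `main(["notes.txt"])` would print the token count for `notes.txt`, and `main(["notes.txt", "tokens.txt"])` would also write the tokens to `tokens.txt`. Passing the tokenizer in as a callable keeps the file handling independent of the Hugging Face download step.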