This is a JavaScript port of the Penn Treebank word tokenizer (`TreebankWordTokenizer`) from the Python NLTK library.
- The code is written in TypeScript, and the tests are written in JavaScript using Jest.
- The library is compiled and bundled with esbuild.
```javascript
const TreebankTokenizer = require("treebank-tokenizer");

const t = new TreebankTokenizer();
t.tokenize("This is a sentence.");
```

Output:

```
['This', 'is', 'a', 'sentence', '.']
```
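To give a rough idea of how output like the above arises, here is a heavily simplified sketch of a few Penn Treebank-style regex rules. The sentence-final period rule is modeled on the one in NLTK's `TreebankWordTokenizer`; the other two rules are simplified stand-ins. The function `simpleTreebankTokenize` is hypothetical and is not part of this library's API. The real rule set also handles quotes, brackets, and many more contractions, so results will differ on harder input.

```javascript
// Hypothetical, heavily simplified subset of Penn Treebank-style rules.
// NOT this library's implementation; the full NLTK rule set is much larger.
function simpleTreebankTokenize(text) {
  let s = text;
  // Pad commas, semicolons, colons, question and exclamation marks with spaces.
  s = s.replace(/([,;:?!])/g, " $1 ");
  // Split off a sentence-final period, optionally followed by closing
  // brackets or quotes (modeled on NLTK's final-period rule).
  s = s.replace(/([^.])(\.)([\]\)}>"']*)\s*$/, "$1 $2$3 ");
  // Simplified contraction rule: "don't" -> "do n't".
  s = s.replace(/(\w)(n't)\b/gi, "$1 $2");
  // Collapse whitespace and split into tokens.
  return s.trim().split(/\s+/);
}

console.log(simpleTreebankTokenize("This is a sentence."));
// → [ 'This', 'is', 'a', 'sentence', '.' ]
console.log(simpleTreebankTokenize("I don't know, really?"));
// → [ 'I', 'do', "n't", 'know', ',', 'really', '?' ]
```

On `"This is a sentence."` the sketch yields the same tokens as `t.tokenize` above, but that agreement is specific to this simple input.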
```javascript
const TreebankTokenizer = require("treebank-tokenizer");

const t = new TreebankTokenizer();
t.span_tokenize("This is a sentence.");
```

Output:

```
[ [ 0, 4 ], [ 5, 7 ], [ 8, 9 ], [ 10, 18 ], [ 18, 19 ] ]
```
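The offsets returned by `span_tokenize` are `[start, end)` character indices into the original string, so slicing the input with them recovers each token. One simple way to compute such offsets is to search the input for each token in order, starting where the previous token ended. This is sketched below as an assumption, not a description of this library's internals. `alignTokens` is a hypothetical helper. It only works when every token appears verbatim in the text, which is not always true for Treebank output (NLTK's tokenizer, for example, rewrites straight double quotes into different quote tokens).

```javascript
// Hypothetical helper, not part of the treebank-tokenizer API.
// Assumes every token appears verbatim in `text`, in order.
function alignTokens(text, tokens) {
  const spans = [];
  let cursor = 0; // search resumes after the previous token
  for (const token of tokens) {
    const start = text.indexOf(token, cursor);
    if (start === -1) {
      throw new Error(`Token ${JSON.stringify(token)} not found after offset ${cursor}`);
    }
    const end = start + token.length;
    spans.push([start, end]);
    cursor = end;
  }
  return spans;
}

const text = "This is a sentence.";
const spans = alignTokens(text, ["This", "is", "a", "sentence", "."]);
console.log(spans); // → [[0, 4], [5, 7], [8, 9], [10, 18], [18, 19]]

// Slicing the original string with the spans recovers the tokens.
console.log(spans.map(([start, end]) => text.slice(start, end)));
// → [ 'This', 'is', 'a', 'sentence', '.' ]
```

Advancing the cursor past each match matters. Without it, `"is"` would be found inside `"This"` at offset 2 instead of at offset 5.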
Clone the repository and install the dependencies with npm:

```sh
npm install
```
Run the tests, or run them with a coverage report:

```sh
npm run test
npm run test-coverage
```
Issues and pull requests are welcome. Feel free to fork the repository and submit improvements.