tamil-word-tokenizer

A word tokenizer NLP tool for the Tamil language

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Word Tokenizer

Command-line utility to perform word tokenization on a given Tamil corpus text file.

Usage

python word_tokenizer.py <input_file>

For help

python word_tokenizer.py -h

Features

  1. No preprocessing of the input is needed
  2. Works on any OS that supports Python 3
  3. Handles input files of any size
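
To illustrate how a tokenizer can handle input of any size without preprocessing, here is a minimal sketch that reads a file one line at a time and yields tokens lazily. It is not the code in word_tokenizer.py: the names `TOKEN_RE`, `tokenize_line`, `tokenize_lines`, and `tokenize_file` are hypothetical, and the token definition (a run of characters from the Tamil Unicode block, a run of ASCII letters or digits, or a single punctuation character) is an assumption made for this example.

```python
import re
from typing import Iterable, Iterator, List

# Assumption for this sketch: a token is either
#   - a maximal run of characters in the Tamil Unicode block (U+0B80 to U+0BFF),
#   - a maximal run of ASCII letters or digits, or
#   - any other single non-whitespace character (e.g. punctuation).
TOKEN_RE = re.compile(
    r"[\u0B80-\u0BFF]+|[A-Za-z0-9]+|[^\s\u0B80-\u0BFFA-Za-z0-9]"
)


def tokenize_line(line: str) -> List[str]:
    """Return the tokens found in a single line of text."""
    return TOKEN_RE.findall(line)


def tokenize_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield tokens from an iterable of lines, one line at a time."""
    for line in lines:
        yield from tokenize_line(line)


def tokenize_file(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield tokens from a text file without loading it all into memory."""
    with open(path, encoding=encoding) as handle:
        # Iterating over the file object reads it line by line.
        yield from tokenize_lines(handle)
```

Because `tokenize_file` is a generator, memory use depends on the longest line rather than on the file size. A caller could write `for token in tokenize_file("corpus.txt"): print(token)`, where `corpus.txt` is a placeholder file name.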