
Tokenizer for Hindi, English, and Telugu


# Tokenizer

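The README does not describe the tokenization rules, so the following is only a minimal sketch of how one tokenizer could cover all three languages. It assumes a regular-expression approach, which is not necessarily what `tokenizer.py` does, and it is written for Python 3. The pattern keeps Devanagari and Telugu vowel signs and viramas inside their words. This matters because `\w` in Python's `re` module does not match those combining marks, so `\w+` would split a word such as हिन्दी apart. `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# Hypothetical token pattern (an assumption, not the repository's actual rule).
TOKEN_RE = re.compile(
    r"[A-Za-z0-9]+(?:'[A-Za-z]+)?"     # English words, including forms like "don't"
    r"|[\u0900-\u0963\u0966-\u097F]+"  # Devanagari (Hindi), excluding the dandas U+0964/U+0965
    r"|[\u0C00-\u0C7F]+"               # Telugu block, including vowel signs and digits
)


def tokenize(text):
    """Return the word tokens in `text`, dropping spaces and punctuation."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("Hello, world! यह हिन्दी है। ఇది తెలుగు."))
```

Because each script has its own alternative, a token never mixes scripts. The Devanagari danda (।) acts as a sentence separator rather than part of a word.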

## Working

This is Python 2.7 code.

### Note

`x_plot.jpeg` has the plot for the overall data, and `x_plot_first_1000.jpeg` has the plot for the 1000 highest-ranked words.
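The note does not say exactly what the plots show. Assuming they are rank/frequency (Zipf-style) plots, which the reference to word ranks suggests, a sketch of producing them could look like this. `rank_frequency` and `plot_rank_frequency` are hypothetical helpers. Ranking uses only the Python 3 standard library, and plotting additionally assumes matplotlib is installed.

```python
from collections import Counter


def rank_frequency(tokens, top=None):
    """Return [(rank, token, count), ...] sorted by descending count.

    Ranks start at 1. With top=None every distinct token is returned.
    Ties keep first-seen order, as Counter.most_common() does in Python 3.7+.
    """
    counts = Counter(tokens).most_common(top)
    return [(rank, tok, n) for rank, (tok, n) in enumerate(counts, start=1)]


def plot_rank_frequency(tokens, path, top=None):
    """Save a log-log rank/frequency plot to `path` (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display needed
    import matplotlib.pyplot as plt

    rows = rank_frequency(tokens, top)
    plt.figure()
    plt.loglog([r for r, _, _ in rows], [n for _, _, n in rows], marker=".")
    plt.xlabel("rank")
    plt.ylabel("frequency")
    plt.savefig(path)
    plt.close()
```

For example, `plot_rank_frequency(tokens, "x_plot.jpeg")` would cover the overall data, and `plot_rank_frequency(tokens, "x_plot_first_1000.jpeg", top=1000)` would cover the 1000 highest-ranked words. Log-log axes are a common choice here because a Zipf-like distribution shows up as a roughly straight line. Whether the original figures use them is not stated.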

### Running the code

    ./tokenizer.py

#### Options

1. Option 1:
   - `plot`: only plots the graph
   - `plotAndWrite`: plots the graph and writes the output to the output file
   - `write`: only writes the output to the file
2. Option 2 (can be given only if option 1 is given):
   - `unigram`: gives unigram output
   - `bigrams`: gives bigram output
   - `trigrams`: gives trigram output
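The README does not show how the options are passed on the command line. The following Python 3 sketch assumes they are positional arguments, with option 1 first and option 2 second. It validates them and builds the matching n-grams. `parse_options`, `ngrams`, `MODES`, and `NGRAMS` are hypothetical names. Returning `None` for an omitted option is also an assumption, since the README does not state any defaults.

```python
import sys

# Accepted values for each option, taken from the README's option list.
MODES = ("plot", "plotAndWrite", "write")
NGRAMS = {"unigram": 1, "bigrams": 2, "trigrams": 3}


def ngrams(tokens, n):
    """Return the consecutive n-token tuples of `tokens` (n=2 gives bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def parse_options(args):
    """Validate positional options and return (mode, n).

    Hypothetical convention: args[0] is option 1 and args[1] is option 2.
    An omitted option comes back as None because the README gives no defaults.
    """
    if len(args) > 2:
        raise ValueError("expected at most two options")
    mode = args[0] if args else None
    if mode is not None and mode not in MODES:
        raise ValueError("option 1 must be one of: " + ", ".join(MODES))
    n = None
    if len(args) == 2:
        if args[1] not in NGRAMS:
            raise ValueError("option 2 must be one of: " + ", ".join(NGRAMS))
        n = NGRAMS[args[1]]
    return mode, n


if __name__ == "__main__":
    print(parse_options(sys.argv[1:]))
```

Under these assumptions, `./tokenizer.py plotAndWrite bigrams` would yield `("plotAndWrite", 2)`. Because option 2 is read from the second position, it cannot be supplied without option 1, which matches the rule in the option list.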