Tokenizer for Hindi, English, and Telugu
##Working
This code is written for Python 2.7.
###Note:
x_plot.jpeg has the plot for the overall data. x_plot_first_1000.jpeg has the plot for the 1000 top-ranked words.
###Running the code:
./tokenizer.py
####Options:
- Option 1:
    - plot --- only plots the graph
    - plotAndWrite --- plots the graph and writes the output to the output file
    - write --- only writes the output to the file
- Option 2 [can be given only if option 1 is given]:
    - unigram --- gives unigram output
    - bigrams --- gives bigram output
    - trigrams --- gives trigram output
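
For readers unfamiliar with the unigram, bigram, and trigram outputs, here is a minimal sketch of how such n-gram counts can be computed. It is not the code in tokenizer.py: the names `simple_tokenize`, `ngrams`, and `ngram_counts` are hypothetical, and the whitespace-and-punctuation tokenizer is an assumed stand-in for the real one.

```python
from __future__ import print_function, unicode_literals
from collections import Counter

# Assumption: punctuation to trim from word edges, including the Devanagari danda.
PUNCT = ".,!?;:\"'()[]{}\u0964"


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer: split on whitespace, trim edge punctuation."""
    tokens = []
    for chunk in text.split():
        word = chunk.strip(PUNCT)
        if word:
            tokens.append(word)
    return tokens


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count n-grams in text: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return Counter(ngrams(simple_tokenize(text), n))


if __name__ == "__main__":
    sample = "the cat sat. the cat ran!"
    # Print the most frequent bigram together with its count.
    print(ngram_counts(sample, 2).most_common(1))
```

Sorting these counts in descending order gives each word its rank, which is presumably the quantity shown in the rank plots mentioned in the note.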