custom-tokenizer

A custom tokenizer for low-resource languages such as Hindi.
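This description does not say which tokenization algorithm the project uses. As an illustration only, the sketch below assumes a byte-pair-encoding (BPE) style approach that works on Unicode code points, so Devanagari matras start out as separate symbols and are merged when they co-occur often. The function names (`train_bpe`, `merge_pair`, `tokenize`) and the tiny sample corpus are hypothetical and are not taken from this repository.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of sentences."""
    # Each whitespace-separated word starts as a tuple of Unicode code points.
    words = Counter(tuple(w) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def tokenize(word, merges):
    """Split `word` into subword tokens by applying the merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus; a real run would need a much larger Hindi text collection.
corpus = ["भारत एक देश है", "भारत में हिंदी बोली जाती है"]
merges = train_bpe(corpus, num_merges=10)
tokens = tokenize("भारत", merges)
print(tokens)
```

Joining the returned tokens always reproduces the input word, so the segmentation is lossless. Words never seen in training fall back to smaller pieces, down to single code points.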