okcom_tokenizer: An independent, revised tokenizer derived from marginalbear, with emoji parsing and unigram tokenization added.
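
As a rough illustration of what emoji parsing combined with unigram tokenization can look like, here is a minimal standalone sketch. It is not the package's actual API. The name `tokenize`, the pattern `TOKEN_RE`, the emoji code point ranges, and the reading of "unigram" as one token per non-ASCII character are all assumptions made for this example.

```python
import re

# Hypothetical pattern for illustration only; the emoji ranges are a rough
# approximation and not the table used by okcom_tokenizer.
TOKEN_RE = re.compile(
    "(?P<emoji>[\U0001F300-\U0001FAFF\u2600-\u27BF])"  # common emoji and symbols
    "|(?P<word>[A-Za-z0-9]+)"                          # ASCII alphanumeric runs stay whole
    "|(?P<char>\\S)"                                   # any other non-space character, one at a time
)


def tokenize(text):
    """Return (token, kind) pairs; whitespace is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for token, kind in tokenize("我愛 Python \U0001F600"):
        print(kind, token)
```

Tagging each token with its kind lets later steps treat emoji differently from ordinary characters, for example by mapping them to a shared placeholder.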