tokenizing '20th' to '2','0','th'
KavishBhatia opened this issue · 1 comment
KavishBhatia commented
How can I keep '20th' as a single token instead of having it split? Where does this tokenization happen?
AzharSultan commented
It happens in the tokenizer's default pipeline here. You can pass a custom pipeline to the tokenizer; removing "EMOJI" from that pipeline fixes the problem.
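
To illustrate the idea, here is a minimal, self-contained sketch of a regex-pipeline tokenizer. It is not the library's code: `PATTERNS`, `DEFAULT_PIPELINE`, and `build_tokenizer` are hypothetical names, and the over-broad "EMOJI" expression (which also matches bare digits, as a loose keycap-emoji pattern might) is an assumption about why the split could occur. Check the library's source for the actual pipeline entries and constructor arguments.

```python
import re

# Hypothetical named patterns; the real library's expressions may differ.
PATTERNS = {
    # Assumed over-broad emoji pattern: keycap emoji begin with a digit,
    # '#' or '*', so a loose pattern can also match bare digits.
    "EMOJI": r"[0-9#*]\ufe0f?\u20e3?|[\U0001F300-\U0001FAFF]",
    # Plain word pattern: runs of letters, digits, and underscores.
    "WORD": r"\w+",
}

# Order matters: earlier patterns win when several could match.
DEFAULT_PIPELINE = ["EMOJI", "WORD"]


def build_tokenizer(pipeline=None):
    """Join the selected patterns into one alternation and return a tokenize function."""
    names = DEFAULT_PIPELINE if pipeline is None else pipeline
    regex = re.compile("|".join("(?:%s)" % PATTERNS[name] for name in names))
    return regex.findall


# Default pipeline: the "EMOJI" pattern grabs each digit before "WORD" can.
default_tok = build_tokenizer()

# Custom pipeline: same as the default, minus "EMOJI".
custom_tok = build_tokenizer([n for n in DEFAULT_PIPELINE if n != "EMOJI"])

print(default_tok("20th"))  # ['2', '0', 'th']
print(custom_tok("20th"))   # ['20th']
```

The trade-off in this sketch is that dropping "EMOJI" also means emoji are no longer recognized as tokens of their own, so it only makes sense if you do not need them.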