Voca doesn't handle complex characters like కృష్ణ
kotpal opened this issue · 4 comments
kotpal commented
Expected behavior 😸
v.graphemes("కృష్ణ") should return (2) ["కృ", "ష్ణ"]
v.countGraphemes("కృష్ణ") should return 2
Actual behavior 😿
v.graphemes("కృష్ణ") returns (5) ["క", "ృ", "ష", "్", "ణ"]
v.countGraphemes("కృష్ణ") returns 5
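For context, the five elements reported above match the five Unicode code points that make up the word. The snippet below is a minimal sketch in plain JavaScript (it does not call Voca) that lists them; the variable names are illustrative only.

```javascript
// The word from the report, written with escapes so each code point is explicit.
const word = "\u0C15\u0C43\u0C37\u0C4D\u0C23";

// Iterating a string yields code points, not user-perceived characters.
const codePoints = Array.from(word, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(codePoints.length);    // 5
console.log(codePoints.join(" ")); // U+0C15 U+0C43 U+0C37 U+0C4D U+0C23
```

U+0C43 is a vowel sign and U+0C4D is the Telugu virama, so a per-code-point split separates both from the consonants they belong to.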
Steps to reproduce 👷
v.graphemes("కృష్ణ")
v.countGraphemes("కృష్ణ")
Technical details: 🔧
Browser/OS type: n/a
Node version: n/a
illarionvk commented
@kotpal, the grapheme-splitter library splits your example into three graphemes:
[ 'కృ', 'ష్', 'ణ' ]
Would you consider it correct?
kotpal commented
No. v.graphemes("కృష్ణ") should return (2) ["కృ", "ష్ణ"]
I started using nota/split-graphemes. This issue is fixed in that library - nota/split-graphemes#5
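The difference between the three-way and two-way splits comes down to whether a consonant that follows a virama joins the preceding cluster. The following is a rough sketch of that rule only, not the algorithm used by nota/split-graphemes or grapheme-splitter; `splitTeluguClusters` and `isMark` are hypothetical helpers, and the rule is deliberately simplified (it ignores ZWJ/ZWNJ and other scripts).

```javascript
// Telugu virama (halant), which links a consonant to the one that follows it.
const TELUGU_VIRAMA = "\u0C4D";

// True for any Unicode combining mark (general category M).
const isMark = (ch) => /^\p{M}$/u.test(ch);

// Hypothetical helper: groups code points into clusters using two simplified rules:
// 1. a combining mark attaches to the previous cluster;
// 2. any character right after a virama also attaches to the previous cluster.
function splitTeluguClusters(text) {
  const clusters = [];
  for (const ch of text) {
    const last = clusters[clusters.length - 1];
    const joins =
      last !== undefined && (isMark(ch) || last.endsWith(TELUGU_VIRAMA));
    if (joins) {
      clusters[clusters.length - 1] = last + ch;
    } else {
      clusters.push(ch);
    }
  }
  return clusters;
}

const result = splitTeluguClusters("\u0C15\u0C43\u0C37\u0C4D\u0C23");
console.log(result.length); // 2
```

Dropping rule 2 produces the three-way split shown earlier, because the final consonant then starts a cluster of its own.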
panzerdp commented
Voca supports a limited set of graphemes to keep the library size to a minimum. Use an alternative library that has full support.
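One built-in alternative worth checking is `Intl.Segmenter`, available in recent browsers and Node.js releases. The sketch below is only an illustration: how many clusters it returns for this word depends on the Unicode/ICU version bundled with the runtime, so no expected output is given.

```javascript
const word = "\u0C15\u0C43\u0C37\u0C4D\u0C23";

// Grapheme segmentation as implemented by the runtime's ICU data.
const segmenter = new Intl.Segmenter("te", { granularity: "grapheme" });
const clusters = Array.from(segmenter.segment(word), (s) => s.segment);

// The count may differ between runtimes, so inspect it in the target environment.
console.log(clusters.length, clusters);
```

If the runtime's result does not match the expected clusters, a dedicated library remains the safer choice.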