panzerdp/voca

Voca doesn't handle complex characters like కృష్ణ

kotpal opened this issue.


Expected behavior 😸

v.graphemes("కృష్ణ") should return (2) ["కృ", "ష్ణ"]
v.countGraphemes("కృష్ణ") should return 2

Actual behavior 😿

v.graphemes("కృష్ణ") returns (5) ["క", "ృ", "ష", "్", "ణ"]
v.countGraphemes("కృష్ణ") returns 5
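The actual output has one item per Unicode code point of the string. That suggests Voca falls back to one grapheme per code point for these Telugu characters, though this is an inference and the thread does not confirm it. The snippet below lists the code points: consonant KA, vowel sign vocalic R, consonant SSA, the virama (U+0C4D), and consonant NNA.

```javascript
// Decompose "కృష్ణ" into its Unicode code points.
const word = "\u0C15\u0C43\u0C37\u0C4D\u0C23"; // "కృష్ణ"

// Array.from iterates by code point, not by UTF-16 code unit.
const codePoints = Array.from(word, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(codePoints.length); // 5
// KA, VOWEL SIGN VOCALIC R, SSA, VIRAMA, NNA
console.log(codePoints);
```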

Steps to reproduce 👷

v.graphemes("కృష్ణ")
v.countGraphemes("కృష్ణ")

Technical details: 🔧

Browser/OS type: n/a
Node version: n/a

@kotpal Thanks for the report.
Feel free to create a PR with the fix.

@kotpal, the grapheme-splitter library splits your example into three graphemes:

[ 'కృ', 'ష్', 'ణ' ]

Would you consider it correct?

No. v.graphemes("కృష్ణ") should return (2) ["కృ", "ష్ణ"]
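For comparison, here is a sketch using the built-in `Intl.Segmenter` with `granularity: "grapheme"`, which is not part of Voca. grapheme-splitter's three-way split is consistent with the extended grapheme cluster rules of UAX #29 before Unicode 15.1. Those rules attach the vowel sign and the virama to the preceding consonant but still break after the virama. Unicode 15.1 added rule GB9c, which keeps consonant + virama + consonant together in several Indic scripts, Telugu among them. That produces the two clusters expected here. The output below should therefore depend on the Unicode data in your engine's ICU build.

```javascript
// Split a string into extended grapheme clusters using the built-in
// Intl.Segmenter (available in Node.js 16+ and current browsers).
function graphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const word = "\u0C15\u0C43\u0C37\u0C4D\u0C23"; // "కృష్ణ"
const clusters = graphemes(word);

// With pre-15.1 Unicode data: 3 clusters ["కృ", "ష్", "ణ"].
// With Unicode 15.1 data or newer: 2 clusters ["కృ", "ష్ణ"].
console.log(clusters.length, clusters);
```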

I've switched to nota/split-graphemes, where this issue is fixed (see nota/split-graphemes#5).

Voca supports only a limited set of graphemes to keep the library size to a minimum. If you need full grapheme support, use an alternative library.
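If adding a full grapheme library is not an option, one possible workaround is to post-process `Intl.Segmenter` output. The sketch below uses a hypothetical helper, `teluguGraphemes`, which is not part of Voca or any library. It merges a cluster that ends in the Telugu virama (U+0C4D) with a following cluster that starts with a Telugu consonant. This approximates rule GB9c for Telugu only, assumes the consonants lie in U+0C15 through U+0C39, and does not handle other scripts.

```javascript
// Hypothetical helper (not a Voca API): Telugu-only approximation of the
// Unicode 15.1 conjunct rule, applied on top of Intl.Segmenter output.
const TELUGU_VIRAMA = "\u0C4D";

// Assumption: Telugu consonants KA..HA occupy U+0C15 through U+0C39.
function isTeluguConsonant(cluster) {
  const cp = cluster.codePointAt(0);
  return cp >= 0x0c15 && cp <= 0x0c39;
}

function teluguGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const clusters = Array.from(segmenter.segment(text), (s) => s.segment);
  const merged = [];
  for (const cluster of clusters) {
    const prev = merged[merged.length - 1];
    // Older Unicode data breaks after the virama; join the conjunct back up.
    if (prev !== undefined && prev.endsWith(TELUGU_VIRAMA) && isTeluguConsonant(cluster)) {
      merged[merged.length - 1] = prev + cluster;
    } else {
      merged.push(cluster);
    }
  }
  return merged;
}

console.log(teluguGraphemes("కృష్ణ")); // expected: ["కృ", "ష్ణ"]
```

With newer Unicode data the segmenter already returns the joined conjunct, so no cluster ends in a virama and the merge step leaves the result unchanged.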