Segmentation of combined emojis

Question

RazrFalcon opened this issue 6 years ago · 1 comment

use unicode_segmentation::UnicodeSegmentation;
fn main() {
    for c in UnicodeSegmentation::graphemes("🏳️‍🌈", true) {
        println!("{}", c);
    }
}

Outputs:

🏳️‍
🌈

But should output:

🏳️‍🌈

Another example: 👮‍♀.

Is this a bug in UnicodeSegmentation, or am I doing something wrong? For my current task, this should be a single "character".

Answer 1 · 2018-05-21T04:18:00.000Z

We're operating off an old Unicode version (9), where that's not in the tables.

https://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakProperty.txt

Filed #43

That may take a while to fix, but it may be worth updating to Unicode 10 in the interim. That is an easier update than going from 10 to 11, and it will also fix your issue.
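
For anyone following along, it can help to see what the flag is actually made of. Here is a minimal sketch using only the standard library (the `code_points` helper is a hypothetical name, not part of unicode-segmentation) that lists the code points of the rainbow flag from the question:

```rust
/// Formats each Unicode scalar value in `s` as "U+XXXX".
/// Hypothetical helper for illustration only.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // The rainbow flag from the question, written with escapes so the
    // invisible code points show up in the source.
    let flag = "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}";
    for cp in code_points(flag) {
        println!("{}", cp);
    }
}
```

This prints U+1F3F3 (white flag), U+FE0F (variation selector), U+200D (zero width joiner) and U+1F308 (rainbow), one per line. The reported output appears to break right after the joiner, which is where the grapheme break tables need to keep the sequence together.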