contour-terminal/libunicode

Refactor grapheme cluster segmentation to correctly handle clusters of more than two codepoints


https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

Specifically, I am interested in correctly segmenting a sequence of consecutive country flags, each of which is a pair of Regional Indicator (RI) codepoints.
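A minimal C++ sketch of the RI part of the rules (GB12 and GB13) follows. A boundary between two RIs is suppressed only when an odd number of RIs immediately precede it, so the segmenter has to carry state across more than two codepoints. The function name `segment_regional_indicators` is hypothetical and not part of libunicode's API; every non-RI pair is simply treated as a boundary (GB999), so this is not a full implementation of the rules.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Regional Indicator symbols occupy U+1F1E6..U+1F1FF.
constexpr bool is_regional_indicator(char32_t cp) noexcept
{
    return cp >= 0x1F1E6 && cp <= 0x1F1FF;
}

// Hypothetical helper: returns the length (in codepoints) of each cluster.
// Only GB12/GB13 are implemented; all other pairs break (GB999).
std::vector<std::size_t> segment_regional_indicators(std::u32string_view text)
{
    std::vector<std::size_t> clusters;
    std::size_t clusterLength = 0;
    std::size_t precedingRIs = 0; // RIs immediately before the current codepoint

    for (char32_t cp : text)
    {
        bool const isRI = is_regional_indicator(cp);
        // GB12/GB13: RI x RI only if an odd number of RIs precede the break point.
        bool const joinWithPrevious = isRI && precedingRIs % 2 == 1;
        if (clusterLength > 0 && !joinWithPrevious)
        {
            clusters.push_back(clusterLength);
            clusterLength = 0;
        }
        ++clusterLength;
        precedingRIs = isRI ? precedingRIs + 1 : 0;
    }
    if (clusterLength > 0)
        clusters.push_back(clusterLength);
    return clusters;
}
```

For three consecutive flags (six RIs) this yields clusters of lengths 2, 2, 2; a lone run of three RIs yields 2 then 1, because the third RI has no partner.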

Also, to make the future implementation (and the current one) fast, we should add the grapheme break tokens (CR, LF, L, V, T, LV, LVT, Extend, ZWJ, Control, SpacingMark, Prepend, Extended_Pictographic, RI) as a field of the new codepoint_properties table, so that segmentation can read each codepoint's token with a single table lookup.
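The following is a sketch, assuming a hypothetical `GraphemeClusterBreak` enum and a hypothetical `grapheme_cluster_break` field on `codepoint_properties` (the real table layout may differ). It shows how, once the token is stored per codepoint, the stateless pairwise rules GB3 to GB9b reduce to comparisons of two small enum values. Extended_Pictographic is kept as a separate flag here because UAX #29 defines it as its own property; rules that need more than the current pair (GB9c, GB11, GB12, GB13) are deliberately left out.

```cpp
#include <cstdint>

// Hypothetical enum mirroring the Grapheme_Cluster_Break values of UAX #29.
enum class GraphemeClusterBreak : std::uint8_t
{
    Other, CR, LF, Control, Extend, ZWJ, RI, Prepend, SpacingMark, L, V, T, LV, LVT,
};

// Hypothetical layout; a real table would be generated from the UCD.
struct codepoint_properties
{
    GraphemeClusterBreak grapheme_cluster_break = GraphemeClusterBreak::Other;
    bool extended_pictographic = false;
};

// Pairwise rules GB3..GB9b. Context-dependent rules are omitted in this sketch.
constexpr bool is_boundary(GraphemeClusterBreak prev, GraphemeClusterBreak next) noexcept
{
    using G = GraphemeClusterBreak;
    if (prev == G::CR && next == G::LF) return false;                        // GB3
    if (prev == G::CR || prev == G::LF || prev == G::Control) return true;   // GB4
    if (next == G::CR || next == G::LF || next == G::Control) return true;   // GB5
    if (prev == G::L && (next == G::L || next == G::V || next == G::LV || next == G::LVT))
        return false;                                                        // GB6
    if ((prev == G::LV || prev == G::V) && (next == G::V || next == G::T))
        return false;                                                        // GB7
    if ((prev == G::LVT || prev == G::T) && next == G::T) return false;      // GB8
    if (next == G::Extend || next == G::ZWJ) return false;                   // GB9
    if (next == G::SpacingMark) return false;                                // GB9a
    if (prev == G::Prepend) return false;                                    // GB9b
    return true;                                                             // GB999
}
```

A segmenter would call `is_boundary(props[a].grapheme_cluster_break, props[b].grapheme_cluster_break)` and only fall back to extra state (RI parity, the ZWJ emoji sequence of GB11) for the remaining rules.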