k-takata/Onigmo

Missing UTS 51 binary properties for Emoji

ticky opened this issue · 2 comments

ticky commented

Unicode Technical Standard #51 introduces emoji properties and data, which Onigmo doesn’t yet support.

For instance, one would expect this emoji character-matching regex (borrowed from Mathias Bynens’ blog) to be valid, but at this point it is not:

/\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/
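
To see which of these properties a given build accepts, a minimal Ruby sketch like the one below may help. It assumes only that `Regexp.new` raises `RegexpError` for an unknown property name; the helper `property_supported?` and the constant `UTS51_PROPERTIES` are hypothetical names introduced here for illustration, not part of Onigmo or Ruby.

```ruby
# Hypothetical helper: returns true when the running regex engine
# accepts \p{name}, false when it rejects the property name.
# Assumption: Regexp.new raises RegexpError for unknown properties.
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# The four UTS #51 binary properties used in the regex above.
UTS51_PROPERTIES = %w[
  Emoji
  Emoji_Presentation
  Emoji_Modifier
  Emoji_Modifier_Base
].freeze

# Report each property as supported or missing on this build.
UTS51_PROPERTIES.each do |name|
  status = property_supported?(name) ? "supported" : "missing"
  puts "#{name}: #{status}"
end
```

The output depends on the Ruby and Onigmo versions in use, so no expected result is shown here.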

nurse commented

You can use the following regex, which matches an emoji sequence as defined in UTR #51 revision 9:

/[\u{1F1E6}-\u{1F1FF}]{2}|(?:\p{Grapheme_Cluster_Break=E_Base}|\p{Grapheme_Cluster_Break=E_Base_GAZ})\p{Grapheme_Cluster_Break=Extend}*\p{Grapheme_Cluster_Break=E_Modifier}?(?:\p{Grapheme_Cluster_Break=ZWJ}(?:\p{Grapheme_Cluster_Break=Glue_After_Zwj}|\p{Grapheme_Cluster_Break=E_Base_GAZ}\p{Grapheme_Cluster_Break=Extend}*\p{Grapheme_Cluster_Break=E_Modifier}?))*/

ticky commented

That regex is not equivalent to the one I posted! It doesn’t correctly match characters which require U+FE0F to be displayed as emoji!
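
As an illustration of the kind of sequence meant here, the sketch below inspects U+2764 HEAVY BLACK HEART, which defaults to text presentation and is shown as emoji when followed by U+FE0F VARIATION SELECTOR-16. It uses explicit code points only, so it does not depend on any Unicode property support; the helper names `codepoint_labels` and `ends_with_vs16?` are hypothetical and exist only for this example.

```ruby
# U+2764 HEAVY BLACK HEART defaults to text presentation.
# Appending U+FE0F (VARIATION SELECTOR-16) requests emoji presentation.
TEXT_HEART  = "\u2764"
EMOJI_HEART = "\u2764\uFE0F"

# Hypothetical helper: formats each code point of a string as "U+XXXX".
def codepoint_labels(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

# Hypothetical helper: true when the string ends with U+FE0F.
def ends_with_vs16?(str)
  str.end_with?("\uFE0F")
end

puts codepoint_labels(EMOJI_HEART).join(" ")  # prints "U+2764 U+FE0F"
puts ends_with_vs16?(TEXT_HEART)              # prints "false"
puts ends_with_vs16?(EMOJI_HEART)             # prints "true"
```

A sequence of this shape is what the `\p{Emoji}\uFE0F` alternative in the originally posted regex is meant to cover.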