Missing UTS 51 binary properties for Emoji
ticky opened this issue · 2 comments
Unicode Technical Standard #51 introduces emoji properties and data, which Onigmo doesn’t yet support.
For instance, one would expect this emoji-matching regex (borrowed from Mathias Bynens’ blog) to be valid, but at this point it is not:
/\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/
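To show what that regex is meant to accept, here is a minimal Ruby sketch. It assumes a Ruby/Onigmo build whose Unicode tables already include the UTS #51 properties, which is exactly what this issue asks for. `EMOJI_RE` and `emoji_matches` are hypothetical names used only for illustration.

```ruby
# The regex quoted above. Alternatives are tried left to right:
#   1. an Emoji_Modifier_Base char, optionally followed by a skin-tone modifier
#   2. any char whose default presentation is emoji (Emoji_Presentation=Yes)
#   3. any Emoji char explicitly followed by VS16 (U+FE0F)
# Assumption: the running Onigmo supports these UTS #51 properties.
EMOJI_RE = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/

# Hypothetical helper: every substring of +text+ matched by EMOJI_RE, in order.
def emoji_matches(text)
  text.scan(EMOJI_RE)
end

# Thumbs up + medium skin tone, heart + VS16, a bare heart, then "#1".
sample = "ok \u{1F44D}\u{1F3FD} \u2764\uFE0F \u2764 #1"
p emoji_matches(sample)
```

The third alternative is what keeps plain `#`, digits, or a bare U+2764 from matching. Those characters have Emoji=Yes but only count as emoji when U+FE0F follows them.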
You can use the following regex as the emoji sequence defined in UTR #51 revision 9:
/[\u{1F1E6}-\u{1F1FF}]{2}|(?:\p{Grapheme_Cluster_Break=E_Base}|\p{Grapheme_Cluster_Break=E_Base_GAZ})\p{Grapheme_Cluster_Break=Extend}*\p{Grapheme_Cluster_Break=E_Modifier}?(?:\p{Grapheme_Cluster_Break=ZWJ}(?:\p{Grapheme_Cluster_Break=Glue_After_Zwj}|\p{Grapheme_Cluster_Break=E_Base_GAZ}\p{Grapheme_Cluster_Break=Extend}*\p{Grapheme_Cluster_Break=E_Modifier}?))*/
That regex is not equivalent to the one I posted! It doesn’t correctly match characters that require U+FE0F to be displayed as emoji, i.e. characters with Emoji=Yes but Emoji_Presentation=No, such as U+2764 HEAVY BLACK HEART.
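The set of characters in question can be written as a character-class intersection. This is a minimal Ruby sketch that assumes an Onigmo build with the UTS #51 properties and uses Onigmo's `&&` class intersection. `TEXT_DEFAULT_EMOJI` and `needs_fe0f?` are hypothetical names for illustration.

```ruby
# Emoji=Yes but Emoji_Presentation=No: these default to text presentation
# and only display as emoji when followed by U+FE0F (VS16).
# Assumption: the running Onigmo supports the UTS #51 properties.
TEXT_DEFAULT_EMOJI = /\A[\p{Emoji}&&\P{Emoji_Presentation}]\z/

# Hypothetical helper: true if the single character +char+ needs U+FE0F
# to be shown as emoji.
def needs_fe0f?(char)
  TEXT_DEFAULT_EMOJI.match?(char)
end

# Heavy black heart, copyright sign, hash sign, grinning face, Latin A.
["\u2764", "\u00A9", "#", "\u{1F600}", "A"].each do |c|
  puts format("U+%04X needs FE0F: %s", c.ord, needs_fe0f?(c))
end
```

`#` shows up here because it is Emoji=Yes (it is used in keycap sequences). This is why the regex from the blog only accepts such characters when U+FE0F follows.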