Invalid space characters
Closed this issue · 1 comments
jpt13653903 commented
The current scanner treats U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) as spaces, whereas they are join controls and should not be handled as spaces.
The Unicode standard specifically states that U+2060 (WORD JOINER) must be ignored for "word segmentation".
The same applies to U+180E (MONGOLIAN VOWEL SEPARATOR, MVS). The standard states that "MVS is not a suffix but an integral part of the word stem".
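
As a rough illustration of the intended behaviour (not the project's actual scanner code), a space predicate could exclude these four code points explicitly. The name `is_space`, the `NOT_SPACES` set, and the choice of ASCII whitespace plus general category Zs as the definition of "space" are all assumptions made for this sketch:

```python
import unicodedata

# Code points named in this issue that should NOT be treated as spaces.
NOT_SPACES = frozenset({
    "\u180E",  # MONGOLIAN VOWEL SEPARATOR
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200D",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
})

# ASCII whitespace accepted by this sketch (an assumption, not the real EBNF).
ASCII_SPACES = frozenset({" ", "\t", "\n", "\v", "\f", "\r"})


def is_space(ch: str) -> bool:
    """Hypothetical scanner predicate: True if `ch` should be skipped as a space."""
    if ch in NOT_SPACES:
        # Explicit exclusion, independent of the Unicode database version in use.
        return False
    if ch in ASCII_SPACES:
        return True
    # Assumption: remaining spaces are the Unicode "Space Separator" category.
    return unicodedata.category(ch) == "Zs"
```

With this sketch, `is_space("\u200D")` is false while `is_space("\u00A0")` (NO-BREAK SPACE) is true. The explicit exclusion set is redundant on recent Unicode databases, where these code points are not in category Zs, but it keeps the intent visible and guards against older tables that classified U+180E as a space.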
jpt13653903 commented
Updated the EBNF, but nothing has changed in the code yet.