jpt13653903/ALCHA

Invalid space characters

Closed · 1 comment

The current scanner treats U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) as spaces, but they are join controls, not spaces, and should not be handled as such.
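A quick way to confirm this is to look at the general category Unicode assigns to each code point. The sketch below uses Python's standard `unicodedata` module for brevity. It is not ALCHA's scanner code, and the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(ch):
    """Return (Unicode name, general category, str.isspace()) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

# Check the two code points the scanner currently treats as spaces.
for ch in ("\u200C", "\u200D"):
    print(f"U+{ord(ch):04X}", *describe(ch))
```

Both code points report category `Cf` (format) and `isspace()` returns `False`. An ordinary space, by contrast, reports `Zs` (space separator).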

The Unicode standard specifically states that U+2060 (WORD JOINER) must be ignored for "word segmentation", so it should not be treated as a space either.

The same applies to U+180E (MONGOLIAN VOWEL SEPARATOR, MVS). The standard states that "MVS is not a suffix but an integral part of the word stem".
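As a minimal sketch of the intended behaviour, assume a scanner that splits words only on characters with the Unicode White_Space property. In that case U+2060 and U+180E (both category `Cf`) stay inside the word instead of ending it. The Python below is illustrative only. `is_space` and `split_words` are hypothetical names, not ALCHA functions, and a real scanner would also need to decide how identifiers may contain these characters.

```python
import unicodedata
from typing import List

# White_Space code points that are not in category Zs:
# U+0009..U+000D, U+0085, U+2028 (Zl), U+2029 (Zp).
NON_ZS_WHITE_SPACE = "\t\n\v\f\r\x85\u2028\u2029"

def is_space(ch: str) -> bool:
    # Category Zs plus the code points above is the Unicode White_Space set.
    # Format characters (Cf) such as U+200C, U+200D, U+2060 and U+180E are excluded.
    return unicodedata.category(ch) == "Zs" or ch in NON_ZS_WHITE_SPACE

def split_words(text: str) -> List[str]:
    """Split text on White_Space only, keeping format characters inside words."""
    words: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_space(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

For example, `split_words("a\u2060b c")` yields two words, `"a\u2060b"` and `"c"`, because the word joiner does not split the first word.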

Updated the EBNF; the code has not been changed yet.