clarinsi/Obeliks4J

Bad treatment of XML entities


While the tokeniser would not necessarily grok XML entities, it should at least treat them consistently.
Currently, '&gt;' is tokenised to one token, but '&amp;' to three tokens, which causes problems.
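To illustrate what "consistent" could mean here, below is a minimal sketch of a regex-based tokeniser that tries an XML entity pattern before the word and punctuation patterns. With that ordering, both named entities like `&gt;` and `&amp;` and numeric references like `&#38;` come out as single tokens. This is not Obeliks4J's actual code: the class name `EntityAwareTokenizer`, the method `tokenize`, and the token pattern are all hypothetical. The sketch also does not check entity names against a DTD, and a bare `&` that is not followed by a well-formed reference still falls through to the single-character rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Obeliks4J API.
public class EntityAwareTokenizer {
    // Alternatives are tried left to right, so an entity wins over splitting on '&' and ';'.
    private static final Pattern TOKEN = Pattern.compile(
        "&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);" // named or numeric XML entity
        + "|\\w+"                                            // run of word characters
        + "|\\S");                                           // any other single non-space char

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both entities are kept whole: prints [a, &gt;, b, &amp;, c]
        System.out.println(tokenize("a &gt; b &amp; c"));
    }
}
```

Placing the entity alternative first is the simplest way to get uniform behaviour. An alternative would be to unescape entities before tokenising and re-escape them afterwards, at the cost of losing the original character offsets.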