Offset off when non-BMP characters are in the document
cabo opened this issue · 0 comments
cabo commented
I have a document with a non-BMP character in it (scalar value ≥ 0x10000), namely 🤔.
All offsets that languagetool-server reports after that character appear to be shifted one position to the right, throughout the rest of the document.
Possibly languagetool-server reports offsets in UTF-16 code units rather than in characters (Unicode code points).
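
A quick way to see the suspected mismatch, sketched in Python. The sample string is made up for illustration; the only assumption is that the server counts UTF-16 code units, which would explain the shift:

```python
# Made-up sample: one non-BMP character (U+1F914) surrounded by ASCII.
text = "a 🤔 b"

# Python strings are indexed by code point, so len() counts characters.
code_points = len(text)

# Each UTF-16 code unit is 2 bytes; a non-BMP character takes two units
# (a surrogate pair), so it is counted twice here.
utf16_units = len(text.encode("utf-16-le")) // 2

print(code_points, utf16_units)  # 5 6
```

Every position after the 🤔 differs by exactly one between the two ways of counting, which matches the shift I am seeing.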
I don't know if languagetool-server can be coaxed into counting characters.
If not, the document probably needs to be scanned for non-BMP characters and every offset that follows one corrected accordingly (expensive!).
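
A minimal sketch of that client-side correction, in Python, assuming the offsets really are UTF-16 code units. The function names `astral_starts` and `utf16_to_codepoint_offset` are hypothetical, not part of languagetool-server or any client. The document is scanned once, and each offset is then corrected with a binary search, so the cost may be less painful than feared:

```python
from bisect import bisect_left


def astral_starts(text: str) -> list[int]:
    """Return the UTF-16 offset at which each non-BMP character starts."""
    starts = []
    utf16_pos = 0
    for ch in text:
        if ord(ch) >= 0x10000:
            starts.append(utf16_pos)
            utf16_pos += 2  # surrogate pair: two UTF-16 code units
        else:
            utf16_pos += 1
    return starts


def utf16_to_codepoint_offset(utf16_offset: int, starts: list[int]) -> int:
    """Convert a UTF-16 code unit offset into a code point offset.

    Each non-BMP character that starts before the offset adds one extra
    code unit, so subtract the number of such characters.
    """
    return utf16_offset - bisect_left(starts, utf16_offset)


# Hypothetical example: an offset of 5 in UTF-16 code units.
text = "a 🤔 teh b"
starts = astral_starts(text)
fixed = utf16_to_codepoint_offset(5, starts)
print(fixed, text[fixed:fixed + 3])  # 4 teh
```

A length reported in code units would need the same treatment if the matched span itself contains a non-BMP character; converting the start and end offsets separately and subtracting covers that case.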