Offset off when non-BMP characters are in the document

Question


cabo opened this issue 2 years ago · 0 comments

I have a document with a non-BMP character in it (scalar value ≥ 0x10000), namely 🤔.
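All offsets that languagetool-server reports for text after that character appear to be shifted one position to the right.
Possibly languagetool-server reports offsets in UTF-16 code units rather than in characters: a non-BMP character takes two UTF-16 code units (a surrogate pair), which would account for exactly one extra position per such character.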
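A small illustration of the suspected mismatch, as a sketch that assumes the server counts UTF-16 code units. The sample text and the helper `utf16_len` are made up for this example. Python indexes strings by code point, so the two ways of counting diverge by one after 🤔:

```python
def utf16_len(s: str) -> int:
    """Number of UTF-16 code units needed to encode s (hypothetical helper)."""
    # UTF-16-LE uses 2 bytes per code unit; non-BMP characters need 2 units.
    return len(s.encode("utf-16-le")) // 2

text = "I think 🤔 this is wrong."
char_offset = text.index("this")               # offset in characters (code points)
utf16_offset = utf16_len(text[:char_offset])   # offset a UTF-16-based server would report

print(char_offset, utf16_offset)  # 10 11
```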
I don't know whether languagetool-server can be coaxed into counting characters instead.
If not, the client probably needs to search the document for non-BMP characters and correct every offset that follows one (expensive!).
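One way to keep that correction cheap is to scan the document once, remember where each non-BMP character ends (in UTF-16 units), and then convert each reported offset with a binary search. This is a minimal sketch that again assumes the server's offsets are UTF-16 code units that fall on character boundaries; `utf16_to_codepoint_mapper` is a hypothetical name, not part of any LanguageTool client:

```python
from bisect import bisect_right

def utf16_to_codepoint_mapper(text: str):
    """Return a function mapping UTF-16 code-unit offsets in `text`
    to code-point (character) offsets."""
    # UTF-16 offset just past each non-BMP character; each such
    # character contributes one extra code unit.
    astral_ends = []
    units = 0
    for ch in text:
        if ord(ch) > 0xFFFF:
            units += 2
            astral_ends.append(units)
        else:
            units += 1

    def to_codepoint(utf16_offset: int) -> int:
        # Subtract one for every non-BMP character ending at or before the offset.
        return utf16_offset - bisect_right(astral_ends, utf16_offset)

    return to_codepoint

to_cp = utf16_to_codepoint_mapper("I think 🤔 this is wrong.")
print(to_cp(11))  # 10
```

The scan is linear in the document length and done once per document; each offset (and each `offset + length` end position) then costs only a logarithmic lookup in the number of non-BMP characters.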