Add check for overlong UTF-8 encoding
stig opened this issue · 1 comment
stig commented
Characters that can be encoded in a single UTF-8 byte (such as ASCII) can also be written as a multi-byte sequence by padding the code point with leading zero bits. The same applies to any code point encoded with more bytes than it needs. Such sequences are invalid, however: they are "overlong encodings" and we should reject them, cf. https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings
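
For illustration, here is a minimal sketch of what such a check could look like. It is written in Python rather than in this project's own code, and the function name `is_overlong` is hypothetical. It only inspects the first two bytes of a sequence and assumes the caller validates length and continuation bytes separately.

```python
def is_overlong(seq: bytes) -> bool:
    """Return True if seq starts with an overlong UTF-8 lead pattern.

    Hypothetical helper: it does not check sequence length or that the
    continuation bytes are well formed.
    """
    if not seq:
        return False
    b0 = seq[0]
    # Lead bytes 0xC0 and 0xC1 start 2-byte sequences that can only
    # encode code points below U+0080, which fit in one byte.
    if b0 in (0xC0, 0xC1):
        return True
    if len(seq) < 2:
        return False
    b1 = seq[1]
    # 3-byte form: 0xE0 followed by 0x80..0x9F encodes a code point
    # below U+0800, which fits in two bytes or fewer.
    if b0 == 0xE0 and 0x80 <= b1 <= 0x9F:
        return True
    # 4-byte form: 0xF0 followed by 0x80..0x8F encodes a code point
    # below U+10000, which fits in three bytes or fewer.
    if b0 == 0xF0 and 0x80 <= b1 <= 0x8F:
        return True
    return False
```

For example, `is_overlong(b"\xC0\xAF")` is true because `0xC0 0xAF` is a two-byte overlong form of `/` (U+002F), while `is_overlong("é".encode("utf-8"))` is false because `0xC3 0xA9` is the shortest encoding of U+00E9.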