Support Surrogate Pairs

Question

Opened this issue 2 years ago · 0 comments

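Currently, the two halves of a surrogate pair written as \uXXXX escapes are each converted to UTF-8 individually. A pair such as \uD83D\uDE00 therefore produces two three-byte sequences instead of the single four-byte encoding of U+1F600, which is not valid UTF-8. This is probably not what one wants.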

Proposed improvement:
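When parsing a \uXXXX escape sequence in read_string, if the parsed value is the second half of a surrogate pair (a low surrogate, 0xDC00 to 0xDFFF), check whether the previous three bytes in ret are the first half of a pair (a high surrogate, 0xD800 to 0xDBFF, which encodes to three UTF-8 bytes, not two). If so, merge the two halves into value to get the complete code point and delete those three bytes before appending the encoded result.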
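
A minimal sketch of the proposed merge, written in Python for illustration only. It assumes ret is a byte buffer holding the UTF-8 output so far and that each \uXXXX escape is appended as soon as it is parsed. The helper names encode_utf8 and append_escape are hypothetical and are not taken from the project.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8 by hand, without rejecting surrogates.

    This mirrors the current behaviour, where a lone surrogate is
    written out as a three-byte sequence.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def append_escape(ret: bytearray, value: int) -> None:
    """Append the value of one \\uXXXX escape to ret (hypothetical helper).

    If value is a low surrogate and the last three bytes of ret encode a
    high surrogate, the two halves are merged into one code point.
    """
    if 0xDC00 <= value <= 0xDFFF and len(ret) >= 3:
        b0, b1, b2 = ret[-3:]
        # A high surrogate (0xD800..0xDBFF) encodes as ED A0..AF 80..BF.
        if b0 == 0xED and 0xA0 <= b1 <= 0xAF and 0x80 <= b2 <= 0xBF:
            high = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
            value = 0x10000 + ((high - 0xD800) << 10) + (value - 0xDC00)
            del ret[-3:]  # drop the encoded first half
    ret += encode_utf8(value)


ret = bytearray(b"a")
append_escape(ret, 0xD83D)  # first half of the pair
append_escape(ret, 0xDE00)  # second half, merged with the first
print(ret.decode("utf-8") == "a\U0001F600")
```

Inspecting the tail of ret keeps the change local to the escape handler, but it cannot tell whether those three bytes came from an escape or were copied from the raw input. Remembering the pending high surrogate in a variable would avoid that ambiguity, at the cost of a little extra state.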