Can't parse UTF-16 surrogate sequences (chained \u... escapes)
smheidrich opened this issue · 1 comments
smheidrich commented
See daggaz/json-stream#39 (comment)
The underlying issue is that the JSON standard mandates that if one decides to escape Unicode code points beyond U+FFFF, it should be done using UTF-16 surrogate pair sequences, but json-stream's Rust tokenizer naively tries to parse each \u... escape into a Rust char separately, and Rust chars specifically can't represent Unicode surrogate code points. So that has to be fixed.
smheidrich commented
Fixed in #58