The result of parsing UTF-16 encoded input lacks the last character
Closed this issue · 1 comments
fktn-k commented
Description
When a UTF-16 encoded string that does not end with a surrogate pair is passed for deserialization, its last character is not parsed.
While digging into the issue, it turned out that this is caused by a flawed implementation of the UTF-16 to UTF-8 conversion, specifically the part that tries to handle surrogate pairs.
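For illustration, here is a minimal sketch of a UTF-16 to UTF-8 conversion that handles surrogate pairs without dropping a trailing non-surrogate code unit. It is not fkYAML's actual implementation; the names `append_utf8` and `utf16_to_utf8` are hypothetical, and throwing `std::invalid_argument` on unpaired surrogates is an assumption made for this example. The idea is to look ahead only when the current code unit is a high surrogate, so every other unit, including the last one, is emitted immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of fkYAML): append one code point as UTF-8.
inline void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && "code point out of Unicode range");
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical converter (not part of fkYAML): UTF-16 code units to UTF-8 bytes.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: a low surrogate must follow.
            if (i + 1 >= in.size()) {
                throw std::invalid_argument("unpaired high surrogate at end of input");
            }
            const char32_t low = in[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) {
                throw std::invalid_argument("high surrogate not followed by low surrogate");
            }
            append_utf8(out, 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;  // the low surrogate has been consumed as well
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        } else {
            // Plain BMP code unit: emit right away, even if it is the last one.
            append_utf8(out, unit);
        }
    }
    return out;
}
```

With this sketch, `utf16_to_utf8(u"foo: bar")` returns the full `"foo: bar"`, because the final `r` takes the plain code unit branch and never waits on a following unit.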
Reproduction steps
Try parsing any UTF-16 encoded string that does not end with a surrogate pair, such as the following:
fkyaml::node n = fkyaml::node::deserialize(u"foo: bar");
std::cout << n << std::endl;
Expected vs. actual results
Expected output:
foo: bar
Actual output:
foo: ba
Minimal code example
The same as "Reproduction steps".
Error messages
No error messages.
Compiler and operating system
GCC 11.4.0 on Ubuntu 22.04 LTS
Library version
develop HEAD
Validation
- The bug also occurs if the latest version from the develop branch is used.
- I can successfully compile and run the unit tests.