fktn-k/fkYAML

The result of parsing UTF-16 encoded input lacks the last character


fktn-k commented

Description

When UTF-16 encoded input that does not end with a surrogate pair is passed for deserialization, the last character is not parsed.
Investigation showed that the cause is a flaw in the conversion from UTF-16 to UTF-8, specifically in the logic that handles surrogate pairs.
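
For reference, here is a minimal sketch of a UTF-16 to UTF-8 conversion that consumes exactly one code unit for an ordinary character and two for a surrogate pair, so a trailing non-surrogate character is still emitted. The function name `utf16_to_utf8` is hypothetical and this is not fkYAML's actual implementation; it only illustrates the handling that the description refers to.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only; not part of fkYAML.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: only here do we look ahead one code unit.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The point of the sketch is that the lookahead happens only after a high surrogate has been seen. A loop that always needs a following code unit before emitting the current one would stop one unit early on input such as `u"foo: bar"`, which matches the symptom reported here.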

Reproduction steps

Parse any UTF-16 encoded input that does not end with a surrogate pair, for example:

fkyaml::node n = fkyaml::node::deserialize(u"foo: bar");
std::cout << n << std::endl;

Expected vs. actual results

Expected output:

foo: bar

Actual output:

foo: ba

Minimal code example

The same as "Reproduction steps".

Error messages

No error messages.

Compiler and operating system

GCC 11.4.0 on Ubuntu 22.04 LTS

Library version

develop HEAD

fktn-k commented

The bug reported in this issue has been fixed in #234.