YY_UNICODE_BUFFER does not handle non-ASCII strings
kwaxer opened this issue · 1 comments
If the string passed to make_from_utf8_string contains non-ASCII characters when initializing YY_UNICODE_BUFFER, the buffer ends up with extraneous null characters at the end, which shows up as an incorrectly set count. Indeed, the count is taken from the original UTF-8 string, which is longer than the corresponding Unicode version. When such a buffer is used in a scanner, the scanner reports an unknown token because it attempts to read and interpret the extraneous null characters as if they were present in the source string.
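The mismatch can be illustrated with a minimal sketch. It is written in Python rather than Eiffel, it is not the Gobo implementation, and the names fill_buffer and byte_count are hypothetical. It only assumes what is described above: the buffer count comes from the UTF-8 byte length, while fewer characters (nb) are actually written.

```python
def fill_buffer(utf8_bytes):
    """Decode UTF-8 bytes into a character buffer sized from the byte length.

    Returns the buffer, the byte-based count (the behaviour described in
    this report) and nb, the number of characters actually written.
    """
    byte_count = len(utf8_bytes)
    buffer = ["\0"] * byte_count  # slots beyond nb stay as null characters
    nb = 0
    for ch in utf8_bytes.decode("utf-8"):
        buffer[nb] = ch
        nb += 1
    return buffer, byte_count, nb


# Two 2-byte characters and one ASCII character: 5 bytes, 3 characters.
buffer, byte_count, nb = fill_buffer("\u00e9+\u00fc".encode("utf-8"))

# With the byte-based count, a scanner would also read the trailing nulls.
seen_with_byte_count = "".join(buffer[:byte_count])

# With count set to nb, only the real characters are visible.
seen_with_nb = "".join(buffer[:nb])
```

For a pure ASCII string both counts coincide, which would explain why the problem only appears with non-ASCII input.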
My temporary solution is to add the following assignment at the end of the compound of make_from_utf8_string:
count := nb
I did not check whether this is an absolutely correct fix, but it worked for me. Also, there could be other places where count is set incorrectly when filling the buffer with UTF-encoded data.
I moved the issue from the fork to the official project.