gobo-eiffel/gobo

YY_UNICODE_BUFFER does not handle non-ASCII strings


If a YY_UNICODE_BUFFER is initialized by calling make_from_utf8_string with a string that contains non-ASCII characters, the buffer ends up with extraneous null characters at the end, because count is set incorrectly. The count is taken from the original UTF-8 string, which is longer than the corresponding Unicode version. When such a buffer is used in a scanner, the scanner reports an unknown token, because it tries to read and interpret the extraneous null characters as if they were part of the source string.
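
To make the mismatch concrete, here is a minimal sketch in Python (not Eiffel, and not the actual YY_UNICODE_BUFFER code; all names are made up for this example). It assumes the buffer is sized from the UTF-8 byte length and then filled with the decoded characters, which is my reading of the behavior described above:

```python
# Illustration only: plain Python, not the Gobo Eiffel implementation.
# Assumption: the buffer is allocated from the UTF-8 byte length and then
# filled with decoded Unicode characters.

utf8_bytes = "café".encode("utf-8")   # 'é' is encoded as two bytes
decoded = utf8_bytes.decode("utf-8")  # one character per code point

# Allocate one slot per byte, pre-filled with null characters.
buffer = ["\0"] * len(utf8_bytes)

# Store the decoded characters; fewer slots are used than were allocated.
for i, ch in enumerate(decoded):
    buffer[i] = ch

wrong_count = len(utf8_bytes)  # byte length of the UTF-8 input
right_count = len(decoded)     # number of characters actually stored

# With the byte length as count, a trailing null is treated as content.
print(buffer[:wrong_count])
# With the character count, only the real text is visible.
print(buffer[:right_count])
```

The more non-ASCII characters the input contains, the larger the gap between the two counts, and so the more trailing nulls a scanner would see.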

My temporary workaround is to add the following assignment at the end of the compound of make_from_utf8_string:

			count := nb

I have not verified that this fix is correct in all cases, but it works for me. There may also be other places where count is set incorrectly when the buffer is filled with UTF-encoded data.