Mixing UTF-8 and ANSI on Windows

Question

Closed this issue 3 years ago · 2 comments

Hi, just letting you know that you are mixing the ANSI API, which uses the current system code page, with UTF-8 (converting to/from wide strings with MultiByteToWideChar/UTF-8), which is not really a good idea.
When using MultiByteToWideChar, I suggest you also set the flag that makes it fail on invalid UTF-8 (MB_ERR_INVALID_CHARS) instead of having invalid sequences silently replaced by a substitute character and then using wrong paths/names.
Hard-casting string sizes from size_t to int is also unsafe. You should at least have an assertion there that the size doesn't exceed INT_MAX. I know, who in their right mind would pass a string that large ... but it's a possible attack vector on applications consuming your library.
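
To make the last two suggestions concrete, here is a minimal sketch, not the library's actual code. The helper names `fits_in_int` and `utf8_to_wide_strict` are made up for illustration. It uses a runtime check rather than an assertion so that the guard also survives release builds, and it passes `MB_ERR_INVALID_CHARS` so that malformed UTF-8 makes the conversion fail.

```cpp
#include <climits>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: true if a size_t length can be narrowed to int
// without overflow, which is what MultiByteToWideChar's int parameter needs.
inline bool fits_in_int(std::size_t n) {
    return n <= static_cast<std::size_t>(INT_MAX);
}

#ifdef _WIN32
// Hypothetical helper: strict UTF-8 -> UTF-16 conversion.
// Returns false for oversized input or for invalid UTF-8.
inline bool utf8_to_wide_strict(const std::string& utf8, std::wstring& out) {
    out.clear();
    if (utf8.empty()) return true;
    if (!fits_in_int(utf8.size())) return false;  // refuse instead of truncating
    const int length = static_cast<int>(utf8.size());

    // First call: ask for the required number of UTF-16 code units.
    // MB_ERR_INVALID_CHARS makes the call fail on malformed input.
    const int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), length, nullptr, 0);
    if (needed <= 0) return false;

    // Second call: convert into a buffer of exactly that size.
    out.resize(static_cast<std::size_t>(needed));
    const int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), length, &out[0], needed);
    if (written != needed) {
        out.clear();
        return false;
    }
    return true;
}
#endif
```

The conversion part is only compiled on Windows; the length check is portable. Callers would treat a `false` return as an invalid path or name rather than continuing with a partially converted string.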

Answer 1 · 2021-05-14T16:20:17.000Z

Hi, I took a look at everything you said.
- I don't see where I use ANSI (or automatic ANSI/Wide) functions; I always use the Wide version of functions that have an ANSI alternative. All public interfaces expose UTF-8, and the Windows implementation converts from/to wide strings when required. I don't know if that was what you meant, but that's how I understand it.
- Yes, I should. I didn't know about this; I thought it would already fail if the UTF-8 sequence was broken.
- 2 GiB paths are clearly unrealistic, no one is going to do that, but I can add to the contract that paths must not exceed 2 billion bytes, and in fact less than that, since most Win32 functions are limited to 32,767 characters (for the Wide versions), and many filesystems in POSIX environments have even lower limits (which vary by filesystem format).
Thank you for reporting this.
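
One way such a length contract could be checked on the UTF-8 side, before any conversion, is sketched below. This is only an illustration: the names `kMaxWidePathUnits`, `utf16_units_for_valid_utf8` and `path_within_wide_limit` are hypothetical, the limit is simply the 32,767 figure quoted above, and the counting assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Assumption: the 32,767-character figure mentioned above, counted in
// UTF-16 code units.
constexpr std::size_t kMaxWidePathUnits = 32767;

// Hypothetical helper: number of UTF-16 code units that a *valid* UTF-8
// string converts to. Invalid input is not detected here.
inline std::size_t utf16_units_for_valid_utf8(const std::string& utf8) {
    std::size_t units = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) == 0x80) continue;  // continuation byte, already counted
        units += (byte >= 0xF0) ? 2 : 1;      // 4-byte sequences need a surrogate pair
    }
    return units;
}

// Hypothetical contract check for a UTF-8 path.
inline bool path_within_wide_limit(const std::string& utf8_path) {
    return utf16_units_for_valid_utf8(utf8_path) <= kMaxWidePathUnits;
}
```

Counting code units rather than bytes matters because a non-ASCII path can take up to three UTF-8 bytes per UTF-16 code unit, so a byte-based limit would reject some paths that the Wide functions would accept.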

Answer 2 · 2021-05-14T17:08:54.000Z

named_mutex.hpp, which uses FormatMessageA. Well, it's just an error message, but ... To be honest, I only looked into this file and saw it there, and I was expecting this mixing to be somewhere else as well. Anyway, thank you for sharing your code with us, and don't get discouraged by people like me who keep pinpointing small unimportant stuff while not actually writing a single line of code. Cheers.
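
For reference, a rough sketch of what replacing the FormatMessageA call with the Wide variant could look like, with the result converted to UTF-8 so it matches the rest of the public interface. The function name `system_error_message_utf8` and the `"error N"` fallback text are assumptions made for this example, not code from named_mutex.hpp. The non-Windows branch exists only so the sketch compiles elsewhere.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <cstring>
#endif

// Hypothetical helper: text for a system error code, always as UTF-8.
inline std::string system_error_message_utf8(unsigned long code) {
    const std::string fallback = "error " + std::to_string(code);
#ifdef _WIN32
    wchar_t* buffer = nullptr;
    // With FORMAT_MESSAGE_ALLOCATE_BUFFER the API allocates the buffer and
    // expects the address of the pointer, cast to LPWSTR.
    const DWORD units = FormatMessageW(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
            FORMAT_MESSAGE_IGNORE_INSERTS,
        nullptr, static_cast<DWORD>(code), 0,
        reinterpret_cast<LPWSTR>(&buffer), 0, nullptr);
    if (units == 0 || buffer == nullptr) return fallback;

    // The message buffer is documented as limited to 64K bytes, so
    // narrowing the length to int is safe here.
    const int length = static_cast<int>(units);
    std::string utf8;
    const int needed = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                           buffer, length, nullptr, 0,
                                           nullptr, nullptr);
    if (needed > 0) {
        utf8.resize(static_cast<std::size_t>(needed));
        const int written = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                                buffer, length, &utf8[0], needed,
                                                nullptr, nullptr);
        if (written != needed) utf8.clear();
    }
    LocalFree(buffer);  // buffer was allocated by FormatMessageW
    return utf8.empty() ? fallback : utf8;
#else
    // Portable stand-in so the sketch builds outside Windows.
    // Note: std::strerror is not guaranteed to be thread-safe.
    const char* text = std::strerror(static_cast<int>(code));
    return (text != nullptr && *text != '\0') ? std::string(text) : fallback;
#endif
}
```

System messages usually end with a line break, which this sketch does not trim.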