ISO-2022-JP as well as some other charsets not supported on Windows
mbektchiev opened this issue · 2 comments
According to the MultiByteToWideChar documentation, the dwFlags parameter must be 0 for some code pages:
For the code pages listed below, dwFlags must be set to 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.
50220
50221
50222
50225
50227
50229
57002 through 57011
65000 (UTF-7)
42 (Symbol)
Because of this, the following code snippet returns an empty string instead of the expected 冬季:
std::string res = boost::locale::conv::to_utf<char>("\xe5\x86\xac\xe5\xad\xa3", "iso-2022-jp");
My suggestion for a fix would be to check whether the code page we're dealing with is on this short list and, if it is, not to add the flag. This should be done in all calls to MultiByteToWideChar, like here: wconv_codepage.ipp#L100.
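A minimal sketch of that check in C++, assuming the flag being passed is MB_ERR_INVALID_CHARS (an assumption here; the linked line is the authority). The names `requires_zero_flags`, `flags_for_codepage` and `to_wide` are hypothetical and are not part of Boost.Locale. The Windows-only call is guarded with `#ifdef _WIN32` so the flag selection itself compiles and can be tested on any platform.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Code pages for which the MultiByteToWideChar documentation (quoted above)
// says dwFlags must be 0.
bool requires_zero_flags(unsigned int codepage) {
    switch (codepage) {
    case 42:     // Symbol
    case 50220: case 50221: case 50222:
    case 50225: case 50227: case 50229:
    case 65000:  // UTF-7
        return true;
    default:
        return codepage >= 57002 && codepage <= 57011;  // 57002 through 57011
    }
}

// Hypothetical helper: keep the caller's preferred flags (e.g. MB_ERR_INVALID_CHARS)
// unless the code page only accepts dwFlags == 0.
unsigned long flags_for_codepage(unsigned int codepage, unsigned long preferred) {
    return requires_zero_flags(codepage) ? 0UL : preferred;
}

#ifdef _WIN32
// Sketch of a conversion wrapper; returns an empty string on failure.
std::wstring to_wide(unsigned int codepage, const std::string& in) {
    if (in.empty()) return std::wstring();
    const DWORD flags = flags_for_codepage(codepage, MB_ERR_INVALID_CHARS);
    const int len = static_cast<int>(in.size());
    // First call: ask for the required output length.
    int n = MultiByteToWideChar(codepage, flags, in.data(), len, nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring out(static_cast<std::wstring::size_type>(n), L'\0');
    // Second call: perform the actual conversion into the buffer.
    n = MultiByteToWideChar(codepage, flags, in.data(), len, &out[0], n);
    return n > 0 ? out : std::wstring();
}
#endif
```

One trade-off of this approach: for the listed code pages the call can no longer ask Windows to reject invalid input, so malformed sequences may be substituted rather than reported as an error.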
I just tested your example, and the provided string is not valid "iso-2022-jp" encoding. ICU converts that string to an empty string too, as the character is invalid. Converting 冬季 to iso-2022-jp via ICU yields "\x1b$BE_5(\x1b(B".
I assume you got the direction mixed up, as the UTF-8 encoding of 冬季 is "\xe5\x86\xac\xe5\xad\xa3", so you might have wanted to use from_utf.
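Why the byte string cannot be ISO-2022-JP can be checked mechanically: ISO-2022-JP (RFC 1468) is a 7-bit encoding that switches character sets with escape sequences such as ESC $ B (JIS X 0208) and ESC ( B (ASCII), so no byte may be 0x80 or above. The sketch below, using a hypothetical helper `is_7bit`, applies that test to both strings from this thread. Passing the test is necessary but not sufficient for valid ISO-2022-JP.

```cpp
#include <string>

// Returns false if any byte is >= 0x80. Such a byte can never occur in
// ISO-2022-JP, which only uses 7-bit bytes plus escape sequences.
bool is_7bit(const std::string& bytes) {
    for (unsigned char c : bytes) {
        if (c >= 0x80) return false;
    }
    return true;
}

// UTF-8 encoding of 冬季, as used in the original snippet.
const std::string utf8_winter = "\xe5\x86\xac\xe5\xad\xa3";

// ISO-2022-JP encoding of 冬季 as reported from ICU:
// ESC $ B (switch to JIS X 0208), 0x45 0x5F ("E_") for 冬,
// 0x35 0x28 ("5(") for 季, then ESC ( B (back to ASCII).
const std::string iso2022jp_winter = "\x1b$BE_5(\x1b(B";
```

Here `is_7bit(utf8_winter)` is false, because every byte of the UTF-8 sequence has the high bit set, while `is_7bit(iso2022jp_winter)` is true.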
I assume you work on Windows without ICU?
Hi @Flamefire! Thanks a lot for fixing this! 🥳
> I just tested your example and the provided string is not valid "iso-2022-jp" encoding. ICU converts that string to an empty string too as the character is invalid. Converting 冬季 to iso-2022-jp via ICU yields "\x1b$BE_5(\x1b(B"
> I assume you got the direction mixed up as the UTF-8 encoding of 冬季 is "\xe5\x86\xac\xe5\xad\xa3", so you might have wanted to use from_utf.
The direction was right, but I must have copied the UTF-8 encoding by mistake... 😊
> I assume you work on Windows without ICU?
That's right.