SOCI/soci

How to read UTF-8 encoded Chinese strings from MySQL? Currently I get "??" instead of the characters

Closed this issue · 5 comments

Hi, I'm using soci under msys2's gcc.

The table is encoded in UTF-8; for example, I have a column defined like this:

 `label` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,

I store some Chinese text in it, and it displays correctly in MySQL GUI clients such as Navicat or HeidiSQL.

Now, when I read the text through soci, I get a std::string containing many "?" characters; each "?" stands for a single Chinese character.

When I print the byte values of the "?" characters, I get "3f 3f".
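For anyone who wants to repeat this check, here is a minimal sketch of a byte dump for a std::string. The helper name to_hex is made up for illustration and is not part of soci.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not a soci API): returns the bytes of `s`
// as space-separated lowercase hex pairs.
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned char>(s[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

A literal "?" is the single byte 3f, while a Chinese character that survived as UTF-8 normally shows up as three bytes (for example e4 b8 ad for "中"). Seeing 3f therefore means the character was already replaced before it reached the std::string.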

So I believe soci automatically converts the stored characters? Any ideas?

I see a similar question in this issue: can std::string of soci hold the right utf8 data? · Issue #525 · SOCI/soci

But I can't find any soci documentation that mentions encoding-related topics.

Thanks.

Further information: if I put some text like "abcd中文" into MySQL and read it into a std::string through soci, I get "abcd??", which means the English characters are converted correctly but the Chinese characters are not.

This raises another question: how do I specify the encoding?

Since I'm on Windows, I guess the default encoding is "GB2312", and maybe soci tries to use that default to convert the bytes stored in MySQL, which are in UTF-8.

OK, I think I have found the solution. It is very simple and was suggested by an AI (ChatGPT): I need to add the "charset=utf8mb4" option to the connection string when I open the soci::session.

For example:

soci::session sql(soci::mysql, "dbname=xxx user=root password=xxx charset=utf8mb4");

After that, I get the correct bytes in the std::string (that is, the UTF-8 encoded bytes stored in MySQL).
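To double-check that the returned bytes really are UTF-8 (and not, say, GB2312/GBK), a small validator can help. This is only a sketch: is_valid_utf8 and utf8_length are hypothetical helper names, not soci functions, and they know nothing about the database.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8. Rejects overlong
// forms, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                    // stray continuation or bad lead byte
        if (i + len > s.size()) return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;          // overlong
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}

// Hypothetical helper: number of code points in a string that is assumed
// to be valid UTF-8 (counts every byte that is not a continuation byte).
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

With the charset option set, "abcd中文" should come back as 10 bytes and 6 code points. Note that a string of "?" is plain ASCII and therefore also passes is_valid_utf8, so the byte values still need to be checked to detect the lost characters.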

Since I have found the solution, I think this issue can be closed. I hope it helps others.

vadz commented

I don't know what is the default charset for MySQL but it would make sense to use UTF-8 if none is specified. I don't care enough about it to do it myself however.