Missing UTF8 Encode/Decode in std::filesystem ScanDirectory() ?
m9710797 opened this issue · 8 comments
Hi,
I think there may be a bug in the std::filesystem
version of the ScanDirectory()
function.
When I look at the surrounding code (e.g. CreateDirectoryIfNotExist()
) then it:
- converts the
std::string
parameter tostd::wstring
- uses that
wstring
to construct astd::filesystem::path
But this conversion is missing in ScanDirectory()
(possibly the same problem exits in other functions like ParsePathFileName()
).
(And obviously, after decoding utf8->wstring, the result should be re-encoded from wstring->utf8).
At least this is what I think the problem is. We got a bug report from a windows user that only part of his files showed up in the file dialog. After some investigation it seemed that the listing stopped on a file which contained a U+0107 character.
I don't have access to a windows machine myself, so I couldn't really debug in detail.
Thanks for looking into this.
Wouter
Hello,
The conversion is not needed when using the filesystem api since utf8 is supported.
When we are using it, i due to the microsoft apis (like GetVolumeInformationW) who is using widestrings instead of utf8 strings.
i tested with the sample app with std::filesystem fs and windows, and i cant reproduce your issue.
maybe you just forgot to define locales in your app (who is important on windows)
you can try the Demo App (see the DemoApp branch)
Hi aiekick,
Thanks for your reply. But I have some follow-up questions/remarks.
The conversion is not needed when using the filesystem api since utf8 is supported. When we are using it, i due to the microsoft apis (like GetVolumeInformationW) who is using widestrings instead of utf8 strings.
Actually the conversion is needed (on Windows). You cannot simply pass a UTF-8 encoded std::string
to the constructor of std::filesystem::path
. See here for a clear demonstration of the problem. This example constructs a path
from a plain string
and then prints it out (see the result at the bottom of the page). Notice how this results in the wrong output for the plain string
version.
The fs::path
documentation tells the same.
In ImGuiFileDialog.cpp
line 411, in the function IsDirectoryCanBeOpened()
you do already correctly initialize the fs::path
(by going via a wstring
). But in ImGuiFileDialog.cpp
line 525, in the function ScanDirectory()
this conversion is forgotten (it's also missing on line 509, maybe there are still more places?).
Notice that on line 549 auto fileNameExt = file.path().filename().string();
you also need to do the conversion in the other direction (on windows query a wstring
and convert that to UTF-8). (Alternatively you could query a u8string
, but that API changed between C++17 and C++20, maybe that's too soon to use, I don't know which C++ version you currently require in ImGuiFileDialog).
i tested with the sample app with std::filesystem fs and windows, and i cant reproduce your issue.
My guess: could it be you (only) tested with characters that happened to be included in your current windows code page setting? (Other people may have selected a different code page). May I suggest to also test with some more exotic characters?
maybe you just forgot to define locales in your app (who is important on windows)
You're right that setting a different locale influences how constructing a fs::path
from a plain string
behaves. But that's because windows does NOT interpret such a string as UTF-8. The point of using UTF-8 in APIs is that they become independent of locales settings.
Thanks.
My guess: could it be you (only) tested with characters that happened to be included in your current windows code page setting? (Other people may have selected a different code page). May I suggest to also test with some more exotic characters?
indeed :)
You're right that setting a different locale influences how constructing a fs::path from a plain string behaves. But that's because windows does NOT interpret such a string as UTF-8. The point of using UTF-8 in APIs is that they become independent of locales settings.
if i not do that :
std::setlocale(LC_ALL, ".UTF8");
at start, the special chars are not shown, so utf8 on windows is not independent of locales settings.
so ok, maybe there is a problem.
there is many old github issues regarding this unicode problem on windows.
was fixed for dirent as i know, but maybe not for std.
btw its difficult to test it, since the current font, not handle many of "exotic" chars.
will try to solve that, in a reasonable time. but your help is welcome, if you can propose a pull request.
I would like to help, but it's not easy for me because I don't have access to a windows machine.
I could try to make a PR, but I can't test it on windows, not even test whether my changes compile.
But let me know if you think such a PR would be useful nevertheless, then I'll make an attempt.
yes please create this pr.
i can test it.
im configuring the demo app for working with chinese fonts who can fisplay thoses extra range char map.
I create a PR. But as I said: I was not able to test or even compile this on Windows myself.
i understand thanks. will test that