GreycLab/CImg

Non-ANSI file name fails to fopen on Windows with latin code page.

Closed this issue · 11 comments

When opening an image file with a non-ANSI file name (e.g. the Chinese file name 测试.jpg on an English version of Windows), the multibyte fopen fails to find and open the file. After some investigation, I found that the only solution is to use the wide-character variant _wfopen instead.

The following patch works for me on VS2015 + Win10. Note that this patch assumes the file name passed to the CImg::load() interface is always UTF-8 encoded, regardless of the current code page of the Windows system. This behavior is better than the current implementation, which makes the file name input depend on the current code page and cannot express any Unicode BMP character that is not in the current code page.

--- a/CImg.h
+++ b/CImg.h
@@ -165,6 +165,9 @@
 #define WIN32_LEAN_AND_MEAN
 #endif
 #include <windows.h>
+#include <locale>
+#include <codecvt>
+#include <string>
 #ifndef _WIN32_IE
 #define _WIN32_IE 0x0400
 #endif
@@ -5037,7 +5040,16 @@ namespace cimg_library_suffixed {
           if (_setmode(_fileno(res),0x8000)==-1) res = 0;
         }
 #endif
-      } else res = std::fopen(path,mode);
+      } else {
+#if cimg_OS==2
+          std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
+          std::wstring wpath = converter.from_bytes(path);
+          std::wstring wmode = converter.from_bytes(mode);
+          res = _wfopen(wpath.data(),wmode.data());
+#else
+          res = std::fopen(path,mode);
+#endif
+      }
       if (!res) throw CImgIOException("cimg::fopen(): Failed to open file '%s' with mode '%s'.",
                                       path,mode);
       return res;

Hello Harry,
Does this mean this will happen for all CImg methods using std::fopen() and cimg::fopen()?
If so, wouldn't it be wise to add a variant of fopen() that always does that job (on Windows), and invoke it from those methods? Maybe we could add a new cimg::_fopen() function to do so?

Thanks for reporting!

Yes. All methods using std::fopen() and cimg::fopen() will suffer from this issue on Windows. I'm sorry that I didn't go through the code fully; I was assuming that all std::fopen() calls were wrapped in cimg::fopen(). Right, an additional wrapper for std::fopen() is needed on Windows. Maybe we can do something like:

#if cimg_OS==2
#define cimg_fopen cimg::fopen_win
#else
#define cimg_fopen std::fopen
#endif

#if cimg_OS==2
FILE *fopen_win(const char *const path, const char *const mode) {
         std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
         std::wstring wpath = converter.from_bytes(path);
         std::wstring wmode = converter.from_bytes(mode);
         return _wfopen(wpath.data(),wmode.data());
}
#endif

And then replace every occurrence of std::fopen() with cimg_fopen().

Hello Harry.
cimg::fopen() is used in a particular context: when we want to throw an exception if the specified filename is not found, which is a bit different from how std::fopen() is used in CImg.h.
That's why I sometimes use one and sometimes the other.

So I added a new function cimg::win_fopen() that is basically the same as the one you suggested, and defined a macro std_fopen whose value is either std::fopen or cimg::win_fopen, depending on whether the code is compiled on Windows.

Commit 01d16f5 integrates all the necessary changes. Could you check whether that solves your problem?
Thanks!

Confirmed. This commit solves the problem for me. (tested on VS2013 and VS2015)
Thanks!

I think it should be all right to close this issue for now.

The new code for fopen() on Windows does not compile with g++ 4.9.2 on Windows (using MinGW), which breaks the whole library for CImg users on this platform. :(
The error is:

fatal error: codecvt: No such file or directory

Any idea how to do the same using only more classical Win32 function calls?

I've rewritten the code of cimg::win_fopen() so that it uses only the WIN32 API, and does not require additional headers to be included (commit 62594bc).
So far, that seems to work correctly on Windows.

Update: the commit works after I changed CP_ACP to CP_UTF8.
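
The code of commit 62594bc is not reproduced in this thread, so the following is only a minimal sketch of what a Win32-only wrapper with the CP_UTF8 fix might look like. The namespace `sketch` and the helpers `utf8_to_wide()` and `open_utf8()` are hypothetical names chosen for illustration; `win_fopen()` borrows the name used above, but its body here is an assumption, not the committed implementation. On non-Windows platforms the sketch simply falls back to std::fopen().

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#if defined(_WIN32)
#include <wchar.h>    // _wfopen
#include <windows.h>  // MultiByteToWideChar, CP_UTF8
#endif

namespace sketch {  // hypothetical namespace, not part of CImg

#if defined(_WIN32)
// Convert a NUL-terminated UTF-8 string to UTF-16 using only the Win32 API.
// Returns an empty string if the input is empty or is not valid UTF-8.
inline std::wstring utf8_to_wide(const char *const s) {
  // First call: ask how many wchar_t are needed, terminator included.
  const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, nullptr, 0);
  if (n <= 0) return std::wstring();
  std::wstring w(static_cast<std::size_t>(n), L'\0');
  // Second call: perform the conversion into the buffer.
  if (!MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, &w[0], n))
    return std::wstring();
  w.resize(static_cast<std::size_t>(n) - 1);  // drop the terminator
  return w;
}

// Assumed shape of a Win32-only fopen wrapper: the path is treated as UTF-8
// (CP_UTF8), never as the current ANSI code page (CP_ACP).
inline std::FILE *win_fopen(const char *const path, const char *const mode) {
  const std::wstring wpath = utf8_to_wide(path), wmode = utf8_to_wide(mode);
  if (wpath.empty() || wmode.empty()) return nullptr;
  return _wfopen(wpath.c_str(), wmode.c_str());
}
#endif

// Open a file whose path is UTF-8 encoded, on any platform.
// Returns a null pointer on failure, like std::fopen().
inline std::FILE *open_utf8(const char *const path, const char *const mode) {
#if defined(_WIN32)
  return win_fopen(path, mode);
#else
  return std::fopen(path, mode);
#endif
}

}  // namespace sketch
```

A caller would use `sketch::open_utf8("\xE6\xB5\x8B\xE8\xAF\x95.jpg", "rb")` to open 测试.jpg. The sketch uses a function where the thread describes a macro (std_fopen); the dispatch logic is the same either way.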

Damn Windows!!


Commit 62594bc does not work for me with the MSVC compiler on an English system with a Chinese file name.

Windows always sucks when it comes to encoding problems. The most basic and biggest issue is that the WIN32 API has never supported UTF-8 encoded multibyte strings. Therefore, every multibyte API depends on a code page setting (SetFileApisToANSI(), SetFileApisToOEM(), and the _l variants that take a locale). If a file name contains any character outside the current code page, those APIs fail.

The problem with this commit is that the file name could contain characters that are not included in the current code page. For example, if the system code page is set to CP1252 (Latin 1) and a file name contains characters from CP936 (GBK) that are not in CP1252, the multibyte std::fopen() will always fail to open that file, regardless of the encoding you use for the file name argument [1]. In this commit, the first argument of MultiByteToWideChar() is set to CP_ACP, which makes the conversion fail if the name contains any character outside the ANSI code page.

In my opinion, the best solution is to use UTF-8 encoded strings all the time, even though the WIN32 API does not support this. Besides, I guess MinGW also suffers from the encoding problem on Windows, since it also invokes the WIN32 API. (Maybe I will find some time to test this.)

So the only remaining problem is how to convert a UTF-8 string to a wide-character string on Windows. From the comments on [2], it seems that it is still impossible to do this with the MinGW compiler using only the standard library...

Reference:
[1] http://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as
[2] http://stackoverflow.com/questions/2573834/c-convert-string-or-char-to-wstring-or-wchar-t
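
For the conversion question raised above, here is a minimal sketch of a hand-written UTF-8 to UTF-16 decoder that needs neither <codecvt> nor the Win32 API. It is not what CImg does; the function name `utf8_to_utf16()` and its `ok` out-parameter are hypothetical. It assumes the input is a complete UTF-8 string and rejects truncated sequences, overlong forms and surrogate code points.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units.
// On malformed input, returns an empty string and sets ok to false.
inline std::u16string utf8_to_utf16(const std::string &in, bool &ok) {
  // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
  static const char32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
  std::u16string out;
  ok = false;
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char b0 = static_cast<unsigned char>(in[i]);
    char32_t cp;
    std::size_t len;
    // The lead byte gives the sequence length and the top bits.
    if (b0 < 0x80) { cp = b0; len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else return std::u16string();  // stray continuation or invalid lead byte
    if (i + len > in.size()) return std::u16string();  // truncated sequence
    // Each continuation byte (10xxxxxx) contributes six more bits.
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(in[i + k]);
      if ((b & 0xC0) != 0x80) return std::u16string();
      cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, surrogates and out-of-range values.
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return std::u16string();
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Code points above the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  ok = true;
  return out;
}
```

For the file name from the report, the bytes E6 B5 8B E8 AF 95 decode to the two code units U+6D4B and U+8BD5. Since wchar_t is 16 bits wide on Windows, the resulting code units could be copied into a std::wstring and passed to _wfopen.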

OK, good to know.
So, can we consider that commit 4d60738 fixes the issue?

Yes, commit 4d60738 works perfectly on my side. As long as it does not block MinGW compilation, we can consider this issue solved.

Nice! Thanks again for your report and your tests. I appreciate it.