Exzap/ZArchive

No enforcement of windows-1252 anywhere


The current source code does not do any encoding conversion: it takes whatever bytes the char* API hands it and stores them in the archive unchanged. As a result, the actual encoding varies with the system's locale settings. On en-US Windows you will probably get code page 1252 (only probably, because there's a control panel switch to always use 65001, i.e. UTF-8), on Simplified Chinese Windows you might get 936, and on saner platforms that use the same encoding everywhere you would get UTF-8.

This is very different from what the README says:

The encoding for paths within the archive is Windows-1252 (case-insensitive)

ZIP has a flag bit for UTF-8 file names (general purpose bit 11), but oops, there's no reserved bit in ZArchive that could do the same job.
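To make the mismatch concrete, here is a minimal sketch of what a tool would have to do if it wanted to turn Windows-1252 path bytes into UTF-8 before storing them. `cp1252ToUtf8` is a hypothetical helper written for this issue, not something ZArchive provides, and mapping the five unassigned bytes to U+FFFD is just one possible choice.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ZArchive): re-encode Windows-1252 bytes as UTF-8.
std::string cp1252ToUtf8(const std::string& in)
{
    // Code points for bytes 0x80..0x9F, the range where Windows-1252 differs from Latin-1.
    // Unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to U+FFFD in this sketch.
    static const std::uint16_t kHigh[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
    };

    std::string out;
    for (unsigned char b : in)
    {
        // Bytes outside 0x80..0x9F map to the code point with the same value.
        std::uint32_t cp = b;
        if (b >= 0x80 && b <= 0x9F)
            cp = kHigh[b - 0x80];

        if (cp < 0x80)
        {
            // ASCII: one byte, unchanged.
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            // Two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            // Three-byte UTF-8 sequence (every entry in kHigh fits in the BMP).
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, the name "café" stored as the Windows-1252 bytes `63 61 66 E9` becomes `63 61 66 C3 A9` in UTF-8, so the same file name ends up as two different byte strings in the archive depending on which system wrote it.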

Exzap commented

That line in the README is misleading and I apologize for that. Windows-1252 isn't enforced anywhere; in practice UTF-8 is what actually gets used.

Within the format specification, paths are binary blobs without any specific encoding. This is by design, to let applications choose whatever encoding is convenient for them, although the API expecting a null-terminated char* string already limits the possible encodings quite a lot. I don't want to open Pandora's box by supporting arbitrary codecs and storing information about them in the archive itself. So let's just say that for all ZARs the path encoding is UTF-8 by default, but developers are free to choose a different encoding for their applications (at the cost of incompatibility with other tools).
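Since the format itself does not enforce anything, an application that wants to stick to the UTF-8 default could check paths before handing them to the archive writer. The following is only a sketch under that assumption: `isValidUtf8` is a hypothetical helper, not part of the ZArchive API, and it rejects overlong forms, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ZArchive): returns true if `s` is well-formed UTF-8.
bool isValidUtf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead < 0x80)
        {
            // Plain ASCII byte.
            ++i;
            continue;
        }

        std::size_t len = 0;      // total length of the sequence in bytes
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t minCp = 0;  // smallest code point allowed for this length
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; minCp = 0x10000; }
        else return false;              // stray continuation byte or invalid lead byte

        if (i + len > n)
            return false;               // sequence is cut off at the end of the string

        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;           // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp)
            return false;               // overlong encoding
        if (cp > 0x10FFFF)
            return false;               // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;               // UTF-16 surrogate, not valid in UTF-8

        i += len;
    }
    return true;
}
```

A path taken from a Windows-1252 locale, such as the bytes `63 61 66 E9`, fails this check, so a tool can refuse it or convert it instead of silently writing locale-dependent bytes into the archive.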