brendan-duncan/archive

zipDirectory method is failing to define filenames with accents and special characters with UTF8 encoding

Opened this issue · 8 comments

final encoder = ZipFileEncoder();
encoder.zipDirectory(Directory(destinationDirectory), filename: zipFileName);

flutter: ^3
archive: ^3.3.1

I tested several versions of the package, and the problem persists in all of them.

By default, zip tools encode filenames in an MBCS code page (the code page of the creator's system), not UTF-8.
Java's ZipInputStream accepts a Charset.
Maybe a Charset/Codepage parameter should be added.

@axilesoft
When I compress the folder with WinRAR or WinZip, everything works correctly: the file names are right when I open the zip/rar. But when I create the zip file from a Dart script using the archive library's zipDirectory method, the file names are wrong.

Maybe that is because WinRAR and WinZip do not use UTF-8 for filenames by default.
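For context, the ZIP format records per entry whether the name is UTF-8: bit 11 of the general-purpose flags (the "language encoding flag"). When that bit is unset, the specification's default is IBM code page 437, although many tools write the system code page instead. Below is a minimal sketch for checking the bit, assuming the raw bytes of an archive that starts with a local file header. The function name firstEntryDeclaresUtf8 is hypothetical and not part of package:archive.

```dart
import 'dart:typed_data';

/// Returns true when the first local file header in [zipBytes] sets
/// general-purpose bit 11, i.e. the entry name is declared to be UTF-8.
/// Hypothetical helper for illustration; not an API of package:archive.
bool firstEntryDeclaresUtf8(Uint8List zipBytes) {
  // A local file header has 30 fixed bytes and starts with "PK\x03\x04".
  if (zipBytes.length < 30) {
    throw FormatException('Too short to contain a local file header');
  }
  final data = ByteData.sublistView(zipBytes);
  if (data.getUint32(0, Endian.little) != 0x04034b50) {
    throw FormatException('Archive does not start with a local file header');
  }
  // The 16-bit general-purpose flag field sits at offset 6.
  final flags = data.getUint16(6, Endian.little);
  return (flags & 0x0800) != 0;
}
```

Running this on an archive written by each tool would show whether the names were flagged as UTF-8 or left in a legacy code page.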

I'm experiencing the same issue here:

File name on the system: avó.png

I use WinRAR to zip it and add the zipped file to my Flutter project's assets.

Then I run the following code:

// Load the zipped asset and decode it in memory.
final byteData = await rootBundle.load('assets/$path');
final Directory appDocDirNewFolder =
    Directory((await getApplicationDocumentsDirectory()).path);
await appDocDirNewFolder.create(recursive: true);
final inputStream = InputStream(byteData);
final archive = ZipDecoder().decodeBuffer(inputStream);
return archive.files;

The archive.files object shows the name as "av¢.png".

Any workaround for this issue?

(I'm currently using Dart SDK 2.18.1, Flutter SDK 3.3.2 and archive 3.3.1)

Further observations:

I'm using Windows (ISO-8859-1 encoding) and WinRAR to zip files.

I've tried using the charset_converter package to decode the names properly. It gets some of the names right (ações.png), but the issue with the "av¢.png" file name persists.

Next, I tried 7-Zip with the -mcu parameter to generate a zip file that forces UTF-8 encoding for names. The same Dart code worked well with that file.

The problem is that users won't always be unzipping files whose names were specifically encoded as UTF-8. So I was wondering whether the package could be more aware of which encoding was used, so that it can decode names properly.

I've noticed that, in my case, I would need cp437 + Latin1 to properly decode all the names of my files (latin1 for names like "ações.png" and cp437 for names like "almoço.png").
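The "av¢.png" symptom is consistent with this: in CP437 the letter ó is the single byte 0xA2, and the same byte read as Latin-1 is ¢. A small sketch using only dart:convert, assuming the entry name was stored as CP437 bytes (the helper names are made up for illustration):

```dart
import 'dart:convert';

// "avó.png" as CP437 bytes: 'ó' is the single byte 0xA2.
const List<int> oemNameBytes = [0x61, 0x76, 0xA2, 0x2E, 0x70, 0x6E, 0x67];

/// Treats each byte as a code point, which is the same as Latin-1 decoding.
String decodeByteForByte(List<int> bytes) => String.fromCharCodes(bytes);

/// Returns null when [bytes] are not well-formed UTF-8.
String? tryDecodeUtf8(List<int> bytes) {
  try {
    return utf8.decode(bytes);
  } on FormatException {
    return null;
  }
}

void main() {
  print(decodeByteForByte(oemNameBytes)); // av¢.png
  print(latin1.decode(oemNameBytes)); // av¢.png
  print(tryDecodeUtf8(oemNameBytes)); // null (0xA2 alone is not valid UTF-8)
}
```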

Maybe InputStreamBase's readString({int? size, bool utf8 = true}) method could use other methods than Utf8Decoder().convert(bytes) + String.fromCharCodes(bytes) to decode fileNames?
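One way such a fallback could look, as a sketch rather than anything the package does today: try strict UTF-8 first, and if the bytes are malformed, decode them as CP437. The names decodeCp437 and decodeEntryName are hypothetical, and the table below is deliberately partial (bytes 0x80 to 0xAF only). A heuristic like this cannot tell CP437 from Latin-1 or CP850, so a real fix would probably still need a caller-supplied code page.

```dart
import 'dart:convert';

// Partial CP437 table covering bytes 0x80..0xAF (the accented letters).
// Box-drawing and math symbols (0xB0..0xFF) are left out of this sketch.
const String _cp437High =
    'ÇüéâäàåçêëèïîìÄÅ' // 0x80..0x8F
    'ÉæÆôöòûùÿÖÜ¢£¥₧ƒ' // 0x90..0x9F
    'áíóúñÑªº¿⌐¬½¼¡«»'; // 0xA0..0xAF

/// Decodes [bytes] as CP437, using U+FFFD for bytes outside the partial table.
String decodeCp437(List<int> bytes) {
  final out = StringBuffer();
  for (final b in bytes) {
    if (b < 0x80) {
      out.writeCharCode(b); // ASCII range is identical
    } else if (b < 0xB0) {
      out.write(_cp437High[b - 0x80]);
    } else {
      out.writeCharCode(0xFFFD);
    }
  }
  return out.toString();
}

/// Strict UTF-8 first; falls back to CP437 when the bytes are malformed.
String decodeEntryName(List<int> bytes) {
  try {
    return utf8.decode(bytes);
  } on FormatException {
    return decodeCp437(bytes);
  }
}

void main() {
  print(decodeEntryName([0x61, 0x76, 0xA2, 0x2E, 0x70, 0x6E, 0x67])); // avó.png
  print(decodeEntryName(utf8.encode('avó.png'))); // avó.png
}
```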

I can look into this, but work has been very busy so I have been and will be slow.

Is there any update on this? I'm seeing a problem on macOS: when I add a file to the zip with the code below, file names that contain accents are damaged inside the zip.

final encoder = ZipFileEncoder();
encoder.create(...);

await encoder.addFile(
  file,
  'Értesítés.pdf',
);

encoder.close();

And the file in the zip is called then E��rtesi��te��s.pdf
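One possible reading of that output, offered as an assumption rather than a confirmed diagnosis: each accented letter appears as its base letter followed by two unreadable characters, which is what a decomposed (NFD) name looks like when its UTF-8 bytes are not decoded as UTF-8. The combining acute accent U+0301 takes two bytes in UTF-8. The sketch below only illustrates the byte difference between the composed and decomposed forms; the helper name is made up.

```dart
import 'dart:convert';

/// UTF-8 bytes of [s], as a plain list for easy printing and comparison.
List<int> utf8Bytes(String s) => utf8.encode(s).toList();

void main() {
  const composed = '\u00C9'; // É as a single code point (NFC)
  const decomposed = 'E\u0301'; // E followed by a combining acute accent (NFD)

  print(utf8Bytes(composed)); // [195, 137]
  print(utf8Bytes(decomposed)); // [69, 204, 129]
}
```

If this is what is happening, the stray pair after each letter would be the bytes 0xCC 0x81 of the combining accent, shown by a reader that does not treat the name as UTF-8.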