pmqs/zipdetails

Error decoding `Xceed Unicode extra field` (0x554e)

rougemeilland opened this issue · 7 comments

1. Overview

Although I haven't actually run it, I found an error in the way the subroutine decode_Xceed_unicode in the script zipdetails handles this field, so I would like to report it.

2. Extra field 0x554e format

Based on experiments and analysis, the format of the extra field 0x554e appears to be as follows.

2.1 For central directory headers

offset (bytes)               length (bytes)           value
0                            4                        signature (0x5843554e)
4                            2                        half the number of bytes in the entry name encoded in UTF-16 (i.e. the number of UTF-16 code units)
6                            2                        half the number of bytes in the comment encoded in UTF-16 (i.e. the number of UTF-16 code units)
8                            (value at offset 4) * 2  byte array of entry name encoded in UTF-16
8 + (value at offset 4) * 2  (value at offset 6) * 2  byte array of comment encoded in UTF-16

2.2 For local headers

offset (bytes)  length (bytes)           value
0               4                        signature (0x5843554e)
4               2                        half the number of bytes in the entry name encoded in UTF-16 (i.e. the number of UTF-16 code units)
6               (value at offset 4) * 2  byte array of entry name encoded in UTF-16
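
The two layouts above can be sketched as a small decoder. This is only an illustration in Python (not the Perl used by zipdetails); the function name `decode_xceed_unicode` and the `central` flag are made up for this sketch, and it assumes the strings are little-endian UTF-16.

```python
import struct

XCEED_SIGNATURE = 0x5843554E  # stored on disk as the bytes 4E 55 43 58


def decode_xceed_unicode(payload: bytes, central: bool):
    """Decode the body of extra field 0x554e (the bytes after ID and length).

    Returns (name, comment). comment is None for local headers, which
    carry no comment length and no comment in the layout described above.
    """
    (signature,) = struct.unpack_from("<I", payload, 0)
    if signature != XCEED_SIGNATURE:
        raise ValueError("unexpected signature in 0x554e extra field")

    # Length fields count UTF-16 code units, so the byte length is twice that.
    (name_units,) = struct.unpack_from("<H", payload, 4)
    offset = 6
    comment_units = 0
    if central:
        (comment_units,) = struct.unpack_from("<H", payload, 6)
        offset = 8

    name_end = offset + name_units * 2
    name = payload[offset:name_end].decode("utf-16-le")
    if not central:
        return name, None

    comment_end = name_end + comment_units * 2
    comment = payload[name_end:comment_end].decode("utf-16-le")
    return name, comment
```

For a local header, `decode_xceed_unicode(body, central=False)` reads the name from offset 6; for a central directory header the name starts at offset 8, after the comment length.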

3. About the contents of script zipdetails

The comment in the subroutine decode_Xceed_unicode in the script zipdetails/bin/zipdetails says "Found the Null prefix".
My guess is that the 2-byte null that appears at the beginning of the entry name is actually the comment length field at offset 6 in "2.1 For central directory headers", which happens to be 0.

I don't know Perl very well, so I don't know how to fix it. Sorry.
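
To illustrate the guess about the null prefix, here is a small Python sketch (not the actual zipdetails code). The entry name `a.txt` and the variable names are hypothetical; the payload follows the central directory layout from section 2.1 with an empty comment.

```python
import struct

name = "a.txt"  # hypothetical entry name

# Central directory layout: signature, name length, comment length (0), name.
payload = struct.pack("<IHH", 0x5843554E, len(name), 0) + name.encode("utf-16-le")

# A reader that expects the name at offset 6 (the local header position)
# picks up the zero comment length as two leading NUL bytes.
misread = payload[6:]
print(misread[:2])                      # the apparent "null prefix"
print(misread[2:].decode("utf-16-le"))  # the real name follows it
```

If the comment were not empty, the same two bytes would hold a non-zero length, which is why the prefix only looks like a null by coincidence.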

pmqs commented

Hey @rougemeilland

thanks for the very detailed feedback.

The issues I had with reverse engineering this field were

  1. There is no public definition for this field that I can find.
  2. I could only find one example program that used this field -- https://www.telerik.com/fiddler . It didn't add a comment.

I suspect your analysis is spot on.

Do you have any small example zip files that fill out the comment field that I could add to my test harness? I can easily create an example by hand, but I'd prefer a file actually generated by the Xceed library code.

Any chance you know anything about the other Xceed extra field, 0x4f4c (Xceed original location extra field)?

rougemeilland commented

I created a ZIP file with various extra fields added.
I examined it in a binary editor, and the format of the extra field 0x554e appears to be as I reported the other day.
I am attaching the ZIP file here as test.zip.

Below is the test code that created that ZIP file. It uses the Xceed ZIP library.

using System;
using System.IO;
using System.Text;
using Xceed.Compression;
using Xceed.FileSystem;
using Xceed.Zip;

namespace Experiment001
{
    internal class Program
    {
        // Create a test ZIP file. Specify the path name of the ZIP file in the first command argument.
        static void Main(string[] args)
        {
            var localEncoding = Encoding.GetEncoding("shift_jis");

            ZipArchive.DefaultExtraHeaders = ExtraHeaders.None;
            ZipArchive.OEMEncodingOverride = localEncoding;

            var zipArchiveFile = new DiskFile(args[0]);
            var zipArchive = new ZipArchive(zipArchiveFile)
            {
                DefaultCompressionMethod = CompressionMethod.Deflated,
                DefaultCompressionLevel = CompressionLevel.Highest,
                DefaultTextEncoding = TextEncoding.Unicode, // Set bit 11 of the general purpose flag.
                DefaultUnicodeUsagePolicy = UnicodeUsagePolicy.Always, // Set bit 11 of the general purpose flag even if the entry name and comment are only in the ASCII character set.
            };
            zipArchive.Comment = $"This is comment for zip archive file '{zipArchive.HostFile.Name}'.";

            zipArchive.BeginUpdate();
            try
            {
                {
                    var file = (ZippedFile)zipArchive.CreateFile($"file_0x554e.txt", true);
                    file.ExtraHeaders = ExtraHeaders.Unicode; // Add extra field 0x554e to this entry.
                    file.Comment = $"This is comment for file '{file.FullName}'.";
                    using (var outputStream = file.OpenWrite(true))
                    using (var writer = new StreamWriter(outputStream))
                    {
                        writer.WriteLine($"Hello, this file is '{file.FullName}'.");
                        writer.WriteLine($"こんにちは、このファイルは '{file.FullName}' です。");
                    }
                }

                {
                    var file = (ZippedFile)zipArchive.CreateFile($"file_0x5455.txt", true);
                    file.ExtraHeaders = ExtraHeaders.ExtendedTimeStamp; // Add extra field 0x5455 to this entry.
                    file.Comment = $"This is comment for file '{file.FullName}'.";
                    using (var outputStream = file.OpenWrite(true))
                    using (var writer = new StreamWriter(outputStream))
                    {
                        writer.WriteLine($"Hello, this file is '{file.FullName}'.");
                        writer.WriteLine($"こんにちは、このファイルは '{file.FullName}' です。");
                    }
                }

                {
                    var file = (ZippedFile)zipArchive.CreateFile($"file_0x000a.txt", true);
                    file.ExtraHeaders = ExtraHeaders.FileTimes; // Add extra field 0x000a to this entry.
                    file.Comment = $"This is comment for file '{file.FullName}'.";
                    using (var outputStream = file.OpenWrite(true))
                    using (var writer = new StreamWriter(outputStream))
                    {
                        writer.WriteLine($"Hello, this file is '{file.FullName}'.");
                        writer.WriteLine($"こんにちは、このファイルは '{file.FullName}' です。");
                    }
                }
            }
            finally
            {
                zipArchive.EndUpdate();
            }
        }
    }
}

By the way, I have done some research on the .NET version of the Xceed ZIP library.
The Xceed ZIP library seems to only support the following types of extra fields:

  • 0x0001 (ZIP64 extended information extra field)
  • 0x000a (NTFS (Win9x/WinNT FileTimes))
  • 0x5455 (extended timestamp)
  • 0x554e (Xceed unicode extra field)
  • 0x6375 (Info-ZIP Unicode Comment)
  • 0x7075 (Info-ZIP Unicode Path)
  • 0x9901 (WinZip AES encryption)

Unfortunately, I couldn't find any code that deals with the extra field 0x4f4c (Xceed original location extra field).
So the format of the extra field 0x4f4c is unknown to me.
My guess is that it may already have been discontinued.
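
For reference, all of the extra fields listed above share the same container layout in the ZIP format: a 2-byte header ID, a 2-byte data size, then the data, repeated until the extra block ends. A minimal Python sketch of walking such a block follows; the function name `iter_extra_fields` and the `KNOWN_FIELDS` table are illustrative only, with names copied from the list above.

```python
import struct

# Names copied from the list above; any other ID is reported as "unknown".
KNOWN_FIELDS = {
    0x0001: "ZIP64 extended information extra field",
    0x000A: "NTFS (Win9x/WinNT FileTimes)",
    0x5455: "extended timestamp",
    0x554E: "Xceed unicode extra field",
    0x6375: "Info-ZIP Unicode Comment",
    0x7075: "Info-ZIP Unicode Path",
    0x9901: "WinZip AES encryption",
}


def iter_extra_fields(extra: bytes):
    """Yield (header_id, name, body) for each record in an extra field block."""
    offset = 0
    # Stop when fewer than 4 bytes remain; they cannot hold an ID and a size.
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        body = extra[offset + 4:offset + 4 + size]
        if len(body) != size:
            raise ValueError("extra field 0x%04x is truncated" % header_id)
        yield header_id, KNOWN_FIELDS.get(header_id, "unknown"), body
        offset += 4 + size
```

An ID such as 0x4f4c would simply come back as "unknown" with its raw body, which is about all that can be done without a format description.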

pmqs commented

Thanks @rougemeilland !

Below is the output from a tentative fix I've just applied to zipdetails. The fix is in the develop branch if you want to try it out. See https://github.com/pmqs/zipdetails/blob/develop/bin/zipdetails

I don't see any mention of the 0x4f4c (Xceed original location extra field) in the Xceed documentation, but note that the 0x554e field is now deprecated. Below is from the Xceed documentation:

As of version 6.5, the xehUnicode extra header is deprecated. It should only be used if you need process zip files using old versions of Xceed Zip Compression Library and/or Zip .NET v4.0 and below.

If you have the time could you create another zip file that has 8-bit filenames & comments and also uses the 554E field? I'd expect to see the filename/comments in the official locations encoded in UTF8, and in UTF16 in the Xceed Unicode field. Won't know for sure until we try.

Hmm, just looking at the Xceed documentation, I see there are a couple of ExtraHeaders values called xehUTF8Filename and xehUTF8Comment that seem to control UTF8 encoding. I'm guessing you would need them to get the filenames/comments in UTF8.

I'd do this myself but the Xceed library appears to be a commercial product that I'm not prepared to pay for.

0000 0003 0004 50 4B 03 04 LOCAL HEADER #1       04034B50 (67324752)
0004 0004 0001 14          Extract Zip Spec      14 (20) '2.0'
0005 0005 0001 00          Extract OS            00 (0) 'MS-DOS'
0006 0007 0002 02 08       General Purpose Flag  0802 (2050)
                           [Bits 1-2]            1 'Maximum Compression'
                           [Bit 11]              1 'Language Encoding'
0008 0009 0002 08 00       Compression Method    0008 (8) 'Deflated'
000A 000D 0004 C9 B8 79 57 Last Mod Date/Time    5779B8C9 (1467594953) 'Sat Nov 25 23:06:18 2023'
000E 0011 0004 F9 DA D3 CC CRC                   CCD3DAF9 (3436436217)
0012 0015 0004 5C 00 00 00 Compressed Size       0000005C (92)
0016 0019 0004 6F 00 00 00 Uncompressed Size     0000006F (111)
001A 001B 0002 0F 00       Filename Length       000F (15)
001C 001D 0002 28 00       Extra Length          0028 (40)
001E 002C 000F 66 69 6C 65 Filename              'file_0x554e.txt'
               5F 30 78 35
               35 34 65 2E
               74 78 74
002D 002E 0002 4E 55       Extra ID #1           554E (21838) 'Xceed unicode extra field [UN]'
002F 0030 0002 24 00         Length              0024 (36)
0031 0034 0004 4E 55 43 58   ID                  5843554E (1480807758)
0035 0036 0002 0F 00         Filename Length     000F (15)
0037 0054 001E 66 00 69 00   UTF16LE Filename    'file_0x554e.txt'
               6C 00 65 00
               5F 00 30 00
               78 00 35 00
               35 00 34 00
               65 00 2E 00
               74 00 78 00
               74 00
0055 00B0 005C F3 48 CD C9 PAYLOAD               .H....Q(..,VH..IU...1 V.A...I.^IE.../......&?n\..q.......A"...7O}...q....@..X4+<n\..q...&^..
               C9 D7 51 28
               C9 C8 2C 56
               48 CB CC 49
               55 00 D2 EA
               31 20 56 BC
               41 85 A9 A9
               49 AA 5E 49
               45 89 BA 1E
               2F D7 E3 C6
               C9 8F 9B 26
               3F 6E 5C FD
               B8 71 E1 E3
               C6 F5 8F 1B
               1A 41 22 8D
               EB 1E 37 4F
               7D DC B4 F0
               71 D3 92 C7
               CD 40 A9 F5
               58 34 2B 3C
               6E 5C FE B8
               71 E6 E3 86
               26 5E 2E 00

00B1 00B4 0004 50 4B 03 04 LOCAL HEADER #2       04034B50 (67324752)
00B5 00B5 0001 14          Extract Zip Spec      14 (20) '2.0'
00B6 00B6 0001 00          Extract OS            00 (0) 'MS-DOS'
00B7 00B8 0002 02 08       General Purpose Flag  0802 (2050)
                           [Bits 1-2]            1 'Maximum Compression'
                           [Bit 11]              1 'Language Encoding'
00B9 00BA 0002 08 00       Compression Method    0008 (8) 'Deflated'
00BB 00BE 0004 C9 B8 79 57 Last Mod Date/Time    5779B8C9 (1467594953) 'Sat Nov 25 23:06:18 2023'
00BF 00C2 0004 76 F2 9B B6 CRC                   B69BF276 (3063673462)
00C3 00C6 0004 5C 00 00 00 Compressed Size       0000005C (92)
00C7 00CA 0004 6F 00 00 00 Uncompressed Size     0000006F (111)
00CB 00CC 0002 0F 00       Filename Length       000F (15)
00CD 00CE 0002 11 00       Extra Length          0011 (17)
00CF 00DD 000F 66 69 6C 65 Filename              'file_0x5455.txt'
               5F 30 78 35
               34 35 35 2E
               74 78 74
00DE 00DF 0002 55 54       Extra ID #1           5455 (21589) 'Extended Timestamp [UT]'
00E0 00E1 0002 0D 00         Length              000D (13)
00E2 00E2 0001 07            Flags               07 (7) 'mod access change'
00E3 00E6 0004 5B FF 61 65   Mod Time            6561FF5B (1700921179) 'Sat Nov 25 14:06:19 2023'
00E7 00EA 0004 5B FF 61 65   Access Time         6561FF5B (1700921179) 'Sat Nov 25 14:06:19 2023'
00EB 00EE 0004 5B FF 61 65   Change Time         6561FF5B (1700921179) 'Sat Nov 25 14:06:19 2023'
00EF 014A 005C F3 48 CD C9 PAYLOAD               .H....Q(..,VH..IU...1 V.A.....^IE.../......&?n\..q.......A"...7O}...q....@..X4+<n\..q...&^..
               C9 D7 51 28
               C9 C8 2C 56
               48 CB CC 49
               55 00 D2 EA
               31 20 56 BC
               41 85 A9 89
               A9 A9 5E 49
               45 89 BA 1E
               2F D7 E3 C6
               C9 8F 9B 26
               3F 6E 5C FD
               B8 71 E1 E3
               C6 F5 8F 1B
               1A 41 22 8D
               EB 1E 37 4F
               7D DC B4 F0
               71 D3 92 C7
               CD 40 A9 F5
               58 34 2B 3C
               6E 5C FE B8
               71 E6 E3 86
               26 5E 2E 00

014B 014E 0004 50 4B 03 04 LOCAL HEADER #3       04034B50 (67324752)
014F 014F 0001 14          Extract Zip Spec      14 (20) '2.0'
0150 0150 0001 00          Extract OS            00 (0) 'MS-DOS'
0151 0152 0002 02 08       General Purpose Flag  0802 (2050)
                           [Bits 1-2]            1 'Maximum Compression'
                           [Bit 11]              1 'Language Encoding'
0153 0154 0002 08 00       Compression Method    0008 (8) 'Deflated'
0155 0158 0004 C9 B8 79 57 Last Mod Date/Time    5779B8C9 (1467594953) 'Sat Nov 25 23:06:18 2023'
0159 015C 0004 5B 4C 03 57 CRC                   57034C5B (1459833947)
015D 0160 0004 5C 00 00 00 Compressed Size       0000005C (92)
0161 0164 0004 6F 00 00 00 Uncompressed Size     0000006F (111)
0165 0166 0002 0F 00       Filename Length       000F (15)
0167 0168 0002 00 00       Extra Length          0000 (0)
0169 0177 000F 66 69 6C 65 Filename              'file_0x000a.txt'
               5F 30 78 30
               30 30 61 2E
               74 78 74
0178 01D3 005C F3 48 CD C9 PAYLOAD               .H....Q(..,VH..IU...1 V.A...A.^IE.../......&?n\..q.......A"...7O}...q....@..X4+<n\..q...&^..
               C9 D7 51 28
               C9 C8 2C 56
               48 CB CC 49
               55 00 D2 EA
               31 20 56 BC
               41 85 81 81
               41 A2 5E 49
               45 89 BA 1E
               2F D7 E3 C6
               C9 8F 9B 26
               3F 6E 5C FD
               B8 71 E1 E3
               C6 F5 8F 1B
               1A 41 22 8D
               EB 1E 37 4F
               7D DC B4 F0
               71 D3 92 C7
               CD 40 A9 F5
               58 34 2B 3C
               6E 5C FE B8
               71 E6 E3 86
               26 5E 2E 00

01D4 01D7 0004 50 4B 01 02 CENTRAL HEADER #1     02014B50 (33639248)
01D8 01D8 0001 2D          Created Zip Spec      2D (45) '4.5'
01D9 01D9 0001 00          Created OS            00 (0) 'MS-DOS'
01DA 01DA 0001 14          Extract Zip Spec      14 (20) '2.0'
01DB 01DB 0001 00          Extract OS            00 (0) 'MS-DOS'
01DC 01DD 0002 02 08       General Purpose Flag  0802 (2050)
                           [Bits 1-2]            1 'Maximum Compression'
                           [Bit 11]              1 'Language Encoding'
01DE 01DF 0002 08 00       Compression Method    0008 (8) 'Deflated'
01E0 01E3 0004 C9 B8 79 57 Last Mod Date/Time    5779B8C9 (1467594953) 'Sat Nov 25 23:06:18 2023'
01E4 01E7 0004 5B 4C 03 57 CRC                   57034C5B (1459833947)
01E8 01EB 0004 5C 00 00 00 Compressed Size       0000005C (92)
01EC 01EF 0004 6F 00 00 00 Uncompressed Size     0000006F (111)
01F0 01F1 0002 0F 00       Filename Length       000F (15)
01F2 01F3 0002 24 00       Extra Length          0024 (36)
01F4 01F5 0002 2C 00       Comment Length        002C (44)
01F6 01F7 0002 00 00       Disk Start            0000 (0)
01F8 01F9 0002 00 00       Int File Attributes   0000 (0)
                           [Bit 0]               0 'Binary Data'
01FA 01FD 0004 00 00 00 00 Ext File Attributes   00000000 (0)
01FE 0201 0004 4B 01 00 00 Local Header Offset   0000014B (331)
0202 0210 000F 66 69 6C 65 Filename              'file_0x000a.txt'
               5F 30 78 30
               30 30 61 2E
               74 78 74
0211 0212 0002 0A 00       Extra ID #1           000A (10) 'NTFS FileTimes'
0213 0214 0002 20 00         Length              0020 (32)
0215 0218 0004 00 00 00 00   Reserved            00000000 (0)
0219 021A 0002 01 00         Tag1                0001 (1)
021B 021C 0002 18 00         Size1               0018 (24)
021D 0224 0008 E7 56 5B 90   Mtime               01DA1FA8905B56E7 (133453947797722855) 'Sat Nov 25 14:06:19 2023 772285500ns'
               A8 1F DA 01
0225 022C 0008 E7 56 5B 90   Atime               01DA1FA8905B56E7 (133453947797722855) 'Sat Nov 25 14:06:19 2023 772285500ns'
               A8 1F DA 01
022D 0234 0008 E7 56 5B 90   Ctime               01DA1FA8905B56E7 (133453947797722855) 'Sat Nov 25 14:06:19 2023 772285500ns'
               A8 1F DA 01
0235 0260 002C 54 68 69 73 Comment               'This is comment for file '\file_0x000a.txt'.'
               20 69 73 20
               63 6F 6D 6D
               65 6E 74 20
               66 6F 72 20
               66 69 6C 65
               20 27 5C 66
               69 6C 65 5F
               30 78 30 30
               30 61 2E 74
               78 74 27 2E

0261 0264 0004 50 4B 01 02 CENTRAL HEADER #2     02014B50 (33639248)
0265 0265 0001 2D          Created Zip Spec      2D (45) '4.5'
0266 0266 0001 00          Created OS            00 (0) 'MS-DOS'
0267 0267 0001 14          Extract Zip Spec      14 (20) '2.0'
0268 0268 0001 00          Extract OS            00 (0) 'MS-DOS'
0269 026A 0002 02 08       General Purpose Flag  0802 (2050)
                           [Bits 1-2]            1 'Maximum Compression'
                           [Bit 11]              1 'Language Encoding'
026B 026C 0002 08 00       Compression Method    0008 (8) 'Deflated'
026D 0270 0004 C9 B8 79 57 Last Mod Date/Time    5779B8C9 (1467594953) 'Sat Nov 25 23:06:18 2023'
0271 0274 0004 76 F2 9B B6 CRC                   B69BF276 (3063673462)
0275 0278 0004 5C 00 00 00 Compressed Size       0000005C (92)
0279 027C 0004 6F 00 00 00 Uncompressed Size     0000006F (111)
027D 027E 0002 0F 00       Filename Length       000F (15)
027F 0280 0002 09 00       Extra Length          0009 (9)
0281 0282 0002 2C 00       Comment Length        002C (44)
0283 0284 0002 00 00       Disk Start            0000 (0)
0285 0286 0002 00 00       Int File Attributes   0000 (0)
                           [Bit 0]               0 'Binary Data'
0287 028A 0004 00 00 00 00 Ext File Attributes   00000000 (0)
028B 028E 0004 B1 00 00 00 Local Header Offset   000000B1 (177)
028F 029D 000F 66 69 6C 65 Filename              'file_0x5455.txt'
               5F 30 78 35
               34 35 35 2E
               74 78 74
029E 029F 0002 55 54       Extra ID #1           5455 (21589) 'Extended Timestamp [UT]'
02A0 02A1 0002 05 00         Length              0005 (5)
02A2 02A2 0001 07            Flags               07 (7) 'mod access change'
02A3 02A6 0004 5B FF 61 65   Mod Time            6561FF5B (1700921179) 'Sat Nov 25 14:06:19 2023'
02A7 02D2 002C 54 68 69 73 Comment               'This is comment for file '\file_0x5455.txt'.'
               20 69 73 20
               63 6F 6D 6D
               65 6E 74 20
               66 6F 72 20
               66 69 6C 65
               20 27 5C 66
               69 6C 65 5F
               30 78 35 34
               35 35 2E 74
               78 74 27 2E

02D3 02D6 0004 50 4B 01 02 CENTRAL HEADER #3     02014B50 (33639248)
02D7 02D7 0001 2D          Created Zip Spec      2D (45) '4.5'
02D8 02D8 0001 00          Created OS            00 (0) 'MS-DOS'
02D9 02D9 0001 14          Extract Zip Spec      14 (20) '2.0'
02DA 02DA 0001 00          Extract OS            00 (0) 'MS-DOS'
02DB 02DC 0002 02 08       General Purpose Flag  0802 (2050)
                           [Bits 1-2]            1 'Maximum Compression'
                           [Bit 11]              1 'Language Encoding'
02DD 02DE 0002 08 00       Compression Method    0008 (8) 'Deflated'
02DF 02E2 0004 C9 B8 79 57 Last Mod Date/Time    5779B8C9 (1467594953) 'Sat Nov 25 23:06:18 2023'
02E3 02E6 0004 F9 DA D3 CC CRC                   CCD3DAF9 (3436436217)
02E7 02EA 0004 5C 00 00 00 Compressed Size       0000005C (92)
02EB 02EE 0004 6F 00 00 00 Uncompressed Size     0000006F (111)
02EF 02F0 0002 0F 00       Filename Length       000F (15)
02F1 02F2 0002 82 00       Extra Length          0082 (130)
02F3 02F4 0002 2C 00       Comment Length        002C (44)
02F5 02F6 0002 00 00       Disk Start            0000 (0)
02F7 02F8 0002 00 00       Int File Attributes   0000 (0)
                           [Bit 0]               0 'Binary Data'
02F9 02FC 0004 00 00 00 00 Ext File Attributes   00000000 (0)
02FD 0300 0004 00 00 00 00 Local Header Offset   00000000 (0)
0301 030F 000F 66 69 6C 65 Filename              'file_0x554e.txt'
               5F 30 78 35
               35 34 65 2E
               74 78 74
0310 0311 0002 4E 55       Extra ID #1           554E (21838) 'Xceed unicode extra field [UN]'
0312 0313 0002 7E 00         Length              007E (126)
0314 0317 0004 4E 55 43 58   ID                  5843554E (1480807758)
0318 0319 0002 0F 00         Filename Length     000F (15)
031A 031B 0002 2C 00         Comment Length      002C (44)
031C 0339 001E 66 00 69 00   UTF16LE Filename    'file_0x554e.txt'
               6C 00 65 00
               5F 00 30 00
               78 00 35 00
               35 00 34 00
               65 00 2E 00
               74 00 78 00
               74 00
033A 0391 0058 54 00 68 00   UTF16LE Comment     'This is comment for file '\file_0x554e.txt'.'
               69 00 73 00
               20 00 69 00
               73 00 20 00
               63 00 6F 00
               6D 00 6D 00
               65 00 6E 00
               74 00 20 00
               66 00 6F 00
               72 00 20 00
               66 00 69 00
               6C 00 65 00
               20 00 27 00
               5C 00 66 00
               69 00 6C 00
               65 00 5F 00
               30 00 78 00
               35 00 35 00
               34 00 65 00
               2E 00 74 00
               78 00 74 00
               27 00 2E 00
0392 03BD 002C 54 68 69 73 Comment               'This is comment for file '\file_0x554e.txt'.'
               20 69 73 20
               63 6F 6D 6D
               65 6E 74 20
               66 6F 72 20
               66 69 6C 65
               20 27 5C 66
               69 6C 65 5F
               30 78 35 35
               34 65 2E 74
               78 74 27 2E

03BE 03C1 0004 50 4B 05 06 END CENTRAL HEADER    06054B50 (101010256)
03C2 03C3 0002 00 00       Number of this disk   0000 (0)
03C4 03C5 0002 00 00       Central Dir Disk no   0000 (0)
03C6 03C7 0002 03 00       Entries in this disk  0003 (3)
03C8 03C9 0002 03 00       Total Entries         0003 (3)
03CA 03CD 0004 EA 01 00 00 Size of Central Dir   000001EA (490)
03CE 03D1 0004 D4 01 00 00 Offset to Central Dir 000001D4 (468)
03D2 03D3 0002 30 00       Comment Length        0030 (48)
03D4 0403 0030 54 68 69 73 Comment               'This is comment for zip archive file 'test.zip'.'
               20 69 73 20
               63 6F 6D 6D
               65 6E 74 20
               66 6F 72 20
               7A 69 70 20
               61 72 63 68
               69 76 65 20
               66 69 6C 65
               20 27 74 65
               73 74 2E 7A
               69 70 27 2E
#
# Done
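
As a cross-check of the three timestamp encodings that appear in the output above, here is a small Python sketch. The function names are made up for this example; the field layouts (MS-DOS date/time, Unix seconds, and 100 ns ticks since 1601 for the NTFS field) are the usual ZIP conventions.

```python
from datetime import datetime, timedelta, timezone


def dos_datetime(value: int) -> datetime:
    """Decode the 32-bit MS-DOS date/time (date in the high 16 bits).

    The result is naive local time with a 2-second resolution.
    """
    date, time = value >> 16, value & 0xFFFF
    return datetime(
        1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
        time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2,
    )


def unix_time(value: int) -> datetime:
    """Decode the seconds-since-1970 value used by the 0x5455 field."""
    return datetime.fromtimestamp(value, timezone.utc)


def ntfs_filetime(value: int) -> datetime:
    """Decode 100 ns ticks since 1601-01-01 UTC, truncated to microseconds."""
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=value // 10)


print(dos_datetime(0x5779B8C9))           # 2023-11-25 23:06:18
print(unix_time(0x6561FF5B))              # 2023-11-25 14:06:19+00:00
print(ntfs_filetime(0x01DA1FA8905B56E7))  # 2023-11-25 14:06:19.772285+00:00
```

The nine-hour gap between the MS-DOS value and the other two would be consistent with the archive having been created on a machine set to UTC+9, since the MS-DOS field stores local time.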

rougemeilland commented

I tried various things with the Xceed ZIP library, but in the end I could not create a ZIP file that uses 0x6375 (Info-ZIP Unicode Comment) or 0x7075 (Info-ZIP Unicode Path).

The Xceed ZIP library seems to behave as follows.

  • When writing, a request to use the extra fields 0x6375 or 0x7075 is ignored.
  • Conversely, when reading, those extra fields are applied even if they are not enabled (or are explicitly disabled).

I came to this conclusion as a result of the following experiment.

Experiments

Experiment 1

It is possible to set 0x6375 or 0x7075 as the default extra field for ZIP archives.
However, this is not reflected in the actual settings at runtime.
Even if I create a ZIP file in that state, the extra fields 0x6375 and 0x7075 are not added.

Experiment 2

I tried to set the use of 0x6375 or 0x7075 for the entire ZIP archive or for each entry at runtime, but it failed. The reason is that it is a value that cannot be set.
However, I was able to successfully set other extra fields such as 0x5455 (extended timestamp).

Experiment 3

When creating a ZIP file with WinRAR, an extra field 0x7075 (Info-ZIP Unicode Path) is added.
I took advantage of that and performed the following steps.

  1. Create a ZIP file using WinRAR.
  2. Edit the ZIP file created in step 1 with a binary editor so that the entry name in the normal header and the entry name in the extra field are intentionally different.
  3. Load the ZIP file edited in step 2 with WinRAR and check that the name in the extra field is displayed. Also test the ZIP file and confirm that it is reported as valid.
  4. Load the ZIP file edited in step 2 with an application using the Xceed ZIP library. As a result, the name specified by 0x7075 is read. This is true even if I do not add 0x7075 to the list of default extra fields in the Xceed ZIP library.

WinRAR doesn't seem to have the ability to set comments for entries in the first place. Therefore, I could not confirm how ZIP files with the extra field 0x6375 (Info-ZIP Unicode Comment) are handled by the Xceed ZIP library.
But I'm guessing it will probably be treated similarly to 0x7075.
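
For context, the 0x7075 body as described in PKWARE's APPNOTE is a 1-byte version, a 4-byte CRC-32 of the name stored in the normal header, and then the UTF-8 name. The Python sketch below follows that description; the function name `decode_unicode_path` is made up for this example, and it says nothing about how WinRAR or the Xceed library actually apply the check.

```python
import struct
import zlib


def decode_unicode_path(body: bytes, header_name: bytes):
    """Return the UTF-8 name from a 0x7075 body, or None if it should be ignored.

    header_name is the raw name from the normal header field. The stored
    CRC-32 is compared against it so that a stale extra field is skipped.
    """
    if len(body) < 5:
        return None
    version, name_crc = struct.unpack_from("<BI", body, 0)
    if version != 1:
        return None  # only version 1 is handled in this sketch
    if name_crc != zlib.crc32(header_name):
        return None  # header name no longer matches the extra field
    return body[5:].decode("utf-8")
```

A strict reader built this way would fall back to the header name whenever the header name was edited without also updating the CRC, so the outcome of an experiment like the one above can depend on which of the two names was changed.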

Please note that I have not purchased the Xceed ZIP library yet. It is currently still in the trial period. After some time I should be unable to use it.

By the way, something has been bothering me ever since I started using the trial version of the .NET version of the Xceed ZIP library: the library I received targets .NET Framework 4.
The official Xceed website states that it is compatible with .NET 7, so the trial version may be an old version.

pmqs commented

Hmmm, that's strange. The two Info-ZIP fields 0x6375 and 0x7075 aren't used that much anymore, but even so, I'm not sure why Xceed doesn't allow them.

The standard way to flag that the filename/comment are encoded in UTF8 these days is to just set the Language encoding flag (EFS) -- bit 11 of the General Purpose Flag -- and store the UTF8 encoded filename/comment in the standard fields in the local & central headers.
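
A minimal Python sketch of that rule, for illustration only: the function name `decode_header_text` is made up, and the fallback to code page 437 when the bit is clear is the APPNOTE default rather than anything specific to Xceed (the test archive above used a Shift_JIS override, for example).

```python
EFS_BIT = 1 << 11  # Language encoding flag in the General Purpose Flag


def decode_header_text(raw: bytes, general_purpose_flag: int) -> str:
    """Decode a filename or comment taken from the standard header fields."""
    # Bit 11 set means UTF-8; otherwise assume the historical cp437 default.
    encoding = "utf-8" if general_purpose_flag & EFS_BIT else "cp437"
    return raw.decode(encoding)
```

With the flag value 0x0802 seen in the dump above, bit 11 is set, so the names and comments in the standard fields would be read as UTF-8.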

Might give the Xceed trial library a go. Problem is I know nothing about C# and .NET -- Windows coding isn't my thing :-)

rougemeilland commented

In conclusion, by using Xceed's product, I was able to create a ZIP file containing a mixture of the following three extra fields.

  • 0x554e (Xceed unicode extra field)
  • 0x6375 (Info-ZIP Unicode Comment)
  • 0x7075 (Info-ZIP Unicode Path)

The ZIP files I created are attached to this comment; see the table below.

Here is how I got to that point.

I read the following specifications that you mentioned in the comment the other day.
https://doc.xceed.com/xceed-zip-for-activex/webframe.html#ExtraHeaders_property.html

I found it strange that the content was so far removed from the usage of the .NET version of the Xceed ZIP library that I had been using.
So, I looked into the Xceed product lineup again and found that in addition to the .NET version, there is an ActiveX version of the Xceed ZIP library.
There appears to be little specification compatibility between these products.

I decided to get a trial version of the ActiveX version of the Xceed ZIP library and check out the behavior of extra fields.
The results are as follows.

Extra fields I specified in the source code  Extra fields actually added  ZIP file used
-------------------------------------------  ---------------------------  -------------------------------
0x554e (Xceed unicode extra field)           0x554e, 0x6375, 0x7075       sample_all.zip
0x6375 (Info-ZIP Unicode Comment)
0x7075 (Info-ZIP Unicode Path)

0x6375 (Info-ZIP Unicode Comment)            0x6375, 0x7075               sample_except_0x554e.zip
0x7075 (Info-ZIP Unicode Path)

0x554e (Xceed unicode extra field)           0x554e, 0x6375, 0x7075       sample_except_0x6375_0x7075.zip

0x000a (NTFS (Win9x/WinNT FileTimes))        0x000a, 0x6375, 0x7075       sample_only_0x000a.zip

As you can see in the results above, the 0x6375 and 0x7075 extra fields always seem to be appended, regardless of which extra fields I specify to append in the source code.
The other extra fields below seem to be added only if I specify that they be added.

  • 0x000a (NTFS (Win9x/WinNT FileTimes))
  • 0x4453 (Windows NT security descriptor (binary ACL))
  • 0x5455 (extended timestamp)
  • 0x554e (Xceed unicode extra field)

In the ActiveX version, it is possible to ignore specific extra fields by specifying a property called IgnoredExtraHeaders. However, even if I specify 0x6375 and 0x7075 for IgnoredExtraHeaders, they are still added.

Additionally, I was able to set the general purpose flag bit 11 by changing a property called TextEncoding.
However, even in this case, 0x6375 and 0x7075 were added automatically, even though they were not specified in the source code.
The resulting ZIP file is here.
With bit 11 set, the names and comments in the standard header fields are already UTF-8, so the 0x6375 and 0x7075 extra fields serve no purpose. I find this very strange.

By the way, the contents of .NET version executable files can be analyzed relatively easily, but it is not practical for anyone other than experts to analyze the contents of ActiveX version executable files (that is, general Windows DLLs).
Therefore, I cannot investigate further as to why it is implemented as described above.
I'm sorry.

pmqs commented

Thank you for the very comprehensive write up. Never mind about not getting to the bottom of why the library does what it does. There are plenty of zip libraries out there that do stranger things.

If you ever need an example zip file with specific extra fields populated, this distribution has a reasonable collection in the test harness -- look in the t/file directory. I also have a larger collection of real zip files that I have collected over the years; I use them as a torture test for this program. Please shout if you ever need something & I'll see if I have an example available.