thejoshwolfe/yauzl

Can't unzip .zip files created by OS X 10.12's native zipper

coolaj86 opened this issue · 10 comments

Error

On Mac OS X 10.12.6 the native zip utility (right-click "compress folder") creates a zip file that gives this error:

extra field length exceeds extra field buffer size

Not that I don't believe it's possible for Apple to make a mistake, but my guess is that the underlying zip utility is probably the same Unix / Darwin zip that has been in use for a very long time and that it's statistically more likely to be a problem with this library.

For reference, here is the failing zip file (just a couple of test files for a website):

Example Zip File

I've attached fails-to-parse.zip for the test case.

The contents are as follows:

.
├── css
│   ├── custom.css
│   ├── plugins
│   │   └── bootstrap.css.min
│   └── styles.css
├── index-three.html
├── index-two.html
├── index.html
└── js
    ├── app.js
    ├── config.js
    └── custom.js

I have also zipped files that do unzip just fine. My guess is that there's some sort of off-by-one byte alignment issue.

Test Case

And a test case (based on the example):

mkdir -p /tmp/yauzl-test
pushd /tmp/yauzl-test
npm install yauzl@latest
touch test.js
wget -c https://github.com/thejoshwolfe/yauzl/files/1420073/fails-to-parse.zip
# => copy code below into test.js <= #
node test.js

test.js:

'use strict';

var yauzl = require('yauzl');

yauzl.open('./fails-to-parse.zip', function (err, zipfile) {
  if (err) throw err;

  zipfile.readEntry();
  zipfile.on('entry', function (entry) {
    if (/\/$/.test(entry.fileName)) {
      zipfile.readEntry();
      return;
    }

    zipfile.openReadStream(entry, function(err, readStream) {
      if (err) throw err;

      readStream.on("end", function() {
        zipfile.readEntry();
      });

      readStream.on("data", function (data) {
        console.log("data of length", data.length);
      });
    });
  });
  zipfile.on('error', function (err) {
    throw err;
  });
  zipfile.on('end', function () {
    console.log("all entries read and processed");
  });
});

Thanks for the report. i'll take a look and get back to you.

I'm getting this same error on a file created with command line zip on macOS.

Tried zipping using BetterZip. Different error:

Error: unexpected EOF

So nothing works for me yet. The weird thing is, this error is being thrown and it's not coming through the callback. AFAIK, zipfile.on('entry', ...) callback function only has entry as an argument, no error object.

Side note: is there no easy way to just unzip and write the files to a folder? Without having to mess with streams, readEntry, etc?

I tried with a few of the other node.js libs namely jszip, unzipper (and improved fork of unzip), and node-stream-zip and they work.

unzipper and node-stream-zip seem to be the best for me.

If you don't want streaming (i.e. an entire 1gb file would have to load into memory) then adm-zip is probably the most loved and respected. If you need browser support too then jszip may be the right fit.

I went with one called decompress I believe. Simple, works well. I was on a deadline, so couldn't wait to hear back on this issue.

I am unable to reproduce your issue. The zipfile you linked seems to parse just fine for me. I can explain what that particular error means, but it doesn't seem to be relevant to your zipfile.

Also, I believe decompress is implemented by using yauzl behind the scenes, so I'm not sure how switching to that library resolved this issue for you. Perhaps there are different versions of yauzl at play here.

I tried with a few of the other node.js libs ...

jszip - i didn't find much API documentation, but it looks like it operates on ram buffers, which means it's unsuitable for large zipfiles.

unzipper and node-stream-zip attempt to parse zipfiles from start to finish, which means they don't support some well-formed zipfiles, specifically those using general purpose bit 3. see No Streaming Unzip API in yauzl's readme.

Here's a fairly harmless looking way to create a zipfile using Info-ZIP (which is the zip command line program in most unix-like environments), which cannot be parsed with a start-to-finish streaming parser:

$ echo -ne '\x50\x4b\x07\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' > file.txt
$ zip -q0 - file.txt | cat > out.zip

My experimentation confirms my theory that unzipper cannot parse that zipfile.

Here's me showing it fail in about 72 seconds: https://youtu.be/0rKj9TZkPD4

Here are my versions:

yauzl-test@1.0.0 /private/tmp/yauzl-test
└─┬ yauzl@2.8.0
  ├── buffer-crc32@0.2.13
  └─┬ fd-slicer@1.0.1
    └── pend@1.2.0

node --version
7.8.0

npm --version
4.2.0

Just updated to latest node, npm, and retried after rm -rf node_modules.

aj@bowie /p/t/yauzl-test> npm install --save yauzl
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN yauzl-test@1.0.0 No description
npm WARN yauzl-test@1.0.0 No repository field.

+ yauzl@2.8.0
added 4 packages in 4.303s
aj@bowie /p/t/yauzl-test> node test.js
/private/tmp/yauzl-test/test.js:28
    throw err;
    ^

Error: extra field length exceeds extra field buffer size
    at /private/tmp/yauzl-test/node_modules/yauzl/index.js:305:83
    at /private/tmp/yauzl-test/node_modules/yauzl/index.js:618:5
    at /private/tmp/yauzl-test/node_modules/fd-slicer/index.js:32:7
    at FSReqWrap.wrapper [as oncomplete] (fs.js:660:17)
aj@bowie /p/t/yauzl-test> node --version
v8.8.1

aj@bowie /p/t/yauzl-test> npm --version
5.4.2

I also tried rm -rf node_modules package-lock.json; npm install --save yauzl to make sure I got the latest versions.

Ah. Sorry about that. I was so focused on the zip file that I completely neglected to look at your sample code. My bad.

I am able to reproduce the issue now. The video helped!

The problem is that you're using both lazyEntries:false and readEntry(). You probably forgot to put {lazyEntries:true} in your yauzl.open() call.

This is such a easy mistake to make that I'm adding a check for it to throw an exception rather than this "undefined behavior". This change should make this pitfall much more friendly to navigate. Here's an example of what you get with this change:

$ node asdf.js 
/home/josh/dev/yauzl/index.js:231
  if (!this.lazyEntries) throw new Error("readEntry() called without lazyEntries:true");
                         ^

Error: readEntry() called without lazyEntries:true
    at ZipFile.readEntry (/home/josh/dev/yauzl/index.js:231:32)
    at /home/josh/dev/yauzl/asdf.js:8:11
    at /home/josh/dev/yauzl/index.js:36:7
    at /home/josh/dev/yauzl/index.js:137:16
    at /home/josh/dev/yauzl/index.js:623:5
    at /home/josh/dev/yauzl/node_modules/fd-slicer/index.js:32:7
    at FSReqWrap.wrapper [as oncomplete] (fs.js:629:17)

And just for reference, the {lazyEntries:true} way is the recommended way, and the only reason it's not the default is to maintain compatibility with earlier versions of yauzl.

Thank you very much. I can confirm that several files that threw errors before now parse as expected...

I'm not sure how I didn't add the lazyEntries: true before, but I'm just grateful for your help.