archiverjs/node-archiver

Slower ZIP creation after upgrade to 0.11

Closed this issue · 39 comments

I am on a Windows machine running node 0.10.28. I am using this module via grunt-contrib-compress.

To compress this folder structure:

[screenshot: folder structure]

These are the results with different grunt-contrib-compress versions:
v0.12.0 took over 4 minutes and produced an 18913229 byte zip file.
v0.11.0 took over 4 minutes and produced a zip file.
v0.10.0 took 45 seconds and produced an 18913229 byte zip file.
v0.9.0 took 45 seconds and produced an 18809351 byte zip file.
v0.8.0 took 46 seconds and produced an 18809351 byte zip file.

grunt-contrib-compress 0.11 is where they updated to archiver 0.11. Versions 0.10 and below use archiver 0.9 and are ~4 times faster on my Windows machine, so I would assume the regression is in there somewhere.

Interesting stats. Does setting the highWaterMark: 1024 * 1024 * 16 option help?

In regards to archiver 0.9 vs 0.11: there has been a lot of movement, with the decoupling of the zip output stream and more usage of file stat to support things like mode. There would essentially be 42k stat calls (x2, since both compress and archiver need to run them).

In regards to archive size, there have been a few file header fixes that add slightly more size per file.

I'm not too worried about file size, as the archives seem to be correct. I'm mainly interested in saving 3 minutes of my life :) I'll modify the highWaterMark value if you are curious about the results. Which file do I set this value in? As far as the stat calls go, I think that's all over my head.

It goes in your Gruntfile, in the options for the task. Stat calls are just one possibility for what is taking longer.

The highWaterMark should allow more memory to be used to buffer between the streams, and thus less downtime between files, since zip is a serial process.

So something like:

deployFiles = [
    '**',
    '!build-report.txt',
    '!util/**',
    '!jasmine-favicon-reporter/**',
    '!**/*.uncompressed.js',
    '!**/*consoleStripped.js',
    '!**/*.min.*',
    '!**/tests/**',
    '!**/bootstrap/test-infra/**',
    '!**/bootstrap/less/**'
],
...
compress: {
    main: {
        options: {
            archive: 'deploy/deploy.zip',
            highWaterMark: 1024 * 1024 * 16
        },
        files: [{
            src: deployFiles,
            dest: './',
            cwd: 'dist/',
            expand: true
        }]
    }
}

Correct.

Going on 6 minutes with the highWaterMark... looks like it made it worse.

7 minutes, and a file size of 17183089 bytes. Is the compression level affecting speed?

Hmm, the default is to use the default compression level for the OS. How long does it take if you pass store: true just under expand: true? Remove the highWaterMark also.
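
That is, roughly this change to the files entry from the config above, with highWaterMark removed from options (everything else stays the same):

files: [{
    src: deployFiles,
    dest: './',
    cwd: 'dist/',
    expand: true,
    store: true
}]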

What kind of specs do you have? It could be a combination of things, including CPU speed, memory, and hard drive speed. Either way, that's a lot of files to process, which I believe is what you're seeing, as older versions of compress used an archiver that assumed more rather than verifying with FS lookups.

I'm on a MacBook Pro running 64-bit Windows Server in Fusion. I think the VM has 7 GB of RAM and 2 processors, with 70 GB assigned to it from the MBP's SSD.

http://i.imgur.com/tUZONW0.png

With store: true it took about 6:30.

Yeah, I'm thinking part of it has to do with stat. I made some tweaks to the archiver core and saw about a 10ms drop by making the stat calls run in a queue-like system rather than just before appending a file. I'll have you try the newer version once it's available; by those measures it should be about 2 minutes faster, if not more, since I'm also going to allow reuse of the stat data that compress gets.

hmm

sounds great?!

I wonder how archiver's performance would compare to a native tool like zip or tar; I'll do some benchmarks.

@silverwind would love to see the results of this!

Also, you could try spawning a zip or tar command to compare against the Node implementations (:

Native tools will be faster due to being pure C. How much faster, I don't know, as Node's fs bindings are C also.
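
For reference, spawning and timing a native zip from Node could look roughly like this (a sketch; the archive name and source directory here are placeholders):

var spawn = require('child_process').spawn;

var start = Date.now();
// -r: recurse into the directory, -q: quiet output
var zip = spawn('zip', ['-rq', 'native.zip', 'dist/']);

zip.on('close', function (code) {
    console.log('native zip exited with ' + code + ' after ' + (Date.now() - start) + ' ms');
});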

@ctalkington did you cut that version you wanted me to try out?

Not yet; it will most likely be part of 0.12 due to some changes in the API and options. It will most likely come out in 2 weeks, depending on free time.

Please do something to optimize this. When zipping a big git repository, it is very slow and consumes a lot of resources:

[screenshot: resource usage]

And I'm on an SSD... 😞

@IonicaBizau how many files did that entail? What was the resulting zip size? And how long did it take?

The directory being zipped is 70 MB, and using zip -r foo.zip mydir, mydir was archived quickly:

real    0m10.109s
user    0m6.050s
sys 0m3.064s

For the same directory, the archiving process with this module started on Fri Jan 16 2015 16:04:15 GMT+0200 (EET) and hasn't finished yet (after 50 minutes).

Also, the processor jumps to 100%.

[screenshot from 2015-01-16 16:05:03]

The node process was consuming 1.4 GB, but now it is consuming 2 GB...

[screenshot from 2015-01-16 16:12:23]

So I finally got around to comparing archiver, yazl, and native zip. This is on OS X with io.js 1.0.2 (which seems around 10% faster than node 0.10).

Moderate tree, 4549 files:

zip: 6.0s
yazl: 7.9s
archiver: 10.1s

Linux tree, 48410 files:

zip: 64s
yazl: 115s
archiver: 45min and still going

I tried trees of various sizes, and I think after a certain number of files, archiver doesn't finish the job. Right now the last write to the archive happened 3 minutes ago, with the CPU sitting at 100%.

@ctalkington Test with the contents of https://github.com/torvalds/linux/archive/master.zip
@IonicaBizau Please do tell how many files you're zipping; the size seems pretty irrelevant to this issue.

@silverwind thanks, I'd also be curious to know whether it's better or worse when store: true is used (i.e. removing Node's zlib from the mix).

Also, was this using bulk? Does yazl stat files?

Also, if I had to guess, on the moderate tree the 3s difference comes from archiver automatically sorting out what the input is, plus the internal streaming.

The Linux tree, though, seems like it's stuck somewhere in the queue.

@silverwind do you have the script you used to compare, so that I can fiddle with it?

Here you go: https://gist.github.com/silverwind/ac8ec0c33753057cafe7

npm install yazl archiver graceful-fs
node bench.js [folder in same dir]

Had to use graceful-fs because of EMFILE errors, but it's an easy switch in the require() if you want to use vanilla fs. Also, I didn't compare the resulting zip contents, but yazl zips ended up a bit smaller every run.
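
For anyone who doesn't want to dig through the gist, the archiver side of the timing boils down to something like this (a rough sketch using the bulk mapping form; graceful-fs swapped in for fs as noted, and the output filename is arbitrary):

var fs = require('graceful-fs');
var archiver = require('archiver');

var dir = process.argv[2];
var output = fs.createWriteStream('bench-archiver.zip');
var archive = archiver('zip');
var start = Date.now();

// timing stops once the output file stream closes
output.on('close', function () {
    console.log('archiver took ' + (Date.now() - start) + ' ms');
});

archive.pipe(output);
archive.bulk([{ expand: true, cwd: dir, src: ['**'] }]);
archive.finalize();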

@silverwind "Please do tell how many files you're zipping" -- I forgot to include that; I know it makes a difference. The directory I am zipping is a git repository containing 219623 files.

$ find . -type f | wc -l
219623

That's way more files than I ever thought this library would be used for. I'm guessing the bottleneck is the queuing, but I'll have to dig into it to try and figure out what changes when you get into big volumes of files.

EDIT: it does seem archiver does the job but doesn't fully finalize/close, and ends up leaking memory or similar.

I've been testing some things; it would appear that finalize does get run, things just never close.

EDIT: this also appears to only affect zip; tar goes through fine. I'm wondering if it has to do with ZIP64 kicking in.

EDIT2: so the process gets through to the point of ending the zip-stream. It would seem like the stream is backing up.

@silverwind can you confirm whether running your tests back to back changes the results? I'm noticing on Windows that the drive buffer or something seems to be speeding things up; the ~5000-file test drops to 14s, but sometimes it jumps to a minute.

@ctalkington I get pretty consistent results on OS X:

yazl took 8516 ms
archiver took 10437 ms

yazl took 8530 ms
archiver took 10592 ms

yazl took 8717 ms
archiver took 10482 ms

yazl took 8819 ms
archiver took 10442 ms

yazl took 8549 ms
archiver took 10563 ms

Probably Windows prefetch or something.

I use this module in the github-contributions project. If you'd like to test, check out the 1.0.0 branch first.

@ctalkington Supposing I want to fix this issue, where should I start? Where do things get fishy?

@IonicaBizau I've been looking at it. It seems related to the streaming getting backed up. I have noticed the new directory function to be a bit more reliable.

Maybe you could test that with your samples.

Is the fix pushed to the repository? How can I test it?

0.14.0 has the new directory helper, albeit in its most basic form.

If you want to throw your massive payload at it, it'd be good to know whether it gets through or hangs like before.
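
Basic usage of the helper is along these lines (a minimal sketch; the output path and directory names are just examples):

var fs = require('fs');
var archiver = require('archiver');

var output = fs.createWriteStream('out.zip');
var archive = archiver('zip');

// fires once the zip has been fully written out
output.on('close', function () {
    console.log('done, ' + archive.pointer() + ' total bytes');
});

archive.pipe(output);
archive.directory('mydir/', 'mydir');
archive.finalize();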

Wow, testing it! Thanks!

Also, let's move this to #114 as it's a slightly different issue.

Closing this out, as it's been a mix of issues. Feel free to compare your results with the latest release and open a new issue if you still see such a slowdown.