DanielHindi/aws-s3-zipper

Zips in S3 bucket are corrupted

d3m3tr1s opened this issue · 10 comments

Hello Daniel, I'm using your tool to zip images stored in an S3 bucket, running your code as part of an AWS Lambda function, and I'm facing the problem that for larger image sets the resulting zip is corrupted.
I set the Lambda memory limit to the maximum allowed by Amazon, 1536 MB, and I still have this issue, even though the CloudWatch log shows that the Lambda call used only 300-500 MB and reports successful completion. The zip is created, but it's corrupted. When I repeat the same with no more than 4-5 images of 4-5 MB each, it creates a healthy zip.

Any suggestion is highly appreciated.

Thank you for your great tool!

So I use this module on a daily basis and zip up hundreds of megabytes of images, though in smaller zip increments.

I'm using npm/archiver to zip. It seems there are updates to archiver that may solve this; it needs to be tested.

That being said, do you think 100-200 images at 2-5 MB each would be a good test?

Thank you for the tip, I will try updating the archiver module.

Do you use your module wrapped in an AWS Lambda function?

As a side note, I had to change this line

https://github.com/DanielHindi/aws-s3-zipper/blob/master/index.js#L187

to this

var tempFile = '/tmp/__' + Date.now() + '.zip';

to make it work in AWS Lambda, as the /tmp directory is the only writable place for Lambdas.

No, I don't use it in Lambda... it's triggered in an API.

If you make the tempFile configurable and send it in a pull request, I'll accept it.
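
For reference, a minimal sketch of what a configurable temp path could look like (the tmpDir option and makeTempFile helper are hypothetical names, not the module's current API):

// Sketch: let callers override the temp directory instead of hardcoding it.
// Defaults to the current working directory to preserve existing behavior.
function makeTempFile(options) {
  var tmpDir = (options && options.tmpDir) || '.';
  return tmpDir + '/__' + Date.now() + '.zip'; // e.g. '/tmp/__1467072000000.zip' on Lambda
}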

Thank you for the tip, @n7best!

The callback should only happen when the file has been zipped and released. The problem is the module keeps a lock on the file for a few moments after the callback. That's why I put a breathing period before moving on:

https://github.com/DanielHindi/aws-s3-zipper/blob/master/index.js#L144
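
For context, the breathing period amounts to roughly this pattern (a sketch; the onZipDone and next names are illustrative, not the module's actual code):

// Sketch of the "breathe time": after the zip callback fires, wait
// briefly before touching the temp file, since the lock lingers a moment.
function onZipDone(tempFile, next) {
  setTimeout(function () {
    next(tempFile); // e.g. hand the zip off for the S3 upload
  }, 1000); // 1-second grace period
}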
Calling finalize or append from archiver does not guarantee the file is zipped.

The end, close or finish events on the destination stream may fire right after calling this method so you should set listeners beforehand to properly detect stream completion.

Which means there must be a different event or callback that gives you the true release of the file:

output.on('close', function() {
  console.log(archive.pointer() + ' total bytes');
  console.log('archiver has been finalized and the output file descriptor has closed.');
});

This might be it. Need to test.
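
For testing, here is a minimal sketch of wiring the listener before finalize, using archiver's documented API (the file paths are placeholders):

var fs = require('fs');
var archiver = require('archiver');

var output = fs.createWriteStream('/tmp/test.zip');
var archive = archiver('zip');

// Register the close listener BEFORE finalize so completion is never missed.
output.on('close', function () {
  console.log(archive.pointer() + ' total bytes written');
  // Only now is it safe to read or upload the zip file.
});

archive.pipe(output);
archive.append(fs.createReadStream('/tmp/a.jpg'), { name: 'a.jpg' });
archive.finalize();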

I tested all three; not sure why, but only finish works for me. My files are pretty large.

I believe 1.0.1 has the potential to fix your issue.