tar.Extract is very slow
Closed this issue · 4 comments
I'm trying to use node-tar to extract a tar file. The code is at:
https://github.com/raymondfeng/node-tar-perf/blob/master/untar.js
The performance is really bad compared to the tar command. For a 152MB tar with some big files, it took more than 1 minute.
After debugging, I found out the tar entries are written out in chunks of 512 bytes. That is probably due to the tar format.
I did an experiment to add a buffered stream before sending to fs. The new code is:
https://github.com/raymondfeng/node-tar-perf/blob/master/untar-with-buffer.js
Now it only took around 1 second to extract the tar.
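The buffering idea can be sketched roughly as follows. This is a minimal illustration, not the code from the linked untar-with-buffer.js: `CoalesceStream` is a hypothetical name (not part of node-tar or fstream), the 64 KB default is an arbitrary choice, and the class syntax assumes a modern Node release rather than 0.10. It collects the small chunks and only passes data downstream once a larger block has accumulated.

```javascript
'use strict';
const { Transform } = require('stream');

// Hypothetical helper (not part of node-tar or fstream): gathers small
// incoming chunks and emits them downstream in blocks of at least
// `size` bytes, so the file stream sees fewer, larger writes.
class CoalesceStream extends Transform {
  constructor(size = 64 * 1024) {
    super();
    this.size = size;       // minimum block size before emitting
    this.pending = [];      // chunks collected so far
    this.pendingBytes = 0;  // total bytes held in `pending`
  }

  _transform(chunk, encoding, callback) {
    this.pending.push(chunk);
    this.pendingBytes += chunk.length;
    if (this.pendingBytes >= this.size) this._emitPending();
    callback();
  }

  // Called when the input ends: emit whatever is left over.
  _flush(callback) {
    if (this.pendingBytes > 0) this._emitPending();
    callback();
  }

  _emitPending() {
    this.push(Buffer.concat(this.pending, this.pendingBytes));
    this.pending = [];
    this.pendingBytes = 0;
  }
}
```

It would sit between a tar entry stream and the file write stream, for example `entry.pipe(new CoalesceStream()).pipe(fs.createWriteStream(dest))`, where `entry` and `dest` are placeholders.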
I initially tried to add the buffered stream inside the node-tar code, but I had trouble getting it to work. Maybe fstream is a special case.
I also wonder if Node stream/fs APIs should have options to support buffering.
> I also wonder if Node stream/fs APIs should have options to support buffering.
As of 0.10, they do.
fstream and node-tar both need to be rewritten to use the streams2 classes (i.e., stream.Readable and friends).
@isaacs is this still something that needs to be done?
@terinjokes Yeah, still an issue.
# Get an example file
$ wget https://s3.amazonaws.com/node-webkit/v0.9.2/node-webkit-v0.9.2-linux-x64.tar.gz -O - | gunzip > node-webkit-v0.9.2-linux-x64.tar
# Linux tar utility
$ time tar xf node-webkit-v0.9.2-linux-x64.tar
real 0m0.119s
user 0m0.000s
sys 0m0.119s
# node-tar extracting the same file
$ time node examples/extracter.js
done
real 2m52.815s
user 2m43.038s
sys 0m13.689s
Native tar is ~1450 times faster.