getporter/porter

Optional gzipping when using porter archive

Closed this issue · 1 comment

Is your feature request related to a problem? Please describe.
When using porter archive to save an image to a file for distribution, Porter always runs gzip on the archive to produce a tar.gz. Gzipping is fairly CPU-intensive, and in our experience the archives end up only marginally smaller: for one bundle we saw a 2.30GB tgz extract to a 2.32GB tar. While saving that space can matter, it may be preferable to make gzipping optional, to speed up the porter archive and porter publish commands themselves.

Describe the solution you'd like
Depending on preference, a flag enabling or disabling gzipping:

--gzip <- gzip off by default; the flag enables it
--no-gzip <- gzip on by default; the flag disables it

And then handle .tgz as well as .tar in porter publish --archive

Describe alternatives you've considered
We haven't come up with any alternatives; currently we just accept the extra time it takes.

Additional context
Let me know if you need more information; I couldn't come up with anything else relevant.

From the CNAB spec:

A thick bundle SHOULD be encoded as a gzipped TAR. This specification is neutral as to what compression ratio is used.

Perhaps a CLI flag allowing configuration of the compression level would be better?

  • it would allow the user to select NoCompression
  • it avoids issues when publishing, since the archive file is handled (decompressed/unpacked/etc.) by the cnabio/cnab-go library, which has some tgz assumptions
  • it keeps the code in archive.go cleaner, as the gzipWriter would not need to be handled conditionally

That being said, skipping compression does speed up the archive process, but the actual data transfer appears to be the main limiting factor when archiving a bundle. Below are a few examples of archiving a ~2.3GiB bundle with and without compression:

# gzipped tar with DefaultCompression (default Porter behavior)
$ time ./bin/porter archive huge-defaultcomp.tgz --reference <huge bundle ref> --force
real    2m36.773s
user    1m26.772s
sys     0m15.242s

# gzipped tar with NoCompression
$ time ./bin/porter-no-comp archive huge-nocomp.tgz --reference <huge bundle ref> --force
real    1m59.890s
user    0m13.060s
sys     0m8.260s

# just tar
$ time ./bin/porter-no-gzip archive huge.tar --reference <huge bundle ref> --force
real    2m0.262s
user    0m11.853s
sys     0m8.895s

# the resulting file sizes
$ du -m huge*
2376    huge-defaultcomp.tgz
2395    huge-nocomp.tgz
2395    huge.tar

In a quick test on a bandwidth-constrained network, the archive time of the same huge bundle improves from 16m56s to 15m1s 🙀

A similar improvement can be observed when archiving the whalegap bundle:

# gzipped tar with DefaultCompression (default Porter behavior)
$ time ./bin/porter archive whalegap-defaultcomp.tgz --reference ghcr.io/getporter/examples/whalegap:v0.2.0 --force
real    0m20.463s
user    0m12.249s
sys     0m2.100s

# gzipped tar with NoCompression
$ time ./bin/porter-no-comp archive whalegap-nocomp.tgz --reference ghcr.io/getporter/examples/whalegap:v0.2.0 --force
real    0m14.106s
user    0m1.923s
sys     0m0.906s