JuliaPackaging/BinaryBuilderBase.jl

Packaging GCCBootstrap too slow using 7zip Gzip compression

Closed this issue · 3 comments

omus commented

I've been experimenting with building the GCCBootstrap@11-IainS shard for aarch64-apple-darwin20 (Apple Silicon) and have found that most of the time is taken up compressing the artifact. In particular, when running on AWS using a c5ad.8xlarge instance, I found that I could build the shard in 23 minutes, but compressing GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz was taking around 1 hour and 10 minutes (plus another round of compression for squashfs).

When BinaryBuilder shows the output:

[ Info: Tree hash of contents of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz: 120bcbd3d356f2fa86d3919c58ee1dd67ab757f5

You can see in the background the process 7z a -si -tgzip -mx9 running on a single thread.

We could potentially greatly benefit from using an alternate tool for performing compression, specifically one that can use multiple threads.

omus commented

I've looked into other compression tools (pigz and zstd) and formats to try to speed up the process. As input I used the uncompressed contents of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz, which total 5316188080 bytes (5.1GB) across 5629 files. I performed these tests on an AWS c5ad.8xlarge instance using NVMe storage.
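For anyone wanting to repeat this kind of comparison, here is a minimal sketch of a helper that pipes a tar stream through one compressor and reports elapsed time and output size. The function name `bench_compressor` and its argument order are hypothetical, not part of BinaryBuilder, and it only measures whole seconds of wall-clock time, so the shell's `time` gives finer-grained numbers.

```shell
#!/bin/sh
# Hypothetical helper: run one compressor over a tar stream and report
# wall-clock seconds and the resulting archive size.
# Usage: bench_compressor SRC_DIR OUT_FILE COMMAND [ARGS...]
# COMMAND must read the tar stream on stdin and write compressed data to stdout.
bench_compressor() {
    src_dir=$1
    out_file=$2
    shift 2
    start=$(date +%s)
    tar -C "$src_dir" -cf - . | "$@" > "$out_file"
    end=$(date +%s)
    # wc -c pads its output on some platforms, so strip the spaces.
    size=$(wc -c < "$out_file" | tr -d ' ')
    echo "elapsed=$((end - start))s size=${size} bytes cmd=$*"
}
```

For example, `bench_compressor ./shard /tmp/test.tar.gz gzip -9` would time plain gzip; any other stdin-to-stdout compressor can be substituted for `gzip -9`.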

$ tar cvf - . | 7za a -si -tgzip -mx9 ../7z9-test.tar.gz  #  No threading, using gzip at maximum compression
real  69m48.743s
user  69m44.884s
sys   0m6.908s
size  1558218029 bytes (1.5GB)

$ tar cvf - . | 7za a -si -tgzip -mx9 -mmt32 ../7z9T-test.tar.gz  #  Threading enabled, but deflate doesn't support it, using gzip at maximum compression
real  71m11.411s
user  71m5.877s
sys   0m6.959s
size  1558218029 bytes (1.5GB)

$ tar cvf - . | 7za a -si -t7z -mx9 -mmt32 ../7z9T-test.tar.7z  # Threading enabled, using 7z at maximum compression
real  2m48.593s
user  53m14.795s
sys   0m34.856s
size  867651155 bytes (828MB)

$ tar cvf - . | pigz -9 > ../pigz9-test.tar.gz  # Threaded gzip
real  0m32.633s
user  14m54.496s
sys   0m10.052s
size  1628989378 bytes (1.6GB)

$ tar cvf - . | pigz -11 > ../pigz11-test.tar.gz  # Threaded gzip, uses `zopfli` and noted to be much slower 
real  35m4.057s
user  1027m27.692s
sys   0m7.446s
size  1552064303 bytes (1.5GB)

$ tar cvf - . | zstd -3 > ../zstd3-test.tar.zstd  # No threading, using zstd format at default compression level
real  0m28.612s
user  0m29.962s
sys   0m3.960s
size  1478900541 bytes (1.4GB)

$ tar cvf - . | zstd -3 -T0 > ../zstd3T-test.tar.zstd  # Threading, using zstd format at default compression level
real  0m4.129s
user  0m31.852s
sys   0m3.883s
size  1478900541 bytes (1.4GB)

$ tar cvf - . | zstd -19 -T0 > ../zstd19T-test.tar.zstd  # Threading, using zstd format at maximum compression
real  2m45.845s
user  38m59.038s
sys   0m4.949s
size  1094192504 bytes (1.1GB)

$ tar cvf - . | zstd -3 -T0 --format=gzip > ../zstd3T-test.tar.gz  # Threading, using gzip at default compression
real  1m50.766s
user  1m47.594s
sys   0m5.689s
size  1743704182 bytes (1.7GB)

$ tar cvf - . | zstd -19 -T0 --format=gzip > ../zstd19T-test.tar.gz  # Threading, using gzip at maximum compression
real  13m22.035s
user  13m18.241s
sys   0m6.396s
size  1629599641 bytes (1.6GB)

$ tar cvf - . | xz -9 -T 0 > ../xz9T-test.tar.xz  # Threading, using xz at maximum compression
real  2m34.177s
user  52m23.554s
sys   0m48.543s
size  876676908 bytes (837MB)

$ tar cvf - . | 7za a -si -txz -mx9 -mmt32 ../7z9T-test.tar.xz  # Threading, using xz at maximum compression
real  2m47.397s
user  53m14.161s
sys   0m34.540s
size  867651116 bytes (828MB)

omus commented

I'll also note I prototyped using pigz for creating the BB artifacts and this was the result:

...
┌ Warning: Broken symlink: aarch64-apple-darwin20/sys-root/usr/local
└ @ BinaryBuilder.Auditor ~/.julia/packages/BinaryBuilder/vDNXJ/src/auditor/symlink_translator.jl:42
[ Info: Compressing files in /eph/Yggdrasil/0_RootFS/GCCBootstrap@11-IainS/build/aarch64-apple-darwin20/en4OWz8J/destdir/logs
[ Info: /eph/Yggdrasil/0_RootFS/GCCBootstrap@11-IainS/products/GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz already exists, force-overwriting...
[ Info: Tree hash of contents of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz: 0c4949b62a64f1b400951152b765acef0f8d4cc0
[ Info: SHA256 of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz: d79414ba41343366c4fbc1a83ff0c89a1beb554e3e7bcdc06e6fb4e26ab97b09
[ Info: GCCBootstrap-aarch64-apple-darwin20.v11.0.0-iains.x86_64-linux-musl.squashfs hash: e27b06892d9dff177d06fb704467e8a614cf82ab

real	26m47.585s
user	480m10.742s
sys	16m33.042s

omus commented

My interpretation of the benchmarks is that we should switch to 7z using maximum compression and threading. The archive size reduction seems worth the relatively short runtime. Unfortunately, since Pkg doesn't know about the 7z format, that should be a longer-term goal.
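To make the size trade-off concrete, the reported archive sizes can be expressed as a percentage of the 5316188080-byte uncompressed input. This is a small sketch using only numbers copied from the benchmark results earlier in the thread; the helper name `size_ratio` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a compressed size as a percentage of the original.
# Usage: size_ratio COMPRESSED_BYTES ORIGINAL_BYTES
size_ratio() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

original=5316188080   # uncompressed input size reported in the thread

# Archive sizes below are copied from the benchmark results in the thread.
echo "7za gzip -mx9: $(size_ratio 1558218029 "$original")%"
echo "pigz -9:       $(size_ratio 1628989378 "$original")%"
echo "zstd -19 -T0:  $(size_ratio 1094192504 "$original")%"
echo "xz -9 -T 0:    $(size_ratio 876676908 "$original")%"
echo "7za 7z -mx9:   $(size_ratio 867651155 "$original")%"
```

By this measure the gzip variants land at roughly 29 to 31 percent of the original size, zstd -19 at about 21 percent, and xz and 7z at about 16 to 17 percent.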

In the interim we can use pigz to continue to use the Gzip format and benefit from the significantly reduced runtime over 7zip's non-threaded Gzip support.
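As a rough illustration of the interim approach, the sketch below produces a gzip-format tarball with pigz when it is installed and falls back to single-threaded gzip otherwise. This is not how BinaryBuilder packages artifacts today; `compress_tree` is a hypothetical name, and the `-9` level simply mirrors the benchmark above.

```shell
#!/bin/sh
# Hypothetical helper: pack a directory into a gzip-format tarball,
# preferring multi-threaded pigz and falling back to plain gzip.
# Usage: compress_tree SRC_DIR OUT_FILE
compress_tree() {
    src_dir=$1
    out_file=$2
    if command -v pigz >/dev/null 2>&1; then
        compressor="pigz -9"   # uses all available cores by default
    else
        compressor="gzip -9"   # single-threaded, but produces the same format
    fi
    # $compressor is intentionally unquoted so that "pigz -9" splits into
    # a command and its argument.
    tar -C "$src_dir" -cf - . | $compressor > "$out_file"
}
```

Because both tools emit standard gzip streams, anything that already reads `.tar.gz` archives should be able to consume the result without changes.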