Packaging GCCBootstrap too slow using 7zip Gzip compression
I've been experimenting with building the GCCBootstrap@11-IainS shard for aarch64-apple-darwin20 (Apple Silicon) and have found that most of the time is spent compressing the artifact. In particular, when running on AWS using a c5ad.8xlarge instance, I could build the shard in 23 minutes, but compressing GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz took around 1 hour and 10 minutes (plus another round of compression for squashfs).
When BinaryBuilder shows the output:
[ Info: Tree hash of contents of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz: 120bcbd3d356f2fa86d3919c58ee1dd67ab757f5
You can see the process 7z a -si -tgzip -mx9 running in the background on a single thread.
We could benefit greatly from an alternative compression tool, specifically one that can use multiple threads.
I've looked into other compression tools (pigz and zstd) and formats to try to speed up the process. As input I used the uncompressed contents of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz, which total 5316188080 bytes (5.1GB) across 5629 files. I performed these tests on an AWS c5ad.8xlarge instance using NVMe storage.
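For anyone wanting to repeat this kind of comparison, a wrapper along the following lines could collect wall-clock time and output size for any compressor that reads stdin and writes stdout (pigz, zstd, xz). This is only a sketch: the `bench` function and its variable names are hypothetical, it reports whole seconds rather than the finer-grained `time` output, and it does not cover 7za, which takes the archive path as an argument instead of writing to stdout.

```shell
# bench LABEL OUTFILE COMMAND [ARGS...]
# Hypothetical helper: tar the current directory, pipe the stream through
# COMMAND, then print the elapsed wall-clock seconds and the output size.
# OUTFILE should live outside the current directory so tar does not pick it up.
bench() {
    bench_label=$1
    bench_out=$2
    shift 2
    bench_start=$(date +%s)
    tar cf - . | "$@" > "$bench_out"
    bench_end=$(date +%s)
    bench_size=$(wc -c < "$bench_out" | tr -d ' ')
    echo "$bench_label: $((bench_end - bench_start))s, $bench_size bytes"
}

# Example invocations (commented out; they need the tools installed):
#   bench pigz9  ../pigz9-test.tar.gz   pigz -9
#   bench zstd3T ../zstd3T-test.tar.zstd zstd -3 -T0
```

The function only measures elapsed time; CPU time across threads (the `user` figure) still needs `time` or a similar tool.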
$ tar cvf - . | 7za a -si -tgzip -mx9 ../7z9-test.tar.gz # No threading, using gzip at maximum compression
real 69m48.743s
user 69m44.884s
sys 0m6.908s
size 1558218029 bytes (1.5GB)
$ tar cvf - . | 7za a -si -tgzip -mx9 -mmt32 ../7z9T-test.tar.gz # Threading enabled, but deflate doesn't support it, using gzip at maximum compression
real 71m11.411s
user 71m5.877s
sys 0m6.959s
size 1558218029 bytes (1.5GB)
$ tar cvf - . | 7za a -si -t7z -mx9 -mmt32 ../7z9T-test.tar.7z # Threading enabled, using 7z at maximum compression
real 2m48.593s
user 53m14.795s
sys 0m34.856s
size 867651155 bytes (828MB)
$ tar cvf - . | pigz -9 > ../pigz9-test.tar.gz # Threaded gzip
real 0m32.633s
user 14m54.496s
sys 0m10.052s
size 1628989378 bytes (1.6GB)
$ tar cvf - . | pigz -11 > ../pigz11-test.tar.gz # Threaded gzip, uses `zopfli` and noted to be much slower
real 35m4.057s
user 1027m27.692s
sys 0m7.446s
size 1552064303 bytes (1.5GB)
$ tar cvf - . | zstd -3 > ../zstd3-test.tar.zstd # No threading, using zstd format at default compression level
real 0m28.612s
user 0m29.962s
sys 0m3.960s
size 1478900541 bytes (1.4GB)
$ tar cvf - . | zstd -3 -T0 > ../zstd3T-test.tar.zstd # Threading, using zstd format at default compression level
real 0m4.129s
user 0m31.852s
sys 0m3.883s
size 1478900541 bytes (1.4GB)
$ tar cvf - . | zstd -19 -T0 > ../zstd19T-test.tar.zstd # Threading, using zstd format at maximum compression
real 2m45.845s
user 38m59.038s
sys 0m4.949s
size 1094192504 bytes (1.1GB)
$ tar cvf - . | zstd -3 -T0 --format=gzip > ../zstd3T-test.tar.gz # Threading, using gzip at default compression
real 1m50.766s
user 1m47.594s
sys 0m5.689s
size 1743704182 bytes (1.7GB)
$ tar cvf - . | zstd -19 -T0 --format=gzip > ../zstd19T-test.tar.gz # Threading, using gzip at maximum compression
real 13m22.035s
user 13m18.241s
sys 0m6.396s
size 1629599641 bytes (1.6GB)
$ tar cvf - . | xz -9 -T 0 > ../xz9T-test.tar.xz # Threading, using xz at maximum compression
real 2m34.177s
user 52m23.554s
sys 0m48.543s
size 876676908 bytes (837MB)
$ tar cvf - . | 7za a -si -txz -mx9 -mmt32 ../7z9T-test.tar.xz # Threading, using xz at maximum compression
real 2m47.397s
user 53m14.161s
sys 0m34.540s
size 867651116 bytes (828MB)
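To make the sizes above easier to compare, the following sketch expresses a compressed size as a percentage of the 5316188080-byte input. The `ratio` helper is hypothetical and simply wraps an awk division; the byte counts are the ones reported in the runs above.

```shell
# Uncompressed input size in bytes, as reported for the test data.
ORIG=5316188080

# ratio BYTES: print BYTES as a percentage of ORIG, to one decimal place.
ratio() {
    awk -v size="$1" -v orig="$ORIG" 'BEGIN { printf "%.1f\n", 100 * size / orig }'
}

ratio 1558218029   # 7za gzip -mx9
ratio 1628989378   # pigz -9
ratio 1478900541   # zstd -3
ratio 1094192504   # zstd -19
ratio 867651155    # 7za 7z -mx9
```

By this measure the gzip-format results land at roughly 29 to 31 percent of the input, while the 7z output is roughly 16 percent.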
I'll also note that I prototyped using pigz for creating the BB artifacts, and this was the result:
...
┌ Warning: Broken symlink: aarch64-apple-darwin20/sys-root/usr/local
└ @ BinaryBuilder.Auditor ~/.julia/packages/BinaryBuilder/vDNXJ/src/auditor/symlink_translator.jl:42
[ Info: Compressing files in /eph/Yggdrasil/0_RootFS/GCCBootstrap@11-IainS/build/aarch64-apple-darwin20/en4OWz8J/destdir/logs
[ Info: /eph/Yggdrasil/0_RootFS/GCCBootstrap@11-IainS/products/GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz already exists, force-overwriting...
[ Info: Tree hash of contents of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz: 0c4949b62a64f1b400951152b765acef0f8d4cc0
[ Info: SHA256 of GCCBootstrap.v11.0.0-iains.aarch64-apple-darwin20.tar.gz: d79414ba41343366c4fbc1a83ff0c89a1beb554e3e7bcdc06e6fb4e26ab97b09
[ Info: GCCBootstrap-aarch64-apple-darwin20.v11.0.0-iains.x86_64-linux-musl.squashfs hash: e27b06892d9dff177d06fb704467e8a614cf82ab
real 26m47.585s
user 480m10.742s
sys 16m33.042s
My interpretation of the benchmarks is that we should switch to 7z using maximum compression and threading. The archive size reduction (828MB versus 1.5GB for gzip) seems worth the runtime of under 3 minutes. Unfortunately, as Pkg doesn't know about the 7z format, that should be a longer-term goal.
In the interim, we can use pigz to keep the Gzip format while benefiting from a significantly shorter runtime than 7zip's non-threaded Gzip support.
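As a rough illustration of the interim approach, the sketch below prefers pigz when it is installed and falls back to plain gzip otherwise, so the output stays a standard .tar.gz either way. The function names `pick_gzip` and `make_tarball` are hypothetical, and this is not BinaryBuilder's actual packaging code, only a shell sketch of the idea.

```shell
# pick_gzip: print the gzip-compatible compressor to use.
# Prefers pigz (multithreaded) when it is on PATH, otherwise plain gzip.
pick_gzip() {
    if command -v pigz >/dev/null 2>&1; then
        echo "pigz"
    else
        echo "gzip"
    fi
}

# make_tarball SRCDIR OUTFILE: pack the contents of SRCDIR into OUTFILE
# as a gzip-format tarball, using whichever compressor pick_gzip selects.
make_tarball() {
    tar -C "$1" -cf - . | "$(pick_gzip)" -9 > "$2"
}

# Example (commented out):
#   make_tarball ./destdir ./products/example.tar.gz
```

Both tools emit the same container format, so consumers that already unpack .tar.gz files need no changes; only the compressed bytes, and therefore the SHA256 of the tarball, would be expected to differ.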