This information is out of date. Please refer to the Learning Path "Learn how to build and use Cloudflare zlib on Arm servers".
This demo illustrates how to improve performance of libz for a Python application.
It has been prepared for an Arm Neoverse system running Ubuntu.
It can be run on any cloud provider offering Arm instances, including AWS, Oracle Cloud, Google Cloud, or Microsoft Azure.
For more information about Arm cloud instances, refer to Getting Started with Arm Cloud instances.
This example explains how to improve application performance using a custom version of zlib.
Set the default python to be python3
sudo apt update
sudo apt install build-essential -y
sudo apt install python-is-python3 -y
Clone the project
git clone https://github.com/jasonrandrews/zip.git
cd zip
Install perf
sudo apt install linux-tools-common linux-tools-generic linux-tools-`uname -r` -y
sudo sh -c "echo '1' > /proc/sys/kernel/perf_event_paranoid"
Create a 1 GB file of zeros to compress
dd if=/dev/zero of=largefile count=1M bs=1024
Run the python program to gzip the largefile
perf stat python ./zip.py
Note the seconds of elapsed time
It's also possible to just time the execution.
time python ./zip.py
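For orientation, here is a minimal sketch of what a script like zip.py might do. It is an assumption, not the code from the repository: the function name `gzip_file`, the output name `largefile.gz`, and the chunk size are all hypothetical. It streams the input through Python's standard gzip module, which relies on libz for both compression and the checksum.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Compress src_path into dst_path in gzip format, one chunk at a time."""
    # Hypothetical stand-in for zip.py; the real script may differ.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Calling `gzip_file("largefile", "largefile.gz")` would compress the file created with dd above.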
Record a profile and generate a flame graph
perf record -F 99 -g python ./zip.py
git clone https://github.com/brendangregg/FlameGraph
perf script | ./FlameGraph/stackcollapse-perf.pl > out.perf-folded && ./FlameGraph/flamegraph.pl out.perf-folded > flamegraph1.svg
Run perf report
perf report
Note that the crc32 function is taking significant time
69.92% 68.27% python libz.so.1.2.11 [.] crc32
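The gzip format stores a CRC-32 of the uncompressed data, so every input byte passes through crc32 as well as the compressor. A file of zeros compresses very cheaply, which may be why the checksum dominates this profile. The sketch below compares the two costs from Python using only the standard zlib module. The function name and the buffer size are illustrative assumptions, and the numbers will vary by system.

```python
import time
import zlib


def time_crc_and_deflate(data):
    """Return (crc_seconds, deflate_seconds) for one pass over data."""
    start = time.perf_counter()
    zlib.crc32(data)
    crc_seconds = time.perf_counter() - start

    start = time.perf_counter()
    zlib.compress(data)
    deflate_seconds = time.perf_counter() - start
    return crc_seconds, deflate_seconds


if __name__ == "__main__":
    buf = bytes(16 * 1024 * 1024)  # 16 MiB of zeros, similar in content to largefile
    crc_s, deflate_s = time_crc_and_deflate(buf)
    print(f"crc32: {crc_s:.3f}s  deflate: {deflate_s:.3f}s")
```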
Check whether the CPU supports crc32 instructions
lscpu | grep crc32
Count the crc32 instructions in the system libz
objdump -d /usr/lib/aarch64-linux-gnu/libz.so.1 | awk -F" " '{print $3}' | grep crc32 | wc -l
If the command prints 0, libz contains no crc32 instructions.
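The objdump pipeline can also be written as a small Python helper, which may be easier to reuse when comparing several libraries. This is a sketch under the same assumption as the awk command, namely that the instruction mnemonic is the third whitespace-separated field of each disassembly line. The function names are hypothetical.

```python
import subprocess


def count_crc32_instructions(disassembly):
    """Count lines whose third whitespace-separated field contains 'crc32'."""
    count = 0
    for line in disassembly.splitlines():
        fields = line.split()
        # Mirrors: awk '{print $3}' | grep crc32 | wc -l
        if len(fields) >= 3 and "crc32" in fields[2]:
            count += 1
    return count


def disassemble(library_path):
    """Return the text printed by `objdump -d` for library_path."""
    result = subprocess.run(
        ["objdump", "-d", library_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `count_crc32_instructions(disassemble("/usr/lib/aarch64-linux-gnu/libz.so.1"))` should give the same count as the command above.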
Build and install Cloudflare zlib
git clone https://github.com/cloudflare/zlib.git
pushd zlib && ./configure
make && sudo make install
popd
Confirm new libz has crc instructions.
objdump -d /usr/local/lib/libz.so | awk -F " " '{print $3}' | grep crc32 | wc -l
Run the python program again with the new libz preloaded
LD_PRELOAD=/usr/local/lib/libz.so perf stat python ./zip.py
Note the new seconds of elapsed time.
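If the elapsed time does not change, it can help to confirm that the preloaded library is the one actually mapped into the Python process. The sketch below is one way to check on Linux by reading /proc/self/maps. The function names are hypothetical, and it assumes the Python build links libz dynamically, as the perf report above suggests.

```python
import zlib  # importing zlib ensures libz is loaded into the process


def loaded_libz_paths(maps_text):
    """Return the distinct libz paths named in /proc/<pid>/maps text."""
    paths = []
    for line in maps_text.splitlines():
        fields = line.split()
        # The pathname, when present, is the sixth field of a maps line.
        if len(fields) >= 6 and "libz.so" in fields[5] and fields[5] not in paths:
            paths.append(fields[5])
    return paths


def current_process_libz_paths():
    """Return the libz paths mapped into this Python process (Linux only)."""
    with open("/proc/self/maps") as maps:
        return loaded_libz_paths(maps.read())
```

When a script that prints `current_process_libz_paths()` is run with `LD_PRELOAD=/usr/local/lib/libz.so`, a path under /usr/local/lib is the expected result. `zlib.ZLIB_RUNTIME_VERSION` reports the version string of the library in use.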
LD_PRELOAD=/usr/local/lib/libz.so perf record -F 99 -g python ./zip.py
perf script | ./FlameGraph/stackcollapse-perf.pl > out.perf-folded && ./FlameGraph/flamegraph.pl out.perf-folded > flamegraph2.svg
For more information, refer to Improve data compression performance on AWS Graviton processors.