Evaluate using LTO, Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) like LLVM BOLT
zamazan4ik opened this issue · 1 comment
Hi!
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. According to my tests, PGO helps achieve better performance in many application domains, including network-oriented software (e.g. see the results for Envoy, HAProxy, httpd). So I decided to test PGO on Legba. Here are my results.
Test environment
- Fedora 38
- Linux kernel 6.5.6
- AMD Ryzen 9 5900X
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler - Rustc 1.73
- Legba version: the latest from the `main` branch at the time of testing, commit `5f0739a974f4ad92c254ddfe37aca033b40600e6`
- Disabled Turbo boost
Benchmark
For benchmark purposes, I use the "HTTP basic auth" scenario from the `test_server` directory with the `legba http.basic -t 127.0.0.1:8888 --username admin666 --password ./passwords_1m.txt --concurrency 1` command line. `--concurrency 1` is used only to reduce the influence of multithreading jitter on the results. As the `passwords_1m.txt` file I use this one, with the `test12345` password moved to the end of the file.
For the PGO training phase, I use exactly the same command but with a smaller password file (1050 passwords + `test12345` at the end), just to speed up the training phase.
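A training file like the one described above can be derived from the full list. Below is a minimal Rust sketch, assuming the full list is newline-separated and that the training set is simply its first 1050 entries with `test12345` appended; `training_subset` is a hypothetical helper, not part of Legba or cargo-pgo.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copy the first `n` candidates from `input`
/// (skipping `target` so it is not duplicated) and append `target`
/// as the very last line, mirroring the "password at the end" setup.
/// Returns the total number of lines written.
pub fn training_subset<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    n: usize,
    target: &str,
) -> io::Result<usize> {
    let mut written = 0;
    for line in input.lines() {
        if written == n {
            break;
        }
        let line = line?;
        // Skip the target here so it only appears once, at the end.
        if line == target {
            continue;
        }
        writeln!(output, "{line}")?;
        written += 1;
    }
    writeln!(output, "{target}")?;
    output.flush()?;
    Ok(written + 1)
}
```

For example, `training_subset(BufReader::new(File::open("passwords_1m.txt")?), BufWriter::new(File::create("passwords_train.txt")?), 1050, "test12345")` would write 1051 lines; the output file name is illustrative.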
I tested the following Legba configurations:
- Release build: `cargo build --release`
- Release + `lto = true` + `codegen-units = 1` (enable LTO): apply the LTO changes to the `[profile.release]` section of `Cargo.toml` and then run `cargo build --release`
- Release + `lto = true` + `codegen-units = 1` + PGO: `cargo pgo build`, then the training run described above, then `cargo pgo optimize build`. This is done with cargo-pgo.
- Release + `lto = true` + `codegen-units = 1` + PGO + BOLT: also via cargo-pgo
All benchmarks were run multiple times on the same machine (same hardware/software configuration) and with the same background noise (as far as I can guarantee, of course).
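The issue does not say which statistic summarises the repeated runs. As an illustration only, the Rust sketch below records wall-clock times for several runs and reports the median, which is less sensitive to a single noisy run than the mean; `time_runs` and `median` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and record the wall-clock time of each run.
pub fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Median of the samples, or `None` for an empty slice.
pub fn median(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        // Even count: average the two middle samples.
        (sorted[mid - 1] + sorted[mid]) / 2
    } else {
        sorted[mid]
    })
}
```

To time Legba itself, the closure could spawn the benchmark command with `std::process::Command`, e.g. `time_runs(5, || { Command::new("legba").args(["http.basic", "-t", "127.0.0.1:8888", "--username", "admin666", "--password", "./passwords_1m.txt", "--concurrency", "1"]).status().expect("failed to run legba"); })`.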
Results
I got the following results:
- Release: 276s
- Release + `lto = true` + `codegen-units = 1`: 262s
- Release + `lto = true` + `codegen-units = 1` + PGO optimized: 247s
- Release + `lto = true` + `codegen-units = 1` + PGO optimized + BOLT optimized: 247s
At least in the benchmark above, LTO and PGO help achieve better performance in Legba. However, LLVM BOLT seems to bring no measurable improvement in this benchmark.
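Expressed relative to the plain release build, the times above correspond to roughly a 5.1% reduction with LTO and a 10.5% reduction with LTO + PGO (about 5.7% on top of LTO alone). The small Rust sketch below shows the arithmetic; the function names are illustrative.

```rust
/// Relative wall-time reduction of `optimized_s` versus `baseline_s`, in percent.
pub fn reduction_pct(baseline_s: f64, optimized_s: f64) -> f64 {
    (baseline_s - optimized_s) / baseline_s * 100.0
}

/// Speedup factor: how many times faster `optimized_s` is than `baseline_s`.
pub fn speedup(baseline_s: f64, optimized_s: f64) -> f64 {
    baseline_s / optimized_s
}
```

With the measured values, `reduction_pct(276.0, 247.0)` is about 10.5 and `speedup(276.0, 247.0)` about 1.12.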
For reference, here are the results for the smaller file with 1051 passwords, so you can estimate how much slower the PGO-instrumented Legba is compared to the other configurations:
- Release: 273ms
- Release + LTO: 261ms
- Release + LTO + PGO instrumented: 311ms
- Release + LTO + PGO optimized + BOLT instrumented: 300ms
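The small-file numbers also make the instrumentation cost concrete: the PGO-instrumented build takes 311 ms versus 261 ms for the LTO build, about 19% slower. Assuming that ratio carried over unchanged to the full 1M-password run (an extrapolation, not a measurement), an instrumented training run on the full list would take roughly 312 s, whereas the 1051-password training file finishes in about 0.3 s. A Rust sketch of both calculations, with hypothetical function names:

```rust
/// Instrumentation overhead in percent: how much slower the instrumented
/// build is than its non-instrumented counterpart on the same input.
pub fn overhead_pct(plain_ms: f64, instrumented_ms: f64) -> f64 {
    (instrumented_ms / plain_ms - 1.0) * 100.0
}

/// Rough projection of an instrumented run on a larger input.
/// ASSUMPTION: the overhead ratio measured on the small input
/// carries over unchanged to the large one.
pub fn projected_instrumented_s(
    full_plain_s: f64,
    small_plain_ms: f64,
    small_instrumented_ms: f64,
) -> f64 {
    full_plain_s * (small_instrumented_ms / small_plain_ms)
}
```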
Here are the binary sizes after running `strip`:
- Release: 21 MiB
- Release + LTO: 17 MiB
- Release + LTO + PGO instrumented: 53 MiB
- Release + LTO + PGO optimized: 15 MiB
- Release + LTO + PGO optimized + BOLT instrumented: 68 MiB
- Release + LTO + PGO optimized + BOLT optimized: 20 MiB
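Sizes here are in MiB (2^20 bytes). A minimal Rust sketch for collecting such figures from the stripped binaries and comparing them to the release build; `size_mib` and `size_ratio` are hypothetical helpers, and the path in the usage note is illustrative.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Bytes per MiB (2^20).
const MIB: f64 = 1024.0 * 1024.0;

/// Size of the file at `path` in MiB.
pub fn size_mib<P: AsRef<Path>>(path: P) -> io::Result<f64> {
    Ok(fs::metadata(path)?.len() as f64 / MIB)
}

/// Size relative to a baseline binary, e.g. 0.71 means 29% smaller.
pub fn size_ratio(baseline_mib: f64, other_mib: f64) -> f64 {
    other_mib / baseline_mib
}
```

For example, `size_mib("target/release/legba")` after `strip` would give the release figure. By the numbers above, the PGO-optimized binary is about 0.71x the size of the release one (15 vs 21 MiB), while the PGO-instrumented and BOLT-instrumented builds are roughly 2.5x and 3.2x larger.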
I also measured the build time of each configuration:
- Release: 3m 10s
- Release + `lto = true` + `codegen-units = 1`: 6m 57s
- Release + `lto = true` + `codegen-units = 1` + PGO instrumented: 11m 14s
- Release + `lto = true` + `codegen-units = 1` + PGO optimized: 6m 40s
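For CI or packaging, the relevant cost of PGO is the sum of the instrumented and the optimized compile, plus the training run between them. The Rust sketch below (hypothetical helpers) parses the reported `Xm Ys` times and sums the two PGO compiles: 11m 14s + 6m 40s = 17m 54s of compilation, about 5.7x the 3m 10s release build, not counting training.

```rust
/// Parse a duration written like "6m 57s" into whole seconds.
/// Returns `None` if the string is not in that form.
pub fn parse_min_sec(s: &str) -> Option<u32> {
    let (minutes, rest) = s.trim().split_once('m')?;
    let seconds = rest.trim().strip_suffix('s')?;
    Some(minutes.trim().parse::<u32>().ok()? * 60 + seconds.trim().parse::<u32>().ok()?)
}

/// Format whole seconds back into the "Xm Ys" form.
pub fn format_min_sec(total: u32) -> String {
    format!("{}m {}s", total / 60, total % 60)
}
```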
Further steps
I can suggest the following action points:
- Perform more PGO benchmarks on Legba in various scenarios. If they show improvements, add a note to the documentation about the possible performance gains from building Legba with PGO.
- Provide an easier way (e.g. a build option or script) to build Legba with PGO. This can help end users and maintainers, since they would be able to optimize Legba for their own workloads.
Here are some examples of how PGO is integrated into other projects:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP: Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
@zamazan4ik thank you for such useful insights! I have to admit I didn't know about PGO and BOLT, so I'll have to study a bit before being able to make any meaningful changes to the build system.