Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT
zamazan4ik opened this issue · 0 comments
Hi!
Recently I ran many Profile-Guided Optimization (PGO) benchmarks on multiple projects - the results are available here. That's why I think it's worth trying to apply PGO to bat. I have already performed some benchmarks and want to share my results here.
Test environment
- Fedora 38
- Linux kernel 6.5.5
- AMD Ryzen 9 5900X
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler - Rustc 1.73
- bat version: the latest from the master branch, at commit fbe9b6f15fe64b4a5bde0478260dc67942731153
Benchmark setup
For benchmarking, I use the scenario from #2397: bat --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py. The same arguments and test file were used for PGO profile collection. The release build is done with cargo build --release, and the PGO-optimized build is done with cargo-pgo.
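For reference, here is a minimal sketch of the usual cargo-pgo workflow (instrument, collect a profile, rebuild). The exact commands I ran are not listed above, so treat this as an illustration: the helper names `run` and `pgo_build_bat`, the `DRY_RUN` switch, and the default target triple are assumptions made for this example. By default the function only prints the steps.

```shell
# Print each command; execute it only when DRY_RUN=0 (hypothetical helper).
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Sketch of a PGO build of bat with cargo-pgo. Run from the bat checkout,
# with test.py in the current directory.
pgo_build_bat() {
    # Assumed target triple; cargo-pgo places artifacts under target/<triple>/release/.
    triple="${TARGET_TRIPLE:-x86_64-unknown-linux-gnu}"

    # One-time setup: the cargo-pgo subcommand and the LLVM profile tools.
    run cargo install cargo-pgo &&
    run rustup component add llvm-tools-preview &&
    # 1. Build an instrumented binary.
    run cargo pgo build &&
    # 2. Run the benchmark workload to collect profiles.
    run "./target/${triple}/release/bat" --color=never --decorations=always \
        --highlight-line=100000 --pager=never -- test.py &&
    # 3. Rebuild using the collected profiles.
    run cargo pgo optimize
}
```

Calling `pgo_build_bat` prints the five steps; `DRY_RUN=0 pgo_build_bat` actually executes them and stops at the first failing step.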
All benchmarks are run multiple times on the same hardware/software setup, with the same background "noise" (as far as I can guarantee, of course).
Results
I got the following results:
hyperfine --warmup 5 --min-runs 30 './bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py' './bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py'
Benchmark 1: ./bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
Time (mean ± σ): 1.169 s ± 0.058 s [User: 1.131 s, System: 0.034 s]
Range (min … max): 1.139 s … 1.465 s 30 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: ./bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
Time (mean ± σ): 1.107 s ± 0.011 s [User: 1.069 s, System: 0.035 s]
Range (min … max): 1.080 s … 1.135 s 30 runs
Summary
./bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py ran
1.06 ± 0.05 times faster than ./bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
At least according to the simple benchmark above, PGO has a measurable positive effect on bat performance (about 6% faster in this scenario).
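The 1.06 ± 0.05 figure in the summary can be reproduced from the reported means and standard deviations. The sketch below assumes the two measurements are independent and uses first-order error propagation for a ratio; the function name `speedup` is made up for this example, and I have not checked that hyperfine computes the value in exactly this way.

```shell
# Ratio of two mean run times with a propagated uncertainty.
# Arguments: mean and standard deviation of the slower run, then of the faster run (seconds).
speedup() {
    awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
        ratio = m1 / m2
        # Relative errors add in quadrature for a quotient of independent values.
        err = ratio * sqrt((s1 / m1) ^ 2 + (s2 / m2) ^ 2)
        printf "%.2f +/- %.2f\n", ratio, err
    }'
}

# Release build: 1.169 s ± 0.058 s; PGO build: 1.107 s ± 0.011 s.
speedup 1.169 0.058 1.107 0.011   # prints "1.06 +/- 0.05"
```

Note that the uncertainty is dominated by the noisier release-build run, so a re-run on a quieter system would tighten the estimate.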
Further steps
I can suggest the following things to do:
- Evaluate PGO's applicability to bat in more scenarios.
- If PGO helps to achieve better performance, add a note about it to bat's documentation (probably somewhere in the README file). That way, users and maintainers will be aware of another optimization opportunity for bat.
- Provide PGO integration into the build scripts. It can help users and maintainers easily apply PGO for their own workloads.
- Optimize prebuilt binaries with PGO.
Here are some examples of how PGO is already integrated into other projects' build scripts:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP: Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
Once PGO is in place, I can suggest evaluating LLVM BOLT as an additional optimization step.
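As a starting point, cargo-pgo can also drive BOLT on top of PGO. The sketch below follows the workflow described in the cargo-pgo README and assumes that PGO profiles have already been collected, that `llvm-bolt` and `merge-fdata` are on PATH, and that the target triple is x86_64-unknown-linux-gnu. The helper names `run` and `bolt_build_bat` and the `DRY_RUN` switch are made up for this example, and I have not benchmarked bat with BOLT.

```shell
# Print each command; execute it only when DRY_RUN=0 (hypothetical helper).
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Sketch of a BOLT step on top of an existing PGO profile.
bolt_build_bat() {
    # Assumed target triple; cargo-pgo places artifacts under target/<triple>/release/.
    triple="${TARGET_TRIPLE:-x86_64-unknown-linux-gnu}"

    # 1. Build a BOLT-instrumented binary that already uses the PGO profiles.
    run cargo pgo bolt build --with-pgo &&
    # 2. Run the same workload to collect BOLT profiles.
    run "./target/${triple}/release/bat-bolt-instrumented" --color=never \
        --decorations=always --highlight-line=100000 --pager=never -- test.py &&
    # 3. Produce the final PGO + BOLT optimized binary.
    run cargo pgo bolt optimize --with-pgo
}
```

As with any such change, the resulting binary should be compared against both the release and the PGO-only builds with the same hyperfine setup before drawing conclusions.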