Evaluate using Profile-Guided Optimization (PGO) and LLVM BOLT
zamazan4ik opened this issue · 0 comments
Hi!
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. Since PGO helps with achieving better performance in many projects I think trying to optimize difftastic with PGO can be a good idea.
I already did some benchmarks and want to share my results.
Test environment
- Fedora 38
- Linux kernel 6.5.6
- AMD Ryzen 9 5900x
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler - Rustc 1.59
- Difftastic version: the latest from the master branch at the time of writing, commit 21ed3ec48b383511b08ffe20cc91697af8f64d78
- Disabled Turbo boost
Benchmark
For benchmarking, I run difft difftastic/sample_files/dir_before/ difftastic/sample_files/dir_after/, which reflects typical difftastic usage. The PGO training phase uses exactly the same command. The release binary is built with cargo build --release, and the PGO build (instrumentation and optimization phases) is done with cargo-pgo.
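For anyone who wants to reproduce the PGO build, here is a rough sketch of the three steps (instrument, train, optimize) as a small Python driver. It assumes cargo-pgo is installed and uses its build and optimize subcommands. The instrumented binary path passed in at the bottom is an assumption (it depends on the host target triple), and the helper names pgo_commands and run_steps are hypothetical, not part of difftastic or cargo-pgo. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def pgo_commands(before, after, instrumented_binary):
    """Return the three PGO steps as argv lists, in execution order."""
    return [
        # 1. Build an instrumented release binary.
        ["cargo", "pgo", "build"],
        # 2. Training run: same workload as the benchmark, so the
        #    collected profile matches what is measured later.
        [instrumented_binary, before, after],
        # 3. Rebuild using the collected profiles.
        ["cargo", "pgo", "optimize"],
    ]


def run_steps(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for argv in commands:
        line = shlex.join(argv)
        lines.append(line)
        print(("[dry-run] " if dry_run else "$ ") + line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    steps = pgo_commands(
        "sample_files/dir_before/",
        "sample_files/dir_after/",
        # Assumed location of the instrumented binary; adjust the
        # target triple to match the machine doing the build.
        "target/x86_64-unknown-linux-gnu/release/difft",
    )
    run_steps(steps, dry_run=True)
```

Calling run_steps(steps, dry_run=False) from the difftastic checkout would actually execute the steps. The training workload can be swapped for whatever inputs are representative of real usage.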
Results
I got the following results:
hyperfine --warmup 10 --min-runs 50 './difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null' './difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null'
Benchmark 1: ./difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null
Time (mean ± σ): 384.2 ms ± 5.2 ms [User: 288.5 ms, System: 126.8 ms]
Range (min … max): 373.6 ms … 396.9 ms 50 runs
Benchmark 2: ./difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null
Time (mean ± σ): 354.7 ms ± 4.1 ms [User: 257.7 ms, System: 127.3 ms]
Range (min … max): 347.0 ms … 362.7 ms 50 runs
Summary
./difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null ran
1.08 ± 0.02 times faster than ./difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null
where difft_release is the Release binary and difft_optimized is the Release + PGO binary.
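As a sanity check on the summary line, the speedup can be recomputed from the reported means and standard deviations. The sketch below uses standard first-order error propagation for a ratio of two independent measurements. That this is how the ± figure in the summary is derived is an assumption on my side, and the function name speedup is hypothetical.

```python
import math


def speedup(base_mean, base_sd, opt_mean, opt_sd):
    """Ratio of mean run times with a first-order propagated uncertainty."""
    ratio = base_mean / opt_mean
    # Relative uncertainties of independent quantities add in quadrature.
    rel = math.sqrt((base_sd / base_mean) ** 2 + (opt_sd / opt_mean) ** 2)
    return ratio, ratio * rel


# Means and standard deviations (ms) taken from the hyperfine output above.
ratio, sigma = speedup(384.2, 5.2, 354.7, 4.1)
print(f"{ratio:.2f} ± {sigma:.2f} times faster")  # 1.08 ± 0.02 times faster

# The same result expressed as a reduction in mean wall-clock time.
reduction = (1 - 354.7 / 384.2) * 100
print(f"{reduction:.1f}% less wall-clock time")  # 7.7% less wall-clock time
```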
Regarding binary sizes:
- Release: 68 MiB
- Release + PGO: 68 MiB
- Instrumented: 75 MiB
At least in the scenario above, PGO improves performance: the optimized binary runs about 1.08 times faster, while the binary size stays the same.
Further steps
I can suggest the following action points:
- Perform more PGO benchmarks on difftastic. If it shows improvements - add a note about possible improvements in difftastic's performance with PGO.
- Providing an easier way (e.g. a build option) to build difftastic with PGO can be helpful for end users and maintainers, since they will be able to optimize difftastic for their own workloads.
- Optimize pre-built binaries with PGO
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT as an addition to PGO) but I recommend starting from the usual PGO.
Here are some examples of how PGO optimization is integrated in other projects:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP - Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag