Profile-Guided Optimization (PGO) improvements
zamazan4ik opened this issue · 0 comments
Hi!
I recently ran a lot of Profile-Guided Optimization (PGO) benchmarks on different kinds of software; all currently available results are collected in the awesome-pgo repository (https://github.com/zamazan4ik/awesome-pgo). In those tests, PGO usually improved performance, so I thought it was worth trying on Delta too. I ran some benchmarks on my local machine and want to share the results.
Test environment
- Apple MacBook M1 (fully charged, AC connected)
- macOS 13.4 Ventura
- Rust: 1.72
- Latest Delta from the master branch (commit 7375f7a165dabe430e12d531fedd84bb3a027c6b)
Test workload
As a test scenario, I used the make benchmark command. All runs were performed on the same hardware and operating system, with the same background workload (as far as I can guarantee, of course). The measurements were taken with hyperfine. The PGO build was done with cargo-pgo.
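For anyone who wants to reproduce the build, the cargo-pgo steps look roughly like the sketch below. It is a sketch under assumptions, not the exact commands I ran: it assumes you are in the root of a Delta checkout, that /tmp/delta-benchmark-input.gitdiff already exists (for example from an earlier make benchmark run), and that a single run of the instrumented binary is enough to collect a profile. The step helper and the RUN and INPUT variables are hypothetical names used only here.

```shell
#!/usr/bin/env bash
# Sketch of a PGO build with cargo-pgo (assumed workflow, see lead-in).

# Benchmark input; the path matches the hyperfine commands in this issue.
export INPUT="${INPUT:-/tmp/delta-benchmark-input.gitdiff}"

# Hypothetical helper: print each step, and execute it only when RUN=1.
step() {
    echo "+ $1"
    if [ "${RUN:-0}" = "1" ]; then
        bash -c "$1" || return 1
    fi
}

pgo_build() {
    # One-time setup: the cargo-pgo subcommand and the LLVM tools it relies on.
    step 'cargo install cargo-pgo' &&
    step 'rustup component add llvm-tools-preview' &&
    # 1. Build the instrumented binary.
    step 'cargo pgo build' &&
    # 2. Run the workload to collect .profraw files (more runs give more data).
    step 'triple=$(rustc -vV | sed -n "s/^host: //p"); LLVM_PROFILE_FILE="$PWD/target/pgo-profiles/delta_%m_%p.profraw" "target/$triple/release/delta" --no-gitconfig < "$INPUT" > /dev/null' &&
    # 3. Rebuild using the collected profiles.
    step 'cargo pgo optimize'
}

pgo_build
```

By default this only prints the steps; run it with RUN=1 to actually execute them.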
Results
Here are the results. I also included the numbers for the instrumented build so you can estimate how slow delta is in instrumentation mode.
PGO-optimized build compared to the Release build:
hyperfine --warmup 10 --min-runs 20 'delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null' 'delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
Time (mean ± σ): 405.8 ms ± 2.1 ms [User: 396.2 ms, System: 38.4 ms]
Range (min … max): 403.0 ms … 410.1 ms 20 runs
Benchmark 2: delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
Time (mean ± σ): 413.0 ms ± 3.4 ms [User: 403.6 ms, System: 38.7 ms]
Range (min … max): 409.7 ms … 422.6 ms 20 runs
Summary
delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null ran
1.02 ± 0.01 times faster than delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
PGO-instrumented version:
LLVM_PROFILE_FILE=/Users/zamazan4ik/open_source/delta/target/pgo-profiles/delta_%m_%p.profraw hyperfine --warmup 10 --min-runs 20 'target/aarch64-apple-darwin/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: target/aarch64-apple-darwin/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
Time (mean ± σ): 575.4 ms ± 4.7 ms [User: 562.2 ms, System: 44.1 ms]
Range (min … max): 569.0 ms … 588.7 ms 20 runs
So PGO gives a small improvement, at least on the project's own benchmark. Still, a "free" 1-2% performance gain is not a bad thing after all :)
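The ratios behind these numbers can be recomputed from the mean times in the hyperfine output above. The snippet below is only a convenience sketch; the ratio helper is a hypothetical name, and the three values are copied from the "Time (mean ± σ)" lines.

```shell
# Mean wall-clock times in ms, copied from the hyperfine output above.
pgo=405.8
release=413.0
instrumented=575.4

# Hypothetical helper: print a / b with three decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

echo "release / PGO:          $(ratio "$release" "$pgo")"
echo "instrumented / release: $(ratio "$instrumented" "$release")"
```

The first ratio is about 1.018, which matches the "1.02 ± 0.01 times faster" line in the hyperfine summary. The second is about 1.393, so the instrumented binary is roughly 39% slower than the Release build on this workload.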
Possible further steps
I suggest the following:
- Add a note to the Delta documentation (maybe somewhere in the README file) about building with PGO, if you think it's worth it for the project. That way, users and maintainers who build their own Delta binaries will be aware of PGO as an additional way to optimize the project.
- Try LLVM BOLT in addition to PGO. However, I do not expect huge improvements from BOLT in this project.