Profile-Guided Optimization (PGO) improvements

zamazan4ik opened this issue a year ago

Hi!

I have recently run a lot of Profile-Guided Optimization (PGO) benchmarks on many different kinds of software; all currently available results are collected at https://github.com/zamazan4ik/awesome-pgo. According to these tests, PGO usually improves performance, so it seems worth trying for Delta as well. I ran some benchmarks on my local machine and want to share the results.

Test environment

Apple MacBook M1 (full charge, AC connected)
macOS 13.4 Ventura
Rust: 1.72
Latest Delta from the master branch (commit 7375f7a165dabe430e12d531fedd84bb3a027c6b)

Test workload

As the test scenario, I used the make benchmark command. All runs were performed on the same hardware and operating system, with the same background workload (as far as I could control it). Measurements were taken with hyperfine, and the PGO optimization was done with cargo-pgo.
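The cargo-pgo workflow has three steps: build an instrumented binary (cargo pgo build), run it on a training workload to collect profiles, then rebuild with those profiles applied (cargo pgo optimize). A minimal sketch of scripting these steps as a small xtask-style Rust driver follows. It assumes cargo-pgo and the llvm-tools-preview rustup component are installed, and it reuses the binary path, profile pattern and input file from the benchmark commands in this issue. The helper names (run, training_command) and the number of training runs are illustrative choices, not part of Delta or cargo-pgo, and this is not necessarily how the measurements below were scripted.

```rust
use std::fs::File;
use std::io;
use std::process::{Command, Stdio};

// Assumed paths, copied from the benchmark commands in this issue (aarch64 macOS target).
const INSTRUMENTED_DELTA: &str = "target/aarch64-apple-darwin/release/delta";
const PROFILE_PATTERN: &str = "target/pgo-profiles/delta_%m_%p.profraw";
// Diff generated by `make benchmark`, reused here as the training input.
const TRAINING_INPUT: &str = "/tmp/delta-benchmark-input.gitdiff";
// Arbitrary, illustrative number of training runs.
const TRAINING_RUNS: usize = 3;

/// Runs `cmd` to completion and turns a non-zero exit status into an error.
fn run(cmd: &mut Command) -> io::Result<()> {
    let status = cmd.status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{cmd:?} exited with {status}"),
        ))
    }
}

/// One training run of the instrumented binary; the caller attaches stdin/stdout.
/// LLVM_PROFILE_FILE tells the instrumented binary where to write .profraw files.
fn training_command() -> Command {
    let mut cmd = Command::new(INSTRUMENTED_DELTA);
    cmd.arg("--no-gitconfig")
        .env("LLVM_PROFILE_FILE", PROFILE_PATTERN);
    cmd
}

fn main() -> io::Result<()> {
    // Step 1: build an instrumented release binary.
    run(Command::new("cargo").args(["pgo", "build"]))?;

    // Step 2: feed the benchmark diff to the instrumented binary to collect profiles.
    for _ in 0..TRAINING_RUNS {
        run(training_command()
            .stdin(Stdio::from(File::open(TRAINING_INPUT)?))
            .stdout(Stdio::null()))?;
    }

    // Step 3: merge the collected profiles and rebuild with them applied.
    run(Command::new("cargo").args(["pgo", "optimize"]))
}
```

Run from the repository root, this is equivalent to typing the two cargo pgo commands by hand with the training runs in between; a shell script would do just as well.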

Results

Here are the results. I have also included results for the PGO-instrumented build, so you can estimate how much slower Delta runs in instrumentation mode.

PGO-optimized build compared to the release build:

hyperfine --warmup 10 --min-runs 20 'delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null' 'delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):     405.8 ms ±   2.1 ms    [User: 396.2 ms, System: 38.4 ms]
  Range (min … max):   403.0 ms … 410.1 ms    20 runs

Benchmark 2: delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):     413.0 ms ±   3.4 ms    [User: 403.6 ms, System: 38.7 ms]
  Range (min … max):   409.7 ms … 422.6 ms    20 runs

Summary
  delta_pgo_optimized --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null ran
    1.02 ± 0.01 times faster than delta_release --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
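One way to read the "1.02 ± 0.01" summary is as the ratio of the two means, with the standard deviations combined by first-order error propagation under the assumption that the runs are independent. The sketch below (hypothetical names Measurement and speedup) reproduces the printed figure from the means and σ values above; I believe hyperfine uses the same formula, but that is an assumption rather than something checked against its source.

```rust
/// Mean and standard deviation of one benchmark, in milliseconds, as printed by hyperfine.
#[derive(Clone, Copy)]
struct Measurement {
    mean: f64,
    stddev: f64,
}

/// Speedup of `fast` over `slow` and its standard deviation,
/// propagated assuming the two measurements are independent.
fn speedup(fast: Measurement, slow: Measurement) -> (f64, f64) {
    let ratio = slow.mean / fast.mean;
    // Relative errors add in quadrature for a quotient.
    let rel_fast = fast.stddev / fast.mean;
    let rel_slow = slow.stddev / slow.mean;
    let stddev = ratio * (rel_fast.powi(2) + rel_slow.powi(2)).sqrt();
    (ratio, stddev)
}

fn main() {
    let pgo = Measurement { mean: 405.8, stddev: 2.1 };
    let release = Measurement { mean: 413.0, stddev: 3.4 };
    let (ratio, sd) = speedup(pgo, release);
    println!("{ratio:.2} ± {sd:.2} times faster"); // 1.02 ± 0.01 times faster
}
```

Unrounded, the ratio is about 1.018, i.e. the PGO build is roughly 1.8% faster on this workload.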

PGO-instrumented version:

LLVM_PROFILE_FILE=/Users/zamazan4ik/open_source/delta/target/pgo-profiles/delta_%m_%p.profraw hyperfine --warmup 10 --min-runs 20 'target/aarch64-apple-darwin/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: target/aarch64-apple-darwin/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):     575.4 ms ±   4.7 ms    [User: 562.2 ms, System: 44.1 ms]
  Range (min … max):   569.0 ms … 588.7 ms    20 runs
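To put the instrumented numbers in perspective, a small sketch that computes the relative slowdown of the instrumented build against the plain release build from the two means reported above (the ± σ is ignored for simplicity; overhead_percent is an illustrative helper name).

```rust
/// Relative slowdown of `instrumented_ms` over `baseline_ms`, in percent.
fn overhead_percent(instrumented_ms: f64, baseline_ms: f64) -> f64 {
    (instrumented_ms / baseline_ms - 1.0) * 100.0
}

fn main() {
    // Mean wall-clock times (ms) reported by hyperfine.
    let release_ms = 413.0;
    let instrumented_ms = 575.4;
    let pct = overhead_percent(instrumented_ms, release_ms);
    println!("instrumented build is {pct:.1}% slower"); // instrumented build is 39.3% slower
}
```

This overhead only applies to the profiling runs; the final PGO-optimized binary contains no instrumentation.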

So PGO gives a small improvement, at least on the project's own benchmark: roughly 1.8% (405.8 ms vs 413.0 ms mean). Still, a "free" 1-2% speedup is not a bad thing :)

Possible further steps

I suggest the following:

Add a note to the Delta documentation (perhaps in the README) about building with PGO, if you think it is worthwhile for the project. That way, users and maintainers who build their own Delta binaries will know about PGO as an additional way to optimize it.
Try LLVM BOLT in addition to PGO, although I do not expect large improvements from BOLT in this project.