Wilfred/difftastic

Evaluate using Profile-Guided Optimization (PGO) and LLVM BOLT

zamazan4ik opened this issue · 0 comments

Hi!

I recently evaluated Profile-Guided Optimization (PGO) on multiple projects. The results are here. Since PGO improves performance in many projects, I think optimizing difftastic with PGO is worth trying.

I already did some benchmarks and want to share my results.

Test environment

  • Fedora 38
  • Linux kernel 6.5.6
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.59
  • Difftastic version: the latest for now from the master branch on commit 21ed3ec48b383511b08ffe20cc91697af8f64d78
  • Disabled Turbo boost

Benchmark

For benchmark purposes, I use difft difftastic/sample_files/dir_before/ difftastic/sample_files/dir_after/, a typical way of using difftastic in practice. For the PGO training phase, I use exactly the same command. The release version is built with cargo build --release, and the PGO instrumentation and optimization phases are done with cargo-pgo.
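
For anyone who wants to reproduce this, here is a minimal sketch of the cargo-pgo steps described above. The helper names run and pgo_workflow and the DRY_RUN switch are my own for illustration, and the instrumented binary path assumes an x86_64 Linux target, so adjust it for your platform. It also assumes cargo-pgo and the llvm-tools-preview component are already installed.

```shell
# Prerequisites (run once):
#   cargo install cargo-pgo
#   rustup component add llvm-tools-preview

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical wrapper around the three PGO phases.
# $1, $2: directories to diff during training.
# $3: path to the instrumented binary (assumed default for x86_64 Linux).
pgo_workflow() {
  local before="$1"
  local after="$2"
  local instrumented="${3:-./target/x86_64-unknown-linux-gnu/release/difft}"

  # 1. Build the instrumented binary.
  run cargo pgo build
  # 2. Training run: the same workload that is benchmarked later.
  run "$instrumented" "$before" "$after"
  # 3. Rebuild using the collected profiles.
  run cargo pgo optimize
}
```

By default this only prints the commands. Something like DRY_RUN=0 pgo_workflow sample_files/dir_before/ sample_files/dir_after/ from the difftastic checkout should execute them.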

Results

I got the following results:

hyperfine --warmup 10 --min-runs 50 './difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null' './difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null'
Benchmark 1: ./difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null
  Time (mean ± σ):     384.2 ms ±   5.2 ms    [User: 288.5 ms, System: 126.8 ms]
  Range (min … max):   373.6 ms … 396.9 ms    50 runs

Benchmark 2: ./difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null
  Time (mean ± σ):     354.7 ms ±   4.1 ms    [User: 257.7 ms, System: 127.3 ms]
  Range (min … max):   347.0 ms … 362.7 ms    50 runs

Summary
  ./difft_optimized ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null ran
    1.08 ± 0.02 times faster than ./difft_release ../difftastic/sample_files/dir_before/ ../difftastic/sample_files/dir_after/ > /dev/null

Here, difft_release is the Release binary and difft_optimized is the Release + PGO binary.
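
As a quick sanity check on the summary line, the ratio can be recomputed from the two mean times reported by hyperfine. This is just a sketch using awk; the function names speedup and time_saved_pct are my own.

```shell
# Ratio of baseline mean time to optimized mean time, two decimals.
speedup() {
  LC_ALL=C awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", base / opt }'
}

# Reduction in mean wall-clock time relative to the baseline, in percent.
time_saved_pct() {
  LC_ALL=C awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

speedup 384.2 354.7         # prints 1.08
time_saved_pct 384.2 354.7  # prints 7.7
```

So the PGO build takes about 7.7% less wall-clock time on this workload, which matches the 1.08 factor in the hyperfine summary.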

Regarding binary sizes:

  • Release: 68 MiB
  • Release + PGO: 68 MiB
  • Instrumented: 75 MiB

At least in the scenario above, PGO improves performance by about 8% without increasing the binary size.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks on difftastic. If they show improvements, add a note about the possible performance gains from PGO.
  • Provide an easier way to build difftastic with PGO (e.g. a build option in the build scripts). This would help end users and maintainers optimize difftastic for their own workloads.
  • Optimize the pre-built binaries with PGO.

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with the usual PGO.

Here are some examples of how PGO optimization is integrated in other projects: