sharkdp/bat

Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT

zamazan4ik opened this issue · 0 comments

Hi!

I recently ran Profile-Guided Optimization (PGO) benchmarks on multiple projects - the results are available here. Based on those results, I think it's worth trying PGO on bat. I have already run some benchmarks and want to share the results here.

Test environment

  • Fedora 38
  • Linux kernel 6.5.5
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.73
  • bat version: the latest master branch at the time of writing, commit fbe9b6f15fe64b4a5bde0478260dc67942731153

Benchmark setup

For the benchmark, I use the scenario from #2397: bat --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py. The same arguments and test file were used for PGO profile collection. The release build is done with cargo build --release; the PGO-optimized build is done with cargo-pgo.
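For anyone who wants to reproduce the PGO build, the cargo-pgo flow looks roughly like the sketch below. It assumes that cargo-pgo and the llvm-tools-preview rustup component are installed, that it is run from the root of a bat checkout, and that test.py is the test file from the scenario above. The function name pgo_build_bat and the default x86_64-unknown-linux-gnu target directory are my own illustrative choices, not something taken from bat's build scripts; cargo-pgo prints the actual binary path when it builds.

```shell
# Sketch of a cargo-pgo build for bat (run from the repository root).
# Assumed one-time setup:
#   cargo install cargo-pgo
#   rustup component add llvm-tools-preview
pgo_build_bat() {
    # Assumption: host target triple; pass another one as $1 if yours differs.
    local target="${1:-x86_64-unknown-linux-gnu}"
    local bin="target/${target}/release/bat"

    # 1. Build an instrumented binary.
    cargo pgo build || return 1

    # 2. Run the benchmark scenario once to collect profiles.
    "./${bin}" --color=never --decorations=always --highlight-line=100000 \
        --pager=never -- test.py > /dev/null || return 1

    # 3. Rebuild, this time using the collected profiles.
    cargo pgo optimize || return 1

    # Keep a copy under the name used in the hyperfine comparison.
    cp "${bin}" ./bat_optimized
}
```

Calling pgo_build_bat with no arguments runs all three steps; the plain release binary for comparison still comes from cargo build --release.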

All benchmarks are run multiple times, on the same hardware/software setup, with the same background "noise" (as much as I can guarantee, of course).

Results

I got the following results:

hyperfine --warmup 5 --min-runs 30 './bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py' './bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py'
Benchmark 1: ./bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
  Time (mean ± σ):      1.169 s ±  0.058 s    [User: 1.131 s, System: 0.034 s]
  Range (min … max):    1.139 s …  1.465 s    30 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ./bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
  Time (mean ± σ):      1.107 s ±  0.011 s    [User: 1.069 s, System: 0.035 s]
  Range (min … max):    1.080 s …  1.135 s    30 runs

Summary
  ./bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py ran
    1.06 ± 0.05 times faster than ./bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py

At least according to the simple benchmark above, PGO has a measurable positive effect on bat performance: the mean run time drops from 1.169 s to 1.107 s, i.e. roughly 5% less.
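The 1.06 figure in the hyperfine summary is simply the ratio of the two mean run times. A small sketch to recompute it from the means reported above (the helper names speedup and reduction_pct are my own, not part of hyperfine):

```shell
# Ratio of two mean run times, as in hyperfine's "times faster" summary line.
speedup() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.3f\n", base / opt }'
}

# Relative reduction in mean run time, in percent.
reduction_pct() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

speedup 1.169 1.107        # prints 1.056
reduction_pct 1.169 1.107  # prints 5.3
```

Note that hyperfine reports the ratio as 1.06 ± 0.05, so the uncertainty is about as large as the gain itself; the release run also triggered the outlier warning.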

Further steps

I can suggest the following things to do:

  • Evaluate PGO's applicability to bat in more scenarios.
  • If PGO helps to achieve better performance, add a note about it to bat's documentation (probably somewhere in the README file). That way, users and maintainers will be aware of another optimization opportunity for bat.
  • Provide PGO integration into the build scripts. It can help users and maintainers easily apply PGO for their own workloads.
  • Optimize prebuilt binaries with PGO.

Here are some examples of how PGO is already integrated into other projects' build scripts:

Once PGO is evaluated, I can suggest trying LLVM BOLT as an additional optimization step on top of it.
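One possible way to try this, sketched below, is through cargo-pgo's BOLT subcommands. Using cargo-pgo for BOLT is my assumption here, not something benchmarked in this issue. The sketch assumes that llvm-bolt and merge-fdata are on PATH, that PGO profiles were already collected with cargo pgo build plus a run of the instrumented binary, and that the instrumented binary ends up under the x86_64-unknown-linux-gnu target directory. The function name bolt_build_bat is hypothetical.

```shell
# Sketch: BOLT on top of PGO via cargo-pgo (run from the repository root).
# Assumes llvm-bolt and merge-fdata are installed and PGO profiles already exist.
bolt_build_bat() {
    # Assumption: host target triple; pass another one as $1 if yours differs.
    local target="${1:-x86_64-unknown-linux-gnu}"

    # 1. Build a PGO-optimized binary with BOLT instrumentation.
    cargo pgo bolt build --with-pgo || return 1

    # 2. Run the same benchmark scenario to collect BOLT profiles.
    "./target/${target}/release/bat-bolt-instrumented" --color=never \
        --decorations=always --highlight-line=100000 --pager=never \
        -- test.py > /dev/null || return 1

    # 3. Produce the final binary, optimized with both PGO and BOLT.
    cargo pgo bolt optimize --with-pgo
}
```

The resulting binary can then be added as a third command to the same hyperfine invocation to see whether BOLT gives anything beyond PGO alone.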