software-mansion/scarb

Evaluate using Profile-Guided Optimization (PGO) for Scarb

zamazan4ik opened this issue · 2 comments

Hi!

As was explained here (thanks a lot for the explanation, by the way!), I decided to test the Profile-Guided Optimization (PGO) technique to optimize Scarb's performance. For reference, results for other projects are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many projects a lot (including compilers, code formatters, language servers, linters, etc.), I applied it to Scarb to see whether it brings a performance win (or loss). Here are my benchmark results.

Test environment

  • Fedora 40
  • Linux kernel 6.10.12
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.81.0
  • scarb version: main branch on commit eec9b4af3bdbc2ec9c391201fd530ba2b39e98ba
  • Disabled Turbo boost

Benchmark

For benchmarking, I use the scarb build command to build the examples. For PGO, I use the cargo-pgo tool. The PGO training workload is scarb build of the starknet_multiple_contracts project, run with the PGO-instrumented scarb (built with cargo pgo build -- --bin scarb).

taskset -c 0 is used to reduce the OS scheduler's influence on the results. All measurements are taken on the same machine, with the same background "noise" (as far as I can guarantee), and repeated multiple times with hyperfine.
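For anyone who wants to reproduce the setup, the workflow described above can be sketched roughly as the script below. It is only an illustration, not the exact commands I ran: the target triple, the path to the example's Scarb.toml, the use of scarb's --manifest-path option, and the exact form of the cargo pgo optimize invocation are assumptions. By default the script only prints the commands; set DRY_RUN=0 inside a Scarb checkout (with cargo-pgo installed) to actually run them.

```shell
#!/bin/sh
# Sketch of a cargo-pgo workflow for Scarb. Prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: host target triple; cargo-pgo puts binaries under target/<triple>/release.
TARGET="${TARGET:-x86_64-unknown-linux-gnu}"
# Assumption: location of the training project inside the Scarb repository.
TRAINING_MANIFEST="${TRAINING_MANIFEST:-examples/starknet_multiple_contracts/Scarb.toml}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

pgo_workflow() {
  # 1. Build the PGO-instrumented scarb binary.
  run cargo pgo build -- --bin scarb || return 1
  # 2. Training run: build the example with the instrumented binary
  #    so that profile data is collected.
  run "target/$TARGET/release/scarb" --manifest-path "$TRAINING_MANIFEST" build || return 1
  # 3. Rebuild scarb using the collected profiles (assumed invocation form).
  run cargo pgo optimize build -- --bin scarb || return 1
}

pgo_workflow
```

The resulting optimized binary can then be copied next to a regular release build and compared with hyperfine, as in the results below.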

Results

I got the following results.

Compilation of the starknet_multiple_contracts project (the same as the PGO training set):

hyperfine --warmup 3 'taskset -c 0 ../scarb_release build' 'taskset -c 0 ../scarb_optimized build'
Benchmark 1: taskset -c 0 ../scarb_release build
  Time (mean ± σ):      2.839 s ±  0.017 s    [User: 2.493 s, System: 0.332 s]
  Range (min … max):    2.814 s …  2.868 s    10 runs

Benchmark 2: taskset -c 0 ../scarb_optimized build
  Time (mean ± σ):      2.508 s ±  0.010 s    [User: 2.166 s, System: 0.329 s]
  Range (min … max):    2.493 s …  2.522 s    10 runs

Summary
  'taskset -c 0 ../scarb_optimized build' ran
    1.13 ± 0.01 times faster than 'taskset -c 0 ../scarb_release build'

Compilation of the workspaces project:

hyperfine --warmup 3 'taskset -c 0 ../scarb_release build --workspace' 'taskset -c 0 ../scarb_optimized build --workspace'
Benchmark 1: taskset -c 0 ../scarb_release build --workspace
  Time (mean ± σ):      7.202 s ±  0.023 s    [User: 6.567 s, System: 0.605 s]
  Range (min … max):    7.169 s …  7.238 s    10 runs

Benchmark 2: taskset -c 0 ../scarb_optimized build --workspace
  Time (mean ± σ):      6.483 s ±  0.018 s    [User: 5.865 s, System: 0.589 s]
  Range (min … max):    6.452 s …  6.505 s    10 runs

Summary
  'taskset -c 0 ../scarb_optimized build --workspace' ran
    1.11 ± 0.00 times faster than 'taskset -c 0 ../scarb_release build --workspace'

In both cases, scarb_release is the Release build and scarb_optimized is the PGO-optimized build.

According to the results, we see a consistent compilation speed improvement: the PGO-optimized build is 1.11x to 1.13x faster than the Release build. I expected such results since, in my experience, all or almost all compilers can benefit a lot from applying PGO.
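As a quick sanity check, the speedup ratios can be recomputed from the mean times hyperfine reported above. The speedup helper below is a hypothetical name introduced only for this illustration; it divides the Release mean by the PGO mean and also prints the relative reduction in wall time.

```shell
#!/bin/sh
# speedup BASE OPT: ratio of mean times and relative wall-time reduction.
# (Hypothetical helper for illustration; the inputs are the hyperfine means.)
speedup() {
  awk -v base="$1" -v opt="$2" 'BEGIN {
    printf "%.2fx faster, %.1f%% less wall time\n", base / opt, (base - opt) / base * 100
  }'
}

speedup 2.839 2.508   # starknet_multiple_contracts
speedup 7.202 6.483   # workspaces
```

Note that the second figure (relative time saved) is slightly smaller than the "times faster" ratio minus one, since the two are measured against different baselines.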

Further steps

I can suggest the following action points:

  • Mention in a user-visible place (e.g. the documentation) that PGO brings measurable performance improvements for the project
  • Integrate PGO into the build pipeline (like it's done in CPython or other projects)
  • Optimize the prebuilt binaries (if any) with PGO
  • Test PGO for other Scarb parts: code formatter, LSP server, etc. According to the awesome-pgo results, the performance of such tools can also be improved with PGO

Also, Post-Link Optimization (PLO) can be tested after PGO. It can be done by applying tools like LLVM BOLT. However, it's a much less mature optimization technique compared to PGO.

Thank you.

Wow, that's a fascinating result, 10% speed-up on a trivial codebase is pretty astonishing. Sounds very promising also for CairoLS. I have added a task on our issue tracker: starkware-libs/cairo#6476

Thanks so much @zamazan4ik!!! Very interesting stuff! 🚀
We will for sure want to investigate using this technique!

Right now I am failing to reproduce it on both ARM Mac and x64 Fedora, getting llvm/llvm-project#57501 :/ But we will look into this more! :D