Evaluate using Profile-Guided Optimization (PGO) for Scarb
zamazan4ik opened this issue · 2 comments
Hi!
As was explained here (thanks a lot for the explanation btw!), I decided to test the Profile-Guided Optimization (PGO) technique to optimize Scarb's performance. For reference, results for other projects are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped a lot with many projects (including compilers, code formatters, language servers, linters, etc.), I decided to apply it to Scarb to see whether it yields a performance win (or loss). Here are my benchmark results.
Test environment
- Fedora 40
- Linux kernel 6.10.12
- AMD Ryzen 9 5900X
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler: rustc 1.81.0
- Scarb version: main branch on commit eec9b4af3bdbc2ec9c391201fd530ba2b39e98ba
- Disabled Turbo Boost
Benchmark
For benchmark purposes, I use the scarb build command to build the example projects. For PGO optimization, I use the cargo-pgo tool. The PGO training workload is running scarb build on the starknet_multiple_contracts project with the PGO-instrumented scarb binary (built with cargo pgo build -- --bin scarb).
Each run is pinned to CPU core 0 with taskset -c 0 to reduce the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee), and repeated multiple times with hyperfine.
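For anyone who wants to reproduce the setup, the training workflow can be sketched as a small bash helper. This is a sketch under stated assumptions, not the exact script used for these measurements: it assumes cargo-pgo and the llvm-tools-preview rustup component are installed, that the example project lives under examples/starknet_multiple_contracts in the Scarb repository, that cargo-pgo places the instrumented binary under target/<host triple>/release, and that cargo pgo optimize build forwards -- --bin scarb the same way cargo pgo build does (check cargo pgo --help). The run and pgo_workflow helpers and the DRY_RUN and TARGET_TRIPLE variables are hypothetical names introduced here.

```bash
# Hypothetical helper sketching the cargo-pgo workflow for scarb.
# Paths and the `cargo pgo optimize` arguments are assumptions.
# Source this file, then call pgo_workflow from the scarb repository root.

# Print a command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

# The body is a subshell, so `set -e` and the `cd` below do not leak
# into the caller's shell.
pgo_workflow() (
  set -e
  # cargo-pgo builds for an explicit target, so the instrumented binary is
  # assumed to land in target/<host triple>/release (override via TARGET_TRIPLE).
  triple="${TARGET_TRIPLE:-$(rustc -vV | sed -n 's/^host: //p')}"
  [ -n "$triple" ] || { echo "could not determine host triple" >&2; exit 1; }
  instrumented="$PWD/target/$triple/release/scarb"

  # 1. Instrumented build (same command as in the text above).
  run cargo pgo build -- --bin scarb

  # 2. Training run: build the example project with the instrumented binary,
  #    which writes the profiles that cargo-pgo collects.
  run cd examples/starknet_multiple_contracts
  run "$instrumented" build
  run cd ../..

  # 3. Rebuild scarb using the collected profiles.
  run cargo pgo optimize build -- --bin scarb
)
```

Running DRY_RUN=1 pgo_workflow only prints the five commands, which is a cheap way to check the assumed paths before doing the real (slow) builds.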
Results
I got the following results.
Compilation of the starknet_multiple_contracts project (the same project that was used for PGO training):
hyperfine --warmup 3 'taskset -c 0 ../scarb_release build' 'taskset -c 0 ../scarb_optimized build'
Benchmark 1: taskset -c 0 ../scarb_release build
Time (mean ± σ): 2.839 s ± 0.017 s [User: 2.493 s, System: 0.332 s]
Range (min … max): 2.814 s … 2.868 s 10 runs
Benchmark 2: taskset -c 0 ../scarb_optimized build
Time (mean ± σ): 2.508 s ± 0.010 s [User: 2.166 s, System: 0.329 s]
Range (min … max): 2.493 s … 2.522 s 10 runs
Summary
'taskset -c 0 ../scarb_optimized build' ran
1.13 ± 0.01 times faster than 'taskset -c 0 ../scarb_release build'
Compilation of the workspaces project:
hyperfine --warmup 3 'taskset -c 0 ../scarb_release build --workspace' 'taskset -c 0 ../scarb_optimized build --workspace'
Benchmark 1: taskset -c 0 ../scarb_release build --workspace
Time (mean ± σ): 7.202 s ± 0.023 s [User: 6.567 s, System: 0.605 s]
Range (min … max): 7.169 s … 7.238 s 10 runs
Benchmark 2: taskset -c 0 ../scarb_optimized build --workspace
Time (mean ± σ): 6.483 s ± 0.018 s [User: 5.865 s, System: 0.589 s]
Range (min … max): 6.452 s … 6.505 s 10 runs
Summary
'taskset -c 0 ../scarb_optimized build --workspace' ran
1.11 ± 0.00 times faster than 'taskset -c 0 ../scarb_release build --workspace'
In both cases, scarb_release is the regular Release build and scarb_optimized is the PGO-optimized build.
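The "times faster" figures in the hyperfine summaries can be recomputed from the reported means and standard deviations. A minimal sketch, assuming independent measurements and first-order error propagation (to my understanding, the same approach hyperfine uses for its relative-speed uncertainty): the ratio is r = mean_release / mean_optimized, and its uncertainty is r * sqrt((σ_release / mean_release)^2 + (σ_optimized / mean_optimized)^2). The speedup function name is hypothetical.

```bash
# Hypothetical helper: speedup MEAN_A SD_A MEAN_B SD_B
# Prints how many times faster B is than A, with the propagated uncertainty.
speedup() {
  awk -v ma="$1" -v sa="$2" -v mb="$3" -v sb="$4" 'BEGIN {
    r = ma / mb                                  # ratio of mean run times
    s = r * sqrt((sa / ma) ^ 2 + (sb / mb) ^ 2)  # relative errors added in quadrature
    printf "%.2f ± %.2f\n", r, s
  }'
}
```

For example, speedup 2.839 0.017 2.508 0.010 prints 1.13 ± 0.01, and speedup 7.202 0.023 6.483 0.018 prints 1.11 ± 0.00, matching the two hyperfine summaries.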
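According to the results, PGO gives a consistent compilation speed-up: about 1.13x on the training project and about 1.11x on the workspaces project (roughly 12% and 10% less wall time, respectively). I expected such results since, in my experience, all or almost all compilers benefit a lot from PGO.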
Further steps
I can suggest the following action points:
- Mention in a user-visible place (e.g. the documentation) that PGO brings measurable performance improvements to Scarb
- Integrate PGO into the build pipeline (as is done in CPython and other projects)
- Build the prebuilt binaries (if any) with PGO
- Test PGO on other Scarb parts: the code formatter, the LSP server, etc. According to the awesome-pgo results, such tools can also benefit from PGO
Post-Link Optimization (PLO) can also be tested on top of PGO, using tools like LLVM BOLT. However, PLO is a much less mature optimization technique than PGO.
Thank you.
Wow, that's a fascinating result: a 10% speed-up on a trivial codebase is pretty astonishing. It sounds very promising for CairoLS too. I have added a task on our issue tracker: starkware-libs/cairo#6476
Thanks so much @zamazan4ik!!! Very interesting stuff! 🚀
We will for sure want to investigate using this technique!
Right now I am failing to reproduce it on both an ARM Mac and x64 Fedora, getting llvm/llvm-project#57501 :/ But we will look into this more! :D