Profile-Guided Optimization (PGO) benchmark report
zamazan4ik opened this issue · 2 comments
Hi!
I was interested in optimizing the library's performance even further. I evaluated Profile-Guided Optimization (PGO) on many projects (all the results are available at https://github.com/zamazan4ik/awesome-pgo). Since this compiler optimization works well in many places, especially in various parsers, I decided to apply it to this project. Here are my benchmark results.
Test environment
- Fedora 40
- Linux kernel 6.9.7
- AMD Ryzen 9 5900x
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler - Rustc 1.79
- ieee80211-rs version: master branch on commit 5388202303134c5487a03754c80f18e5f92be563
- Disabled Turbo Boost
Benchmark
For benchmarking, I use the benchmarks built into the project. For PGO optimization, I use the cargo-pgo tool. The Release results were obtained with `taskset -c 0 cargo bench`. The PGO training phase is done with `taskset -c 0 cargo pgo bench`, and the PGO optimization phase with `taskset -c 0 cargo pgo optimize bench`.
`taskset -c 0` is used to reduce the OS scheduler's influence on the results. All measurements were done on the same machine, with the same background "noise" (as far as I can guarantee).
Results
I got the following results:
- Release: https://gist.github.com/zamazan4ik/9ef6e41b3559cfc240adcb7ccf0bf372
- PGO optimized compared to Release: https://gist.github.com/zamazan4ik/8a11bed6a61d3e22875e2dc84a1c669a
- (just for reference) PGO instrumented compared to Release: https://gist.github.com/zamazan4ik/ab2c3a81e04dad3866e2ab7a3f9084b7
According to the results, PGO measurably improves the library's performance in many cases.
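When reading the comparison gists, each benchmark's change is expressed relative to the Release baseline. The sketch below shows that arithmetic, assuming a single mean time per iteration (in nanoseconds) for each build; the function names and the sample timings are hypothetical and are not taken from the gists.

```rust
/// Relative change of `new_ns` versus `baseline_ns`, in percent.
/// A negative value means the new build (e.g. PGO) is faster.
fn relative_change_pct(baseline_ns: f64, new_ns: f64) -> f64 {
    (new_ns - baseline_ns) / baseline_ns * 100.0
}

/// Speedup factor: how many times faster `new_ns` is than `baseline_ns`.
fn speedup(baseline_ns: f64, new_ns: f64) -> f64 {
    baseline_ns / new_ns
}

fn main() {
    // Hypothetical mean timings for one benchmark (not from the report).
    let release_ns = 120.0;
    let pgo_ns = 90.0;

    println!("change: {:+.1}%", relative_change_pct(release_ns, pgo_ns));
    println!("speedup: {:.2}x", speedup(release_ns, pgo_ns));
}
```

With these made-up timings the program prints a change of -25.0% and a speedup of 1.33x. Note that a 25% reduction in time corresponds to a 1.33x speedup, so the two ways of reporting the same result are not interchangeable.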
Further steps
I can suggest the following action points:
- Perform more PGO benchmarks with other datasets (if you are interested enough in it). If they show improvements, add a note to the documentation (the README file, I guess) about the possible performance gains from building the library with PGO.
- You could also try to get insights into how the code can be optimized further, based on the changes the compiler made with PGO. This can be done by analyzing flamegraphs before and after applying PGO, or by comparing the assembly/LLVM IR before and after PGO.
I would be happy to answer your questions about PGO.
P.S. I created this issue only because Discussions are disabled for the repository. It's just a benchmark report, not a bug or something like that.
Hi, thanks a lot for the write up!
I'm going to take a closer look at this tomorrow and enable Discussions. I'd do it now if I weren't lying in a tent somewhere.
It's pretty obvious that the library benefits greatly from PGO.
I added some stuff to the README and enabled discussions. Thanks a lot for this optimization!