Profile-Guided Optimization (PGO) results
zamazan4ik opened this issue · 2 comments
Writing this for the history. Maybe these results will be interesting to someone who is trying to achieve better performance with xml-rs.
I test Profile-Guided Optimization (PGO) on different kinds of software; the current results are here (with a lot of other PGO-related information). That's why I tried to optimize xml-rs with PGO too.
Test setup
My test setup is:
- Macbook M1 Pro
- macOS Ventura 13.4
- Rustc version: rustc 1.73.0-nightly (180dffba1 2023-08-14)
- xml-rs version: commit c6331c97ab9f487c9d0bce52c06364116f5e80d2 from the master branch
Benchmarks
As a benchmark, I used the benchmarks built into the xml-rs crate. For PGO, I used cargo-pgo.
Results
Release:
test read ... bench: 22,293 ns/iter (+/- 516)
test read_lots_attrs ... bench: 222,601 ns/iter (+/- 7,186)
test write ... bench: 5,073 ns/iter (+/- 91)
Release + PGO:
test read ... bench: 18,589 ns/iter (+/- 231)
test read_lots_attrs ... bench: 166,501 ns/iter (+/- 3,857)
test write ... bench: 4,439 ns/iter (+/- 48)
Instrumented:
test read ... bench: 32,521 ns/iter (+/- 771)
test read_lots_attrs ... bench: 299,387 ns/iter (+/- 11,805)
test write ... bench: 8,157 ns/iter (+/- 243)
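As you can see, PGO makes both reading and writing XML measurably faster in these benchmarks, while the instrumented build is slower, as expected.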
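To put the Release and Release + PGO numbers above on a common scale, here is a small sketch that computes the relative reduction in time per iteration for each benchmark. The helper name `time_reduction_percent` is made up for this illustration and is not part of xml-rs or cargo-pgo; the inputs are copied from the results above.

```rust
/// Percentage reduction in time per iteration when going from
/// `baseline_ns` (Release) to `optimized_ns` (Release + PGO).
/// Hypothetical helper, written only for this illustration.
fn time_reduction_percent(baseline_ns: f64, optimized_ns: f64) -> f64 {
    (baseline_ns - optimized_ns) / baseline_ns * 100.0
}

fn main() {
    // (benchmark name, Release ns/iter, Release + PGO ns/iter)
    let results = [
        ("read", 22_293.0, 18_589.0),
        ("read_lots_attrs", 222_601.0, 166_501.0),
        ("write", 5_073.0, 4_439.0),
    ];

    for (name, release, pgo) in results {
        let reduction = time_reduction_percent(release, pgo);
        println!("{name}: {reduction:.1}% less time per iteration");
    }
}
```

With these inputs the reductions come out to roughly 16.6% for read, 25.2% for read_lots_attrs, and 12.5% for write. The measurement noise (the +/- column) is ignored here, so treat the figures as approximate.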
I'm not surprised it helps a lot: the parser is architected in a way that spans multiple function calls and enums per byte.
But as a library author I can't do anything with this information. It's something that end users need to enable.
Actually, you can write a note about this kind of performance improvement somewhere in the documentation (even a note in the README is completely OK). That way, users will know how to improve the library's performance with this optimization technique, with actual numbers to back it up.