Profile-Guided Optimization (PGO) results

Question

zamazan4ik opened this issue a year ago · 2 comments

Writing this for the record. Maybe these results will be interesting to someone who is trying to achieve better performance with xml-rs.

I test Profile-Guided Optimization (PGO) on different kinds of software; the current results are here (with a lot of other PGO-related information). That's why I tried optimizing xml-rs with PGO too.

Test setup

My test setup is:

MacBook M1 Pro
macOS Ventura 13.4
Rustc version: rustc 1.73.0-nightly (180dffba1 2023-08-14)
xml-rs version: c6331c97ab9f487c9d0bce52c06364116f5e80d2 commit from the master branch

Benchmarks

For benchmarking, I used the benchmarks built into the xml-rs crate. For PGO, I used cargo-pgo.

Results

Release:

test read            ... bench:      22,293 ns/iter (+/- 516)
test read_lots_attrs ... bench:     222,601 ns/iter (+/- 7,186)
test write           ... bench:       5,073 ns/iter (+/- 91)

Release + PGO:

test read            ... bench:      18,589 ns/iter (+/- 231)
test read_lots_attrs ... bench:     166,501 ns/iter (+/- 3,857)
test write           ... bench:       4,439 ns/iter (+/- 48)

Instrumented:

test read            ... bench:      32,521 ns/iter (+/- 771)
test read_lots_attrs ... bench:     299,387 ns/iter (+/- 11,805)
test write           ... bench:       8,157 ns/iter (+/- 243)

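As you can see, PGO makes xml-rs faster: compared to the Release build, time per iteration drops by about 17% for read, 25% for read_lots_attrs, and 12% for write. The Instrumented build is slower, as expected, since it collects the profile while running.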
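
For anyone who wants to recompute the relative improvement from the ns/iter figures above, here is a minimal sketch. The function name `time_reduction_percent` is made up for this example; the numbers are copied from the Release and Release + PGO results.

```rust
/// Percentage by which time per iteration shrinks when going from
/// `baseline_ns` to `optimized_ns`. (Hypothetical helper for this example.)
fn time_reduction_percent(baseline_ns: f64, optimized_ns: f64) -> f64 {
    (baseline_ns - optimized_ns) / baseline_ns * 100.0
}

fn main() {
    // (benchmark name, Release ns/iter, Release + PGO ns/iter),
    // copied from the results above.
    let results = [
        ("read", 22_293.0, 18_589.0),
        ("read_lots_attrs", 222_601.0, 166_501.0),
        ("write", 5_073.0, 4_439.0),
    ];

    for (name, release, pgo) in results {
        println!(
            "{name:<16} {:.1}% less time per iteration",
            time_reduction_percent(release, pgo)
        );
    }
}
```

Note that this only compares the reported medians; the (+/- ...) spread from the benchmark output is not taken into account.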

Answer 1 · 2023-08-19T12:17:41.000Z

I'm not surprised it helps a lot; the parser is architected in a way that goes through multiple function calls and enums per byte.
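
To illustrate the general shape of such a design, here is a toy sketch. It is not xml-rs code, and every name in it (`State`, `Token`, `step`, `tokenize`) is hypothetical. Each input byte costs one function call and one enum match, which is the kind of hot path where profile data can guide inlining and branch layout.

```rust
// Toy illustration only: NOT xml-rs code. All names here are hypothetical.

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Text,
    InsideTag,
}

#[derive(Debug, PartialEq)]
enum Token {
    TagOpen,
    TagClose,
    Character(u8),
}

// Called once per input byte: takes the current state and the byte,
// returns the next state and the token produced for that byte.
fn step(state: State, byte: u8) -> (State, Token) {
    match (state, byte) {
        (State::Text, b'<') => (State::InsideTag, Token::TagOpen),
        (State::InsideTag, b'>') => (State::Text, Token::TagClose),
        (s, b) => (s, Token::Character(b)),
    }
}

// Drives the state machine over the whole input, one byte at a time.
fn tokenize(input: &[u8]) -> Vec<Token> {
    let mut state = State::Text;
    let mut tokens = Vec::with_capacity(input.len());
    for &byte in input {
        let (next, token) = step(state, byte);
        state = next;
        tokens.push(token);
    }
    tokens
}

fn main() {
    let tokens = tokenize(b"<a>x</a>");
    println!("{} tokens", tokens.len());
}
```

A real parser has many more states and token kinds, so the per-byte cost of calls and matches is correspondingly larger.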

But as a library author I can't do anything with this information. It's something that end users need to enable.

Answer 2 · 2023-08-19T12:35:53.000Z

> But as a library author I can't do anything with this information. It's something that end users need to enable.

Actually, you can add a note about this kind of performance improvement somewhere in the documentation (even a note in the README is completely OK). That way, users will know they can improve the library's performance with this optimization technique, and they will have actual numbers to go by.