quickwit-oss/tantivy

Profile-Guided Optimization (PGO) evaluation results

Hi!

Recently I ran a lot of benchmarks to measure the effect of Profile-Guided Optimization (PGO) on different projects (including some libraries) - the results are available here. So I decided to test PGO with Tantivy as well.

My test setup is a MacBook with an M1 Pro, running macOS 13.4 Ventura. All tests were run on the same hardware. Rust version - 1.72. The PGO build was done with cargo-pgo. As both the training and the evaluation set, I used the Tantivy benchmarks. The background load was kept the same (as much as I can guarantee on macOS, of course). The results are the following (in the cargo bench output format):

This information can be helpful:

  • For the Tantivy users who want to optimize their applications
  • For benchmark purposes as an additional way to extract more performance

It would probably be a good idea to mention PGO somewhere in the Tantivy documentation/README/Wiki.

Thanks for these investigations, the results are quite interesting.
It seems you ran the benchmarks with cargo bench. The full suite is behind a feature flag:
cargo +nightly bench --features unstable

The numbers suggest that the default compilation leaves a lot of performance on the table.
I'd like to annotate the Rust code to replicate the resulting binary, but as far as I know there's no interface to LLVM for that, except inline.

> It seems you ran the benchmarks with cargo bench. The full suite is behind a feature flag:
> cargo +nightly bench --features unstable

Didn't know that! It would be a good idea to test PGO on the full benchmark set.

> I'd like to annotate the Rust code to replicate the resulting binary, but as far as I know there's no interface to LLVM for that, except inline.

Well, that can be quite difficult to maintain, to be honest. That's why PGO shines here - all the "annotation" machinery is handled by the compiler, not by a human. As you mentioned above, not all optimizations done by PGO can be replicated in the source code via LLVM attributes. Manual annotation, however, has one benefit: it does not require compiling twice (an instrumented build followed by an optimized build).
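
For illustration, here is a minimal sketch of what such manual annotation could look like in stable Rust, using the `#[inline]`, `#[inline(never)]` and `#[cold]` attributes. The functions (`decode_delta`, `decode_block`, `report_corrupt_block`) and the 128-element limit are hypothetical and are not taken from the Tantivy code base. They only show where a human would place the hints that PGO otherwise derives from the profile.

```rust
/// Hypothetical hot-path helper: tiny and called once per element,
/// so we hint that it should be inlined into its callers.
#[inline]
fn decode_delta(prev: u32, delta: u32) -> u32 {
    prev.wrapping_add(delta)
}

/// Hypothetical error path: `#[cold]` marks the function as unlikely
/// to be called, and `#[inline(never)]` keeps it out of the hot loop.
#[cold]
#[inline(never)]
fn report_corrupt_block(len: usize) -> Vec<u32> {
    eprintln!("corrupt block of length {len}");
    Vec::new()
}

/// Decodes a delta-encoded list of ids (illustrative only).
/// The 128-element limit is an assumption made for this sketch.
fn decode_block(deltas: &[u32]) -> Vec<u32> {
    if deltas.len() > 128 {
        return report_corrupt_block(deltas.len());
    }
    let mut prev = 0u32;
    deltas
        .iter()
        .map(|&d| {
            prev = decode_delta(prev, d);
            prev
        })
        .collect()
}

fn main() {
    let ids = decode_block(&[3, 2, 5]);
    println!("{ids:?}");
}
```

These attributes are only hints, and they have to be kept in sync with how the code is actually used, which is the maintenance cost mentioned above. PGO makes comparable decisions from measured call counts instead.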