Tencent/tquic

Profile-Guided Optimization (PGO) performance results

zamazan4ik opened this issue · 1 comments

Hi!

Recently I ran many Profile-Guided Optimization (PGO) benchmarks on multiple projects; the results are available here. Since tquic is performance-oriented, I think it would be interesting to test PGO as a way to optimize it. I have already run some benchmarks.

Test environment

  • Fedora 38
  • Linux kernel 6.5.5
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.73
  • tquic version: the latest develop branch at the time of writing, commit 05c56e7425ec1149a9c95ca7bbcb6acbab861fd6
  • Disabled Turbo boost

Benchmark setup

For benchmarking, I use the project's own benchmarks. The Release baseline is measured with cargo bench. The PGO-optimized build is produced with cargo-pgo: first cargo pgo bench to collect profiles, then cargo pgo optimize bench to build and benchmark with those profiles. The PGO profiles are collected from the benchmark workload itself.
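For anyone who wants to reproduce this, the steps above can be sketched roughly as the shell script below. It assumes rustup and cargo are installed and that it is run from the tquic checkout. The run and pgo_workflow helpers and the DRY_RUN variable are illustrative names of my own, not part of cargo-pgo; by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the benchmark workflow described above (not an official script).
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.

# Hypothetical helper: print a command, and execute it unless in dry-run mode.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: the full sequence, baseline first, then PGO.
pgo_workflow() {
    # cargo-pgo needs the LLVM tools (llvm-profdata) from the Rust toolchain.
    run rustup component add llvm-tools-preview
    run cargo install cargo-pgo
    # Baseline: regular Release benchmarks.
    run cargo bench
    # Instrumented run: collects PGO profiles from the benchmark workload.
    run cargo pgo bench
    # Rebuild with the collected profiles and run the benchmarks again.
    run cargo pgo optimize bench
}
```

To actually execute the steps, source the script and call pgo_workflow with DRY_RUN=0. Note that the profiles come from the same workload that is later measured, so the improvement on other workloads may differ.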

Results

I got the following results:

According to the tests, PGO consistently improves tquic performance in some scenarios.

Further steps

I can suggest the following things to do:

  • Evaluate PGO's applicability to tquic in more scenarios.
  • If PGO helps achieve better performance, add a note about it to tquic's documentation (probably somewhere in the README file). That way, users and maintainers will be aware of another optimization opportunity for tquic.
  • Maybe get some insights from the PGO profiles and manually optimize the code according to them (for example, more aggressive inlining).

@zamazan4ik Thank you for your suggestion. I truly appreciate your expertise in performance optimization and your recommendation to use Profile-Guided Optimization (PGO).

We will consider this approach and assess its benefits and potential impact on our production environment. I would love to discuss this further with you once we have made progress. Your insights would be invaluable to us.

Thank you again for your invaluable suggestion.