Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on VTS
zamazan4ik opened this issue · 3 comments
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. E.g. PGO helps with optimizing Envoyproxy. PGO results for other proxies like HAProxy can be found in the repo above. According to multiple tests, PGO can help improve performance in many other cases. Since there are already some performance-oriented requests like #251, I think trying to apply PGO to the VTS module can be a good thing.
I can suggest the following action points:
- Perform PGO benchmarks on VTS. If it shows improvements - add a note to the documentation about possible improvements in VTS performance with PGO.
- Providing an easier way (e.g. a build option) to build scripts with PGO can be helpful for the end-users and maintainers since they will be able to optimize VTS according to their workloads
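For the first action point, the comparison itself is simple arithmetic once both builds have been benchmarked. A minimal sketch, assuming throughput (e.g. requests per second) is measured separately for a baseline build and a PGO build; `pct_gain` is a hypothetical helper, not part of nginx or VTS:

```shell
#!/bin/sh
# pct_gain BASELINE PGO
# Prints the percentage change in throughput from the baseline build
# to the PGO build. Hypothetical helper for illustration only.
pct_gain() {
    awk -v base="$1" -v pgo="$2" 'BEGIN {
        # A non-positive baseline makes the ratio meaningless.
        if (base <= 0) { print "baseline must be > 0" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", (pgo - base) / base * 100
    }'
}
```

For example, `pct_gain 1000 1100` prints `10.0`. These numbers are placeholders, not measurements of VTS.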
Maybe testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with the usual PGO.
Here are some examples of how PGO is integrated in other projects:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP - Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
@zamazan4ik Thanks, interesting suggestion.
I think the optimization achievable through this module alone may be limited. First of all, this is an nginx module, so its build process is simply the nginx build process, and PGO would in effect be an optimization of nginx itself.
Formally, such a build process should be suggested to the nginx developers rather than to this module.
@zamazan4ik
I fully understand what you suggested; the module can be optimized with the following approach. But I'm not sure how much that process would improve it.
https://stackoverflow.com/questions/13881292/what-information-does-gcc-profile-guided-optimization-pgo-collect-and-which-op
We can also find the details of this mechanism in this paper.
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf section 7.4.
In general, if it improves performance as you expect, we could write up the following build process as a tip in the README instead of providing an optimized binary. We would just prefer that users build this way at their own risk.
Compile with -fprofile-generate
% pwd
/home/u5surf/nginx
% CC=gcc ./auto/configure --with-cc-opt='-fprofile-generate -fprofile-dir=./objs' --with-ld-opt='-lgcov' --add-module=../nginx-module-vts
% make
Run several test cases
% pwd
/home/u5surf/nginx-module-vts
% sudo PATH=/home/u5surf/nginx/objs:$PATH prove -r t/000.display_html.t
...(at runtime, the instrumented binary records profile data into .gcda files)
Recompile with -fprofile-use
% pwd
/home/u5surf/nginx
% CC=gcc ./auto/configure --with-cc-opt='-fprofile-use -fprofile-dir=../nginx-module-vts/objs' --with-ld-opt='-lgcov' --add-module=../nginx-module-vts
% make
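The two configure invocations above differ only in the profiling flag and the profile directory, so they could be generated by one small helper. This is a minimal sketch, not an official build option: `pgo_configure_cmd` is a hypothetical name, and it only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# pgo_configure_cmd STAGE PROFILE_DIR MODULE_DIR
# Prints the nginx configure command for one GCC PGO stage.
# Hypothetical helper; not part of nginx or nginx-module-vts.
pgo_configure_cmd() {
    stage=$1; profile_dir=$2; module_dir=$3
    case "$stage" in
        generate) flag='-fprofile-generate' ;;  # instrumented build
        use)      flag='-fprofile-use' ;;       # optimized rebuild
        *) echo "unknown stage: $stage" >&2; return 1 ;;
    esac
    printf "CC=gcc ./auto/configure --with-cc-opt='%s -fprofile-dir=%s' --with-ld-opt='-lgcov' --add-module=%s\n" \
        "$flag" "$profile_dir" "$module_dir"
}
```

Running `pgo_configure_cmd generate ./objs ../nginx-module-vts` prints the first configure line shown above, and `pgo_configure_cmd use ../nginx-module-vts/objs ../nginx-module-vts` prints the second. From the nginx source directory, the printed command can be executed with `eval "$(pgo_configure_cmd ...)"` followed by `make`.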
Excuse me for such a late response - holidays, you know :)
> In general, if it improves performance as you expect, we could write up the following build process as a tip in the README instead of providing an optimized binary. We would just prefer that users build this way at their own risk.
I agree with your suggestion. Having such documentation somewhere (like in the README file) is a good thing for users to have.
I have the following suggestions for your documentation about PGO:
- Clang also supports PGO, so there is no need to bind the PGO documentation only to the GCC compiler. PGO with Clang can be enabled with the same -fprofile-generate/-fprofile-use flags.
- It would be great if in the documentation you put information about the actual performance wins with PGO in the VTS module. But for that, we need to perform some benchmarks.
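One practical difference is worth noting for a Clang section of the documentation: Clang's instrumented binary writes raw `.profraw` files, which have to be merged with `llvm-profdata` before `-fprofile-use` can consume them. Below is a minimal sketch of the three steps, assuming the same directory layout as the GCC example; `clang_pgo_step` is a hypothetical helper that only prints commands, and passing `-fprofile-generate` through `--with-ld-opt` assumes nginx links through the compiler driver.

```shell
#!/bin/sh
# clang_pgo_step STEP RAW_DIR PROFDATA
# Prints the command for one step of a Clang PGO cycle for nginx + VTS.
# Hypothetical helper; paths are illustrative and should be absolute.
clang_pgo_step() {
    step=$1; raw_dir=$2; profdata=$3
    case "$step" in
        generate)
            # Instrumented build: raw profiles (*.profraw) are written to RAW_DIR
            # when the instrumented process exits.
            printf "CC=clang ./auto/configure --with-cc-opt='-fprofile-generate=%s' --with-ld-opt='-fprofile-generate=%s' --add-module=../nginx-module-vts\n" \
                "$raw_dir" "$raw_dir" ;;
        merge)
            # Combine the raw profiles into one indexed profile.
            printf "llvm-profdata merge -output=%s %s/*.profraw\n" "$profdata" "$raw_dir" ;;
        use)
            # Optimized rebuild that reads the merged profile.
            printf "CC=clang ./auto/configure --with-cc-opt='-fprofile-use=%s' --add-module=../nginx-module-vts\n" \
                "$profdata" ;;
        *) echo "unknown step: $step" >&2; return 1 ;;
    esac
}
```

The order would be `generate`, `make`, run the test workload, `merge`, `use`, then `make` again.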
Here I gathered some PGO-related documentation examples:
- ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
- Databend: https://databend.rs/doc/contributing/pgo
- Vector: https://vector.dev/docs/administration/tuning/pgo/
- Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
- GCC: Official docs, section "Building with profile feedback"
- Clang:
- Rustc: https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization
- tsv-utils: https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md