
Evaluate using Profile-Guided Optimization (PGO) and LLVM BOLT


Hi!

Recently I have been evaluating Profile-Guided Optimization (PGO) on many projects - all current results are collected here. According to multiple tests, PGO helps improve performance in many cases (including libraries like pydantic-core). Optimizing the TensorFlow Text library in the same way could be beneficial, since it could reduce the CPU time spent in routines like text preprocessing.

I can suggest the following action points:

  • Perform PGO benchmarks on TensorFlow Text. If they show improvements, add a note to the documentation about the possible performance gains from building TensorFlow Text with PGO.
  • Provide an easier way (e.g. a build option) to build TensorFlow Text with PGO. This would help end users and maintainers optimize the library for their own workloads if they decide to rebuild it for their own needs (a generic two-stage flow is sketched right after this list).
  • Optimize the pre-built binaries with PGO, if a good-enough training workload can be prepared or collected.
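
To illustrate the build-option idea above, here is a minimal sketch of what a two-stage Clang PGO build could look like. It assumes Bazel with a Clang toolchain; the Bazel target name and profile directory are illustrative placeholders, while the `-fprofile-generate`/`-fprofile-use` and `llvm-profdata` options are standard Clang/LLVM ones:

```python
# Hypothetical two-stage PGO build driver for TensorFlow Text.
# Assumes Bazel with a Clang toolchain; the Bazel target and profile
# directory are illustrative, while the -fprofile-* and llvm-profdata
# options are standard Clang/LLVM flags.
import glob
import subprocess

PROFILE_DIR = "/tmp/tft-pgo"
TARGET = "//oss_scripts/pip_package:build_pip_package"  # illustrative target

# Stage 1: build with instrumentation so the C++ kernels emit counters.
subprocess.run(
    ["bazel", "build",
     f"--copt=-fprofile-generate={PROFILE_DIR}",
     f"--linkopt=-fprofile-generate={PROFILE_DIR}",
     TARGET],
    check=True,
)

# ...run a representative text-preprocessing workload here (see the
# profile-collection sketch further below)...

# Merge the raw profiles produced by the workload into one .profdata file.
raw_profiles = glob.glob(f"{PROFILE_DIR}/*.profraw")
subprocess.run(
    ["llvm-profdata", "merge", "-o", "tft.profdata", *raw_profiles],
    check=True,
)

# Stage 2: rebuild with the merged profile applied.
subprocess.run(
    ["bazel", "build", "--copt=-fprofile-use=tft.profdata", TARGET],
    check=True,
)
```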

Since the native part of TensorFlow Text is a C++ library, I think the pydantic-core experience can be reused here; Clang also supports PGO for shared libraries. In this case it should be possible to prepare some text preprocessing routines, collect PGO profiles from running them, and then use those profiles as the training data (see the sketch below).
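
To make that concrete, a minimal profile-collection workload could look like the following. It exercises two real tensorflow_text tokenizers against the instrumented build; the corpus and loop counts are placeholders, and a good training workload should mirror production preprocessing as closely as possible:

```python
# Minimal profile-collection workload sketch: running real tokenizers pushes
# work into the instrumented C++ kernels, which write *.profraw files into
# the directory passed to -fprofile-generate at build time.
# The corpus and iteration counts below are placeholders, not a tuned workload.
import tensorflow as tf
import tensorflow_text as tf_text

corpus = tf.constant(
    [
        "Profile-guided optimization needs representative inputs.",
        "TensorFlow Text tokenizes strings inside C++ kernels.",
    ]
    * 1_000
)

whitespace = tf_text.WhitespaceTokenizer()
unicode_script = tf_text.UnicodeScriptTokenizer()

for _ in range(100):
    whitespace.tokenize(corpus)
    unicode_script.tokenize(corpus)
```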

Testing post-link optimization techniques like LLVM BOLT could be interesting as well (Clang and rustc already use BOLT as an addition to PGO), but I recommend starting with regular PGO.
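
For reference, a BOLT pass over the already-built shared library might look roughly like this. The library name here is purely illustrative, and the sketch assumes the library was linked with `-Wl,--emit-relocs` and that `perf` was run with branch sampling; the `perf2bolt` and `llvm-bolt` options shown are documented BOLT ones:

```python
# Hypothetical post-link optimization of the TensorFlow Text native library
# with LLVM BOLT. Assumes perf data was recorded with branch sampling, e.g.
#   perf record -e cycles:u -j any,u -- python workload.py
# and that the library was linked with -Wl,--emit-relocs.
import subprocess

LIB = "_text_ops.so"  # illustrative name for the native shared library

# Convert the perf.data samples into a BOLT profile.
subprocess.run(
    ["perf2bolt", "-p", "perf.data", "-o", "tft.fdata", LIB],
    check=True,
)

# Rewrite the library with an improved code layout.
subprocess.run(
    ["llvm-bolt", LIB, "-o", "_text_ops.bolt.so",
     "-data=tft.fdata",
     "-reorder-blocks=ext-tsp",
     "-reorder-functions=hfsort"],
    check=True,
)
```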

Here are some examples of how PGO is integrated into other projects:

Many of the examples above are applications rather than libraries, but that shouldn't make a huge difference - PGO works well with libraries too.

Text tokenization is not likely to be a bottleneck in any real-world model.