feos-org/feos

Performance improvements


Following some guidelines of the Rust Performance Book, here are some things we can try to improve performance:

  • Add codegen-units = 1 to the release build profile
  • Use a faster allocator, e.g. mimalloc, which works on all operating systems
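Both easy items are small build-level changes. The sketch below is a minimal, dependency-free illustration under stated assumptions. The Cargo.toml excerpt in the comment assumes a profile named `release-lto` that inherits from `release` and sets `lto = true`; the repository's actual profile may differ. The allocator is swapped through Rust's `#[global_allocator]` attribute. To compile without extra crates, the sketch registers `std::alloc::System`; with the `mimalloc` crate as a dependency, that one static would use `mimalloc::MiMalloc` instead. `build_buffer` is a hypothetical allocation-heavy workload.

```rust
// Sketch of the two "easy" build-level changes (assumptions noted inline).
//
// 1) codegen-units = 1 is a Cargo profile setting, not Rust code. Assuming a
//    custom profile named `release-lto` (hypothetical layout), Cargo.toml
//    could contain:
//
//        [profile.release-lto]
//        inherits = "release"
//        lto = true
//        codegen-units = 1
//
// 2) A different allocator is registered via #[global_allocator]. With the
//    mimalloc crate added as a dependency this would read:
//
//        #[global_allocator]
//        static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
//
//    To stay dependency-free, this sketch registers the standard system
//    allocator instead; replacing this single static is the whole change.
use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical allocation-heavy workload: every call builds a fresh Vec.
fn build_buffer(n: usize) -> Vec<f64> {
    (0..n).map(|i| i as f64).collect()
}

fn main() {
    // Many short-lived heap allocations, the pattern where allocator choice matters.
    let total: f64 = (0..1_000)
        .map(|_| build_buffer(100).iter().sum::<f64>())
        .sum();
    println!("total = {total}");
}
```

Since all heap allocations go through the registered static, comparing allocators only requires rebuilding with a different `GLOBAL` and rerunning the benchmarks.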

Not so easy:

  • Profile properly to identify the hot parts of the code
  • Remove clones/allocations where they are not needed
  • Use profile-guided optimization (PGO), e.g. via cargo-pgo
    • Unfortunately, this currently does not work together with LTO, and the PGO build is 10-20% slower than the LTO build
    • Might become available directly in maturin in the future, see here
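One way to find allocations worth removing, alongside a sampling profiler, is to wrap the global allocator with a counter and measure a suspected hot path. This is a minimal sketch, not a description of how feos is profiled: the counter is process-wide, so it assumes nothing else allocates concurrently during a measurement. `sum_cloned` and `sum_borrowed` are hypothetical stand-ins for a hot function with and without an unnecessary clone.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Process-wide allocation counter, incremented on every call to `alloc`
/// (the default `realloc` implementation also goes through `alloc`).
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Wraps the system allocator and counts allocations.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the number of allocations observed
/// meanwhile (process-wide, so allocations from other threads count too).
fn count_allocations<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let out = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (out, after - before)
}

/// Hypothetical hot function that copies its input although it only reads it.
fn sum_cloned(x: &[f64]) -> f64 {
    // black_box discourages the optimizer from eliding the copy in release builds.
    let y = black_box(x.to_vec());
    y.iter().sum()
}

/// Same computation without the unnecessary copy.
fn sum_borrowed(x: &[f64]) -> f64 {
    x.iter().sum()
}

fn main() {
    let data: Vec<f64> = (0..1_000).map(|i| i as f64).collect();
    let (s1, n1) = count_allocations(|| sum_cloned(&data));
    let (s2, n2) = count_allocations(|| sum_borrowed(&data));
    println!("cloned:   sum = {s1}, allocations = {n1}");
    println!("borrowed: sum = {s2}, allocations = {n2}");
}
```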

Quick tests with codegen-units = 1 added to the release-lto profile (see here) show speedups of up to 12% across the benchmarks (about 7% on average), while the changes for the dual_numbers benchmark are somewhat smaller (see below).

Proper benchmarks across all benchmark suites, compared against the current release workflow, are still needed, but this might be an easy win if it turns out to be faster in all cases.

  • Benchmark: dual_numbers
  • System: methane/CO2
  • main: main branch + lto
  • main_codegen: main branch + lto + codegen-units = 1
  • develop, develop_codegen: like main and main_codegen, but on the develop branch

Execution times in µs

name             f64     dual    dual2   hyperdual  dual3
main             1.1382  1.2325  1.4539  1.6267     1.7563
main_codegen     1.0229  1.1741  1.3708  1.5777     1.6316
develop          1.0138  1.1989  1.4465  1.589      1.7549
develop_codegen  0.9761  1.1681  1.4195  1.5446     1.6304

Slowdown t_d / t_f64 of each number type relative to f64, for each branch/option

name             f64  dual     dual2    hyperdual  dual3
main             1    1.08285  1.27737  1.42919    1.54305
main_codegen     1    1.14782  1.34011  1.54238    1.59507
develop          1    1.18258  1.42681  1.56737    1.73101
develop_codegen  1    1.1967   1.45426  1.58242    1.67032
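The slowdown table follows directly from the execution times: each entry is the time for that number type divided by the f64 time of the same row. A small sketch that reproduces it from the measured values (the names `TIMES` and `slowdown` are illustrative, not part of feos):

```rust
/// Number types in the column order used by the tables.
const TYPES: [&str; 5] = ["f64", "dual", "dual2", "hyperdual", "dual3"];

/// Execution times in µs, copied from the measurements above.
const TIMES: [(&str, [f64; 5]); 4] = [
    ("main", [1.1382, 1.2325, 1.4539, 1.6267, 1.7563]),
    ("main_codegen", [1.0229, 1.1741, 1.3708, 1.5777, 1.6316]),
    ("develop", [1.0138, 1.1989, 1.4465, 1.589, 1.7549]),
    ("develop_codegen", [0.9761, 1.1681, 1.4195, 1.5446, 1.6304]),
];

/// Slowdown of each number type relative to plain f64: t_d / t_f64.
fn slowdown(times: [f64; 5]) -> [f64; 5] {
    let t_f64 = times[0];
    times.map(|t| t / t_f64)
}

fn main() {
    println!("{:<16}{}", "name", TYPES.join("  "));
    for (name, times) in TIMES.iter() {
        let row: Vec<String> = slowdown(*times)
            .iter()
            .map(|s| format!("{s:.5}"))
            .collect();
        println!("{name:<16}{}", row.join("  "));
    }
}
```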

Relative difference in % with respect to main + lto for each number type: (t_d,branch - t_d,main) / t_d,main * 100

name             f64      dual    dual2   hyperdual  dual3
main_codegen     -10.13   -4.74   -5.72   -3.01      -7.10
develop          -10.93   -2.73   -0.51   -2.32      -0.08
develop_codegen  -14.24   -5.23   -2.37   -5.05      -7.17
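Likewise, the relative differences can be recomputed from the execution times as (t_d,branch - t_d,main) / t_d,main * 100, with main + lto as the reference; negative values mean the variant is faster than main. A sketch with illustrative helper names (`TIMES`, `relative_difference_percent`):

```rust
/// Number types in the column order used by the tables.
const TYPES: [&str; 5] = ["f64", "dual", "dual2", "hyperdual", "dual3"];

/// Execution times in µs, copied from the measurements above; row 0 is the reference.
const TIMES: [(&str, [f64; 5]); 4] = [
    ("main", [1.1382, 1.2325, 1.4539, 1.6267, 1.7563]),
    ("main_codegen", [1.0229, 1.1741, 1.3708, 1.5777, 1.6316]),
    ("develop", [1.0138, 1.1989, 1.4465, 1.589, 1.7549]),
    ("develop_codegen", [0.9761, 1.1681, 1.4195, 1.5446, 1.6304]),
];

/// Relative difference in % of `branch` w.r.t. `reference` per number type:
/// (t_branch - t_ref) / t_ref * 100. Negative means `branch` is faster.
fn relative_difference_percent(branch: [f64; 5], reference: [f64; 5]) -> [f64; 5] {
    std::array::from_fn(|i| (branch[i] - reference[i]) / reference[i] * 100.0)
}

fn main() {
    let (_, reference) = TIMES[0];
    println!("{:<16}{}", "name", TYPES.join("  "));
    for (name, times) in &TIMES[1..] {
        let diff = relative_difference_percent(*times, reference);
        let row: Vec<String> = diff.iter().map(|x| format!("{x:.2}")).collect();
        println!("{name:<16}{}", row.join("  "));
    }
}
```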