codegen-units=1 + LTO causes 3-5% performance regression for sequential code

Opened this issue a day ago · 1 comment

Code

I tried this code:

https://github.com/uutils/coreutils/blob/b2d0773356063da0d8cade4d7d14b5392df75556/src/uu/seq/src/seq.rs#L374-L385

The bug trigger:

Line 374: BigUint variable
    ↓
Line 377: Loop (1M+ iterations)
    ↓
Line 383: Arithmetic operation
    ↓
With codegen-units=1 + LTO
    ↓
LLVM over-inlines Line 383
    ↓
Register pressure (16 GPRs on x86_64)
    ↓
Stack spilling
    ↓
-5% performance regression
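
For anyone who wants to probe the chain above without building coreutils, here is a minimal std-only sketch of the loop shape. `Counter`, `add_small` and `run_loop` are hypothetical stand-ins, not the `seq.rs` code and not num-bigint's `BigUint`. The assumption is that you build it twice, once with the default release profile and once with `codegen-units = 1` plus `lto = true` under `[profile.release]` in Cargo.toml, and compare timings. Which LTO mode the original PR used is not stated here, so treat that setting as a guess.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for an arbitrary-precision integer:
/// little-endian base-2^32 limbs. Not num-bigint's BigUint.
#[derive(Clone, Debug, PartialEq)]
struct Counter {
    limbs: Vec<u32>,
}

impl Counter {
    fn from_u64(v: u64) -> Self {
        Counter { limbs: vec![v as u32, (v >> 32) as u32] }
    }

    /// Adds a small step in place, propagating the carry across limbs.
    fn add_small(&mut self, step: u32) {
        let mut carry = step as u64;
        for limb in self.limbs.iter_mut() {
            let sum = *limb as u64 + carry;
            *limb = sum as u32; // keep the low 32 bits
            carry = sum >> 32;  // carry the overflow into the next limb
            if carry == 0 {
                return;
            }
        }
        if carry != 0 {
            self.limbs.push(carry as u32);
        }
    }

    /// Returns the two lowest limbs as a u64 (enough to check small runs).
    fn low_u64(&self) -> u64 {
        let lo = self.limbs.first().copied().unwrap_or(0) as u64;
        let hi = self.limbs.get(1).copied().unwrap_or(0) as u64;
        (hi << 32) | lo
    }
}

/// Sequential loop in the shape the issue describes: one value,
/// many iterations, one arithmetic operation per iteration.
fn run_loop(start: u64, step: u32, iterations: u64) -> Counter {
    let mut value = Counter::from_u64(start);
    for _ in 0..iterations {
        // black_box keeps the optimizer from folding the step away.
        value.add_small(black_box(step));
    }
    value
}

fn main() {
    let begin = Instant::now();
    let result = run_loop(1, 1, 1_000_000);
    let elapsed = begin.elapsed();
    println!("low 64 bits = {}, elapsed = {:?}", result.low_u64(), elapsed);
}
```

A single wall-clock run like this is noisy. It only shows where to look, and whether it reproduces the regression at all is an open question.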

I expected to see this happen: "may improve performance"

Instead, this happened: 3-5% slower

Benchmark results from uutils/coreutils#9161 (negative = slower, positive = faster):

seq_integers: -5.06% (26.1ms → 27.5ms)
seq_with_step: -4.98% (13.3ms → 14.0ms)
expand_custom_tabstops: -2.73% (36.6ms → 37.6ms)
cut_fields_custom_delim: +32.29% (40.7ms → 30.8ms)
cut_fields_tab: +26.13% (34.1ms → 27.0ms)
Overall: -10.02% (22 improvements, 10 regressions)
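
To make the sign convention in these figures explicit, here is a small sketch that recomputes the per-benchmark percentages from the timings. The formula `before / after - 1` is an assumption inferred from the numbers, not something the benchmark tool documents here. `speed_change_pct` is a hypothetical helper name. Because the timings above are rounded to 0.1 ms, the recomputed values come out close to the reported ones but not identical.

```rust
/// Relative speed change in percent.
/// Positive means the new build is faster, negative means slower.
/// Assumed formula, inferred from the reported figures.
fn speed_change_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms / after_ms - 1.0) * 100.0
}

fn main() {
    // (benchmark name, time before in ms, time after in ms), from the list above.
    let rows = [
        ("seq_integers", 26.1, 27.5),
        ("seq_with_step", 13.3, 14.0),
        ("expand_custom_tabstops", 36.6, 37.6),
        ("cut_fields_custom_delim", 40.7, 30.8),
        ("cut_fields_tab", 34.1, 27.0),
    ];
    for &(name, before, after) in rows.iter() {
        println!("{}: {:+.2}%", name, speed_change_pct(before, after));
    }
}
```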
