Running with all optimizations on is slower than running with the default optimizations
developedby opened this issue · 4 comments
Reported by @Janiczek on Discord.
The test program is the README.md example, with main returning sum(25, 0).
CPU: AMD Ryzen 7 5800X3D (8 cores, 16 logical processors, 3.40 GHz)
GPU: NVIDIA GeForce RTX 3070 Ti
Measurements (wall-clock elapsed time as reported by time):
run -O all: 17.51s
run-c -O all: 22.74s
run-cu -O all: 3.44s
run: 16.40s
run-c: 3.47s
run-cu: 0.73s
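To put the gap in relative terms, here is a small Python sketch that computes the slowdown factor of -O all against the default for each backend. The numbers are copied from the list above; the dictionary layout and the slowdown helper are only illustrative names, not part of bend.

```python
# Wall-clock seconds copied from the measurements list above.
elapsed = {
    "run":    {"default": 16.40, "all": 17.51},
    "run-c":  {"default": 3.47,  "all": 22.74},
    "run-cu": {"default": 0.73,  "all": 3.44},
}

def slowdown(times):
    """Ratio of the -O all time to the default time (hypothetical helper)."""
    return times["all"] / times["default"]

for cmd, times in elapsed.items():
    print(f"{cmd:7s} -O all takes {slowdown(times):.2f}x the default time")
```

On these numbers the interpreter (run) is barely affected, at roughly 1.07x, while run-c and run-cu are roughly 6.55x and 4.71x slower with -O all.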
Details:
run -O all
$ time bend run -s -O all sample.bend
Result: 0
- ITRS: 805306351
- TIME: 17.51s
- MIPS: 46.00
16.81user 0.70system 0:17.51elapsed 100%CPU (0avgtext+0avgdata 6294448maxresident)k
0inputs+8outputs (0major+4418minor)pagefaults 0swaps
run-c -O all
$ time bend run-c -s -O all sample.bend
Result: 0
- ITRS: 805306351
- TIME: 22.72s
- MIPS: 35.44
92.79user 270.53system 0:22.74elapsed 1597%CPU (0avgtext+0avgdata 437532maxresident)k
0inputs+8outputs (13major+98934minor)pagefaults 0swaps
run-cu -O all
$ time bend run-cu -s -O all sample.bend
Result: 0
- ITRS: 803897327
- LEAK: 33718271
- TIME: 2.44s
- MIPS: 329.46
2.35user 0.09system 0:03.44elapsed 71%CPU (0avgtext+0avgdata 337556maxresident)k
41608inputs+3400outputs (251major+59918minor)pagefaults 0swaps
run (without -O all)
$ time bend run -s sample.bend
Result: 0
- ITRS: 738197489
- TIME: 16.39s
- MIPS: 45.03
15.55user 0.84system 0:16.40elapsed 99%CPU (0avgtext+0avgdata 6294392maxresident)k
0inputs+8outputs (0major+4419minor)pagefaults 0swaps
run-c (without -O all)
$ time bend run-c -s sample.bend
Result: 0
- ITRS: 738197489
- TIME: 3.36s
- MIPS: 219.73
34.47user 19.35system 0:03.47elapsed 1550%CPU (0avgtext+0avgdata 5297008maxresident)k
0inputs+8outputs (13major+1318553minor)pagefaults 0swaps
run-cu (without -O all)
$ time bend run-cu -s sample.bend
Result: 0
- ITRS: 803897327
- LEAK: 33718271
- TIME: 0.40s
- MIPS: 1997.91
0.37user 0.03system 0:00.73elapsed 55%CPU (0avgtext+0avgdata 105984maxresident)k
13272inputs+8outputs (82major+4739minor)pagefaults 0swaps
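One thing that stands out in the run-c logs is how the CPU time splits between user and system. The following Python sketch parses the two run-c summary lines from the time output above and reports the share of CPU time spent in the kernel. The regular expression only handles the m:ss.ss elapsed format seen in these logs, and the helper names are hypothetical. A high system share is only a hint about where to look; it does not identify the cause.

```python
import re

# Matches e.g. "92.79user 270.53system 0:22.74elapsed" (m:ss.ss elapsed only).
TIME_LINE = re.compile(r"([\d.]+)user ([\d.]+)system (\d+):([\d.]+)elapsed")

def parse_time_line(line):
    """Return (user, system, elapsed) in seconds from a GNU time summary line."""
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    user, system, minutes, seconds = match.groups()
    return float(user), float(system), int(minutes) * 60 + float(seconds)

def system_share(line):
    """Fraction of total CPU time (user + system) that was system time."""
    user, system, _elapsed = parse_time_line(line)
    return system / (user + system)

# Summary lines copied from the run-c logs above.
runs = {
    "run-c -O all": "92.79user 270.53system 0:22.74elapsed 1597%CPU",
    "run-c": "34.47user 19.35system 0:03.47elapsed 1550%CPU",
}

for name, line in runs.items():
    print(f"{name}: {system_share(line):.0%} of CPU time is system time")
```

With -O all about 74% of the CPU time is system time, against about 36% for the default run, even though both runs keep roughly 16 logical processors busy.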
Note that this was measured on very early versions of Bend and HVM, right after release. It might be worth first checking whether this still happens.
I don't think I've changed anything significant about the default transformations since then, so it's likely that it still happens.
The HVM program generated with or without -O all is exactly the same for this Bend program, so the issue is somewhere in HVM.
This is the code that is mentioned in this issue (the readme has changed since then):
def sum(depth, x):
  switch depth:
    case 0:
      return x
    case _:
      fst = sum(depth-1, x*2+0) # adds the fst half
      snd = sum(depth-1, x*2+1) # adds the snd half
      return fst + snd

def main:
  return sum(25, 0)
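For readers wondering why every run above prints Result: 0, here is a Python port of the program as a sanity check. It assumes that Bend's default number type is an unsigned 24-bit integer with wrap-around arithmetic; the names sum_u24 and closed_form_u24 are hypothetical and exist only for this sketch. Since sum(depth, 0) adds every integer from 0 to 2^depth - 1, the closed form avoids running 2^26 recursive calls in Python.

```python
U24_MASK = (1 << 24) - 1  # assumption: numbers wrap at 24 bits

def sum_u24(depth, x):
    """Direct port of the Bend sum, wrapping every result to 24 bits."""
    if depth == 0:
        return x & U24_MASK
    fst = sum_u24(depth - 1, (x * 2 + 0) & U24_MASK)  # adds the fst half
    snd = sum_u24(depth - 1, (x * 2 + 1) & U24_MASK)  # adds the snd half
    return (fst + snd) & U24_MASK

def closed_form_u24(depth):
    """Exact total of 0 .. 2**depth - 1, then wrapped to 24 bits."""
    n = 1 << depth
    return (n * (n - 1) // 2) & U24_MASK

# The recursive port and the closed form agree on small depths.
for depth in range(12):
    assert sum_u24(depth, 0) == closed_form_u24(depth)

print(closed_form_u24(25))  # 0
```

The exact total for depth 25 is 2^24 * (2^25 - 1), a multiple of 2^24, so a result of 0 is consistent with 24-bit wrap-around rather than a sign of a wrong computation.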
Moving this to the HVM issue HigherOrderCO/HVM#378, since it's not a Bend problem.