Compiler optimization deletes the bodies_energy function call in the nbody benchmark.
kohnakagawa opened this issue · 3 comments
In the nbody benchmark, the bodies_energy function returns the total energy, but the benchmark_body function never uses this return value. Because bodies_energy is a pure function, a compiler optimization may omit the call entirely when its return value is unused.
I noticed that the clang compiler (Apple clang version 11.0.0 (clang-1100.0.33.8)) detects this and deletes the call to bodies_energy completely.
Below is the disassembly of the binary compiled with the following flags.
# compiler option
-O2 -march=native -Wall -Wextra -fdata-sections -ffunction-sections
; disassembly result
; only the call to offset_momentum exists.
_benchmark_body:
1000017a0: 55 pushq %rbp
1000017a1: 48 89 e5 movq %rsp, %rbp
1000017a4: 41 56 pushq %r14
1000017a6: 53 pushq %rbx
1000017a7: 85 ff testl %edi, %edi
1000017a9: 7e 27 jle 39 <_benchmark_body+0x32>
1000017ab: 89 fb movl %edi, %ebx
1000017ad: 4c 8d 35 7c 08 00 00 leaq 2172(%rip), %r14
1000017b4: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
1000017be: 66 90 nop
1000017c0: 4c 89 f7 movq %r14, %rdi
1000017c3: be 05 00 00 00 movl $5, %esi
1000017c8: e8 b3 fd ff ff callq -589 <_offset_momentum>
1000017cd: 83 c3 ff addl $-1, %ebx
1000017d0: 75 ee jne -18 <_benchmark_body+0x20>
1000017d2: 5b popq %rbx
1000017d3: 41 5e popq %r14
1000017d5: 5d popq %rbp
1000017d6: c3 retq
1000017d7: 66 0f 1f 84 00 00 00 00 00 nopw (%rax,%rax)
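The pattern behind the disassembly above can be reproduced in a few lines. This is only a minimal sketch: sum_of_squares and run_discarded are hypothetical stand-ins for bodies_energy and benchmark_body, not code from the benchmark. Whether a given compiler actually drops the call depends on its version and flags.

```c
#include <stddef.h>

/* Hypothetical stand-in for bodies_energy: it only reads its input, so
   its sole effect is the value it returns. */
static double
sum_of_squares (const double *v, size_t n)
{
  double total = 0.0;
  size_t i;
  for (i = 0; i < n; ++i)
    total += v[i] * v[i];
  return total;
}

/* Hypothetical stand-in for the original benchmark_body: the result is
   discarded, so an optimizing compiler is allowed to remove the call and
   the loop around it. */
static int
run_discarded (const double *v, size_t n, int rpt)
{
  int j;
  for (j = 0; j < rpt; ++j)
    sum_of_squares (v, n);	/* return value unused */
  return 0;
}
```

Compiling a file like this with -O2 and inspecting the assembly is a quick way to check whether the loop body survives.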
Deleting the bodies_energy call is a problem because the main computation of the nbody benchmark is then never performed, so I think this issue should be fixed.
One possible solution is to assign the return value of bodies_energy to a global variable. The following code is an example.
static double energy = 0.0;
static int __attribute__ ((noinline))
benchmark_body (int rpt)
{
  int j;
  for (j = 0; j < rpt; j++)
    {
      int i;
      offset_momentum (solar_bodies, BODIES_SIZE);
      /*printf("%.9f\n", bodies_energy(solar_bodies, BODIES_SIZE)); */
      for (i = 0; i < 100; ++i)
        energy = bodies_energy (solar_bodies, BODIES_SIZE); // <-- assign the return value to the global variable `energy`.
      /*printf("%.9f\n", bodies_energy(solar_bodies, BODIES_SIZE)); */
    }
  return 0;
}
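A possible variation, offered as a sketch rather than as what the benchmark or any pull request actually does: a file-local variable that is only written and never read may itself be removed by the optimizer, so the sink can be declared volatile to make each store an observable side effect. The names energy_sink, weighted_sum and run_kept are hypothetical.

```c
#include <stddef.h>

/* Hypothetical sink. `volatile` obliges the compiler to perform every
   store to it, so the stored value has to be computed. */
static volatile double energy_sink = 0.0;

/* Hypothetical stand-in for bodies_energy. */
static double
weighted_sum (const double *mass, const double *speed, size_t n)
{
  double e = 0.0;
  size_t i;
  for (i = 0; i < n; ++i)
    e += 0.5 * mass[i] * speed[i] * speed[i];
  return e;
}

/* Hypothetical stand-in for benchmark_body: each iteration stores the
   result into the volatile sink instead of discarding it. */
static int __attribute__ ((noinline))
run_kept (const double *mass, const double *speed, size_t n, int rpt)
{
  int j;
  for (j = 0; j < rpt; ++j)
    energy_sink = weighted_sum (mass, speed, n);
  return 0;
}
```

Note that this only guarantees the stores happen. If the inputs do not change between iterations, the compiler may still compute the value once and store it repeatedly.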
Any comments on this issue? Thanks.
This should have been resolved by pull request 23, which I've now merged. Let me know if this works correctly for you.
Hi @kohnakagawa
Good catch. This is an error in the wrapping of the function. I'll work on a fix for this one. I imagine most modern compilers will optimize this away.
Best wishes,
Jeremy
Thank you for your reply.
I confirmed that this fix prevents the bodies_energy function call from being deleted.
However, I think some points of your pull request should be fixed. I have opened pull request #24 to fix them.