embench/embench-iot

Compiler optimization deletes the bodies_energy function call in the nbody benchmark.

kohnakagawa opened this issue · 3 comments

In the nbody benchmark, the bodies_energy function returns the total energy, but benchmark_body never uses this return value. Because bodies_energy is a pure function (it has no side effects), the compiler is free to remove the call entirely when its result is unused.
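To illustrate the reasoning above, here is a minimal sketch. The names `sum_of_squares`, `run_discarded`, `run_kept`, and `sink` are hypothetical and are not part of embench-iot; `sum_of_squares` only stands in for a side-effect-free function such as bodies_energy.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pure function: it reads its inputs and
   writes nothing outside its own locals. */
static double sum_of_squares(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += v[i] * v[i];
    return acc;
}

/* File-scope variable that receives the result (hypothetical name). */
static double sink;

/* The result is discarded, so an optimizing compiler may delete the
   call, and with it the whole loop. */
void run_discarded(const double *v, size_t n, int rpt)
{
    for (int j = 0; j < rpt; ++j)
        (void) sum_of_squares(v, n);
}

/* The result is stored and then returned, so the value has to be
   computed at least once. */
double run_kept(const double *v, size_t n, int rpt)
{
    for (int j = 0; j < rpt; ++j)
        sink = sum_of_squares(v, n);
    return sink;
}
```

Whether a given compiler actually removes the call in `run_discarded` depends on the compiler version and flags, so the disassembly is the only reliable check.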

I noticed that clang (Apple clang version 11.0.0 (clang-1100.0.33.8)) detects this and deletes the call to bodies_energy completely.

I attach the disassembly of the binary compiled with the following flags.

# compiler options
-O2 -march=native -Wall -Wextra -fdata-sections -ffunction-sections
; disassembly result
; only the call to offset_momentum exists.
_benchmark_body:
1000017a0:	55 	pushq	%rbp
1000017a1:	48 89 e5 	movq	%rsp, %rbp
1000017a4:	41 56 	pushq	%r14
1000017a6:	53 	pushq	%rbx
1000017a7:	85 ff 	testl	%edi, %edi
1000017a9:	7e 27 	jle	39 <_benchmark_body+0x32>
1000017ab:	89 fb 	movl	%edi, %ebx
1000017ad:	4c 8d 35 7c 08 00 00 	leaq	2172(%rip), %r14
1000017b4:	66 2e 0f 1f 84 00 00 00 00 00 	nopw	%cs:(%rax,%rax)
1000017be:	66 90 	nop
1000017c0:	4c 89 f7 	movq	%r14, %rdi
1000017c3:	be 05 00 00 00 	movl	$5, %esi
1000017c8:	e8 b3 fd ff ff 	callq	-589 <_offset_momentum>
1000017cd:	83 c3 ff 	addl	$-1, %ebx
1000017d0:	75 ee 	jne	-18 <_benchmark_body+0x20>
1000017d2:	5b 	popq	%rbx
1000017d3:	41 5e 	popq	%r14
1000017d5:	5d 	popq	%rbp
1000017d6:	c3 	retq
1000017d7:	66 0f 1f 84 00 00 00 00 00 	nopw	(%rax,%rax)

Removing the bodies_energy call is problematic because the main computation of the nbody benchmark is then never performed, so I think this issue should be fixed.

I think one possible solution is to assign the return value of bodies_energy to a global variable. The following code is an example.

static double energy = 0.0;

static int __attribute__ ((noinline))
benchmark_body (int rpt)
{
  int j;

  for (j = 0; j < rpt; j++)
    {
      int i;
      offset_momentum (solar_bodies, BODIES_SIZE);
      /*printf("%.9f\n", bodies_energy(solar_bodies, BODIES_SIZE)); */
      for (i = 0; i < 100; ++i)
	energy = bodies_energy (solar_bodies, BODIES_SIZE); // <-- assign the return value to global variable `energy`.
      /*printf("%.9f\n", bodies_energy(solar_bodies, BODIES_SIZE)); */
    }
  return 0;
}
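A possible variation on the code above, given only as a sketch: in principle a compiler may also drop stores to a static variable that is never read, so the result could instead go to a `volatile` object, whose stores have to be emitted. The names `fake_energy`, `energy_sink`, `benchmark_body_sketch`, and `last_energy` are hypothetical, and `fake_energy` is just a stand-in for bodies_energy, not the real computation.

```c
#include <stddef.h>

/* Hypothetical stand-in for bodies_energy: a pure function of its inputs. */
static double fake_energy(const double *mass, size_t n)
{
    double e = 0.0;
    for (size_t i = 0; i < n; ++i)
        e += mass[i];
    return e;
}

/* Every store to a volatile object is an observable side effect,
   so the compiler has to keep the assignments below. */
static volatile double energy_sink;

int benchmark_body_sketch(const double *mass, size_t n, int rpt)
{
    for (int j = 0; j < rpt; ++j)
        for (int i = 0; i < 100; ++i)
            energy_sink = fake_energy(mass, n);
    return 0;
}

/* Reads the sink back, for example to verify the result after a run. */
double last_energy(void)
{
    return energy_sink;
}
```

Even with a volatile sink, the inputs do not change inside the inner loop, so a compiler may still compute the value once and store it repeatedly. Checking the disassembly remains the way to confirm that the work is actually done on each iteration.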

Any comments on this issue? Thanks.

This should have been resolved by pull request 23, which I've now merged. Let me know if this works correctly for you.

Hi @kohnakagawa

Good catch. This is an error in the wrapping of the function. I'll work on a fix for this one. I imagine most modern compilers will optimize this away.

Best wishes,

Jeremy

Hi @jeremybennett

Thank you for your reply.

I confirmed that this fix prevents the bodies_energy call from being deleted.
However, I think some points in your pull request should be fixed, so I have opened pull request #24 to address them.