N-Dekker/noexcept_benchmark

Reordering in gcc and clang

HDembinski opened this issue · 8 comments

Hi, I saw your lightning talk. I think the reason you didn't see an effect of noexcept in the reordering test for gcc and clang is that these compilers are too clever. They were able to prove that the increment and decrement have no observable effect and removed them, regardless of the throw.

You can see this here in a greatly simplified version:
https://godbolt.org/z/M-yu-e

The increment is only carried out when it has an observable side effect, for example because the value is reported as part of the exception:
https://godbolt.org/z/e52Wq3

Very interesting, thank you Hans (@HDembinski)! I'll have a closer look this weekend.

For the record, https://godbolt.org/z/M-yu-e , the first example by Hans Dembinski, says:

template <bool Throw>
void maybe_throw() noexcept(!Throw);

unsigned foo(unsigned value) {

    ++value;
    maybe_throw<true>();
    --value;

    return value;
}

And yields, for gcc 9.3 with -O3:

foo(unsigned int):
    push    r12
    mov     r12d, edi
    call    void maybe_throw<true>()
    mov     eax, r12d
    pop     r12
    ret

While his second example https://godbolt.org/z/e52Wq3 says:

template <bool Throw>
void maybe_throw(unsigned value) noexcept(!Throw) {
    if (Throw) throw value; 
}

unsigned foo(unsigned value) {

    ++value;
    maybe_throw<true>(value);
    --value;

    return value;
}

Which yields:

foo(unsigned int):
    push    rbx
    mov     ebx, edi
    mov     edi, 4
    inc     ebx
    call    __cxa_allocate_exception
    xor     edx, edx
    mov     esi, OFFSET FLAT:_ZTIj
    mov     DWORD PTR [rax], ebx
    mov     rdi, rax
    call    __cxa_throw

"They were able to prove that the increment and decrement have no observable effect and removed them, regardless of the throw."

Sorry for the delay, @HDembinski. I don't entirely understand what you mean, though. Do you think GCC can figure out that func(do_throw_exception) never throws, even when it has an implicit exception specifier (equivalent to noexcept(false))? https://github.com/N-Dekker/noexcept_benchmark/blob/v03/lib/inc_and_dec_test.cpp#L24

You see, the argument do_throw_exception is initialized from a volatile variable (whose value is false), so the compiler should not be able to optimize away this initialization.

https://github.com/N-Dekker/noexcept_benchmark/blob/v03/lib/inc_and_dec_test.cpp
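The difference between an implicit exception specifier and an explicit noexcept, as discussed above, can be checked at compile time with the noexcept operator. The following is only a minimal sketch: the names func_implicit and func_noexcept are hypothetical and do not appear in the benchmark, and maybe_throw is given an empty body here so that the snippet links on its own.

```cpp
#include <cassert>

// Hypothetical stand-ins; only their exception specifications matter here.
void func_implicit(bool) {}            // implicit specifier: potentially throwing
void func_noexcept(bool) noexcept {}   // explicitly non-throwing

// Same shape as the template from the first example, with an empty body.
template <bool Throw>
void maybe_throw() noexcept(!Throw) {}

// The noexcept operator looks only at the declared specification,
// not at what the function body actually does.
static_assert(!noexcept(func_implicit(true)), "implicit means noexcept(false)");
static_assert(noexcept(func_noexcept(true)), "declared noexcept");
static_assert(noexcept(maybe_throw<false>()), "noexcept(!false) is noexcept(true)");
static_assert(!noexcept(maybe_throw<true>()), "noexcept(!true) is noexcept(false)");
```

Note that func_implicit never throws in practice, yet the compiler must still treat a call to it as potentially throwing unless it can see and analyze the body.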

What I meant is if you change my code to

template <bool Throw>
void maybe_throw(unsigned value) noexcept(!Throw) {
    if (Throw) throw value; 
}

unsigned foo(unsigned value) {

    ++value;
    maybe_throw<true>(1);
    --value;

    return value;
}

Then the increment and decrement of value are removed whether you throw or not. I was wondering whether the same happens here:

Thanks for your clarification, Hans. But if func(do_throw_exception) might throw, and it actually did, value would end up being 1, which would affect the runtime behavior at:

So I don't see how the increment and decrement of value could be removed by GCC, unless it assumes that func(do_throw_exception) never throws.
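To illustrate why the increment is observable when the call may throw, here is a minimal sketch of a single increment/call/decrement sequence. The names my_exception and run_once are hypothetical and not taken from the benchmark; the sketch only shows the language-level behavior, not what any particular compiler emits.

```cpp
#include <cassert>

// Hypothetical stand-in for the benchmark's exception type.
struct my_exception {};

// Implicit exception specifier, equivalent to noexcept(false).
void func(const bool do_throw_exception)
{
  if (do_throw_exception)
  {
    throw my_exception{};
  }
}

// Runs one increment/call/decrement sequence and returns the final value.
int run_once(const bool do_throw_exception)
{
  int value = 0;
  try
  {
    ++value;
    func(do_throw_exception);  // if this throws, the decrement below is skipped
    --value;
  }
  catch (const my_exception&)
  {
    // Swallow the exception; value keeps whatever it held at the throw.
  }
  return value;
}
```

Calling run_once(false) returns 0, while run_once(true) returns 1, so the result depends on whether the increment happened before the throw.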

I guess you are right, feel free to close.

@HDembinski Thanks for allowing me to close the issue, but I still don't allow myself to do so! Not yet! I'm still puzzled by the fact that in this case (my "inc_and_dec_test.cpp" test), I don't see a performance effect from GCC, even though I do see that the https://godbolt.org assembly output gets somewhat simpler when adding noexcept in such a case. (Disclaimer: I'm no assembly programmer!)

I'll have a closer look later this week.

Update: I'm now looking at the following simplification of my "inc_and_dec_test.cpp" test:

struct exception {};

namespace
{
  inline void throw_exception_if(const bool do_throw_exception)
  {
    if (do_throw_exception)
    {
      throw exception{};
    }
  }

  void func(const bool do_throw_exception) OPTIONAL_EXCEPTION_SPECIFIER
  {
    throw_exception_if(do_throw_exception);
  }
}


int test_inc_and_dec()
{
  int value = 0;
  volatile bool volatile_false = false;

  try
  {
    for (int i = 0; i < 2147483647; ++i)
    {
      const bool do_throw_exception = volatile_false;
      ++value;
      func(do_throw_exception);
      --value;
    }
  }
  catch (const exception&)
  {
  }
  return value;
}

https://godbolt.org/z/eXNk_C (26 lines ASM) is for -O3 -DOPTIONAL_EXCEPTION_SPECIFIER=noexcept
https://godbolt.org/z/f9UWX6 (37 lines ASM) is for -O3 -DOPTIONAL_EXCEPTION_SPECIFIER= (implicit exception specifier)

I'm still figuring out what exactly is going on. Please let me know if you have any suggestions on how to adjust the test in order to get an observable performance effect from GCC by adding noexcept!

Looking at the actual assembly of those two cases, I can still see that the compiler is too clever.

It concluded that incrementing and decrementing value inside the loop is useless in either case. It inlined the bodies of func and throw_exception_if, because those are visible to it. In both cases, whether the throw path is taken boils down to the if in throw_exception_if; this if ended up inside the loop and is the only thing executed there. So the loop just checks, over and over, whether the condition is true. The compiler deduced that the result of the function is 0 if the condition never triggers. It also deduced that the result is 1 if it triggers and func is not noexcept. If func is noexcept and the throw triggers, the program simply terminates and the function never returns.

In summary, there is no difference at runtime because the compiler optimized out the increment and decrement in either case.
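As a rough illustration of the analysis above, the case without noexcept behaves as if the test had been written by hand along the following lines. This is only an approximation of the described behavior, not a decompilation of GCC's output: the function name test_inc_and_dec_as_optimized and the parameters flag_value and iterations are hypothetical, and the real code still throws and catches the exception on the throw path rather than returning directly.

```cpp
#include <cassert>

// Hand-written approximation: no counter is touched inside the loop.
int test_inc_and_dec_as_optimized(const bool flag_value, const int iterations)
{
  volatile bool volatile_flag = flag_value;

  for (int i = 0; i < iterations; ++i)
  {
    // The volatile read has to be performed on every iteration.
    if (volatile_flag)
    {
      // Throw path: the original increments value but skips the decrement,
      // so the function result is known to be 1.
      return 1;
    }
  }
  // The throw path was never taken: increments and decrements cancel out.
  return 0;
}
```

With noexcept on func, the loop would look the same, except that the throw path would end in program termination instead of producing a result.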