burlachenkok/CPP_from_1998_to_2020

[Benchmark] 4. C/C++ benchmark code is not fair


ujos commented

The C++ code of the benchmark in "Why learn C++ if I know Python (Toy Example)" is not fair. Because the code operates on local variables that have no aliases, and a is a regular C array, the C++ compiler can remove the for() loop completely and just set s to some compile-time value.
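For readers without the original snippet at hand, here is a hypothetical reconstruction of the kind of code under discussion. The names `kN` and `sum_local_array` and the array size are assumptions, not the repository's code. Everything lives in a local C array whose only observable effect is the returned `s`, which is what gives the optimizer room to shrink the work.

```cpp
#include <cstddef>

// Hypothetical size (an assumption); small enough to fit easily on the stack.
constexpr std::size_t kN = 1000;

// Hypothetical reconstruction of the benchmark's shape, not the original code:
// fill a local C array, then sum it into a double. Nothing outside this
// function can observe a[], so under the as-if rule an optimizing compiler may
// drop the array, and in principle could even replace s with a constant.
double sum_local_array() {
    int a[kN];
    for (std::size_t i = 0; i < kN; ++i) {
        a[i] = static_cast<int>(i);
    }
    double s = 0.0;
    for (std::size_t i = 0; i < kN; ++i) {
        s += a[i];  // int -> double conversion on every iteration
    }
    return s;
}
```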

Thanks! There are two things...

  1. On https://godbolt.org/ with gcc 7.5.0 and the flags -O3 -Wall --std=c++11, the code snippet "4. C/C++ benchmark":
  • preserves the integer-to-double conversion (CVTSI2SD on x86_64),
  • but the optimizer does get rid of the stack-allocated flat C array in the final binary.

So you're correct that it's not fair, at least by 50%, because, e.g., Python does not have such compiler optimization mechanisms...

  2. At the same time, there are various optimization tricks that compilers can do:
    (https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/resources/mit6_172f18_lec9/)

The absence of a compiler in your programming environment is your problem.

====

In that particular piece of code, the compiler:

  • Replaced loads from memory with register values.
  • Performed all increments in registers.
  • Removed the dead code involving the local stack variable (a).

So that run benefited from compiler optimization.
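Written out as source, the transformations listed above amount roughly to the loop below. It is a sketch, not real compiler output: the function name and the size are hypothetical, and registers can only be mentioned in comments, since C++ source cannot pin values to registers.

```cpp
#include <cstddef>

constexpr std::size_t kN = 1000;  // hypothetical size

// Hand-written approximation of what the optimized code computes:
// a[] is gone (dead code removal), the loop counter and the accumulator
// can stay in registers, and each iteration only converts i to double
// (CVTSI2SD on x86_64) and adds it to s.
double sum_without_array() {
    double s = 0.0;
    for (std::size_t i = 0; i < kN; ++i) {
        s += static_cast<double>(static_cast<int>(i));
    }
    return s;
}
```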

Conclusion:
I think it would be nice to demonstrate the speed with "-O0" and with "-O3", explain why there is a difference, and highlight that there is a point of view that "this benchmark is not fair" because C++ benefits from compiler optimization tricks.
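A -O0 vs -O3 comparison needs a timing harness that does not let the optimizer throw the measured work away. The helper below is a hypothetical sketch (the names `g_sink` and `time_seconds` are assumptions). It stores the result in a volatile variable so the computation cannot be discarded as dead code; note that it can still be constant-folded if all of its inputs are known at compile time.

```cpp
#include <chrono>

// Volatile sink: every store must actually happen, so the timed work
// must produce its value (it may still be folded if its inputs are constants).
volatile double g_sink = 0.0;

// Hypothetical helper: runs work() once and returns the elapsed time in seconds.
template <typename F>
double time_seconds(F work) {
    auto start = std::chrono::steady_clock::now();
    g_sink = work();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count();
}
```

One way to use it: call `time_seconds` on the benchmark from main, build the same file once with `g++ -O0` and once with `g++ -O3`, and compare the two timings together with the generated assembly on godbolt.org.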

ujos commented

At least it's worth noting that the C++ compiler can remove a[] from the binary, because it can :)

ujos commented

I tried compiling that sample with MSVC++ without optimization. The application fails to start because it cannot allocate 10MB on the stack (MSVC's default stack reserve is 1 MB).
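A common way around the stack limit is to allocate the buffer on the heap. The sketch below is illustrative, not code from the thread: `sum_heap_array` and `kTenMiBOfInts` are hypothetical names, and reading "10MB" as 10 MiB of int is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element count: 10 MiB of int (assumes "10MB" meant 10 MiB).
constexpr std::size_t kTenMiBOfInts = (10u * 1024u * 1024u) / sizeof(int);

// std::vector allocates on the heap, so the buffer size is not limited
// by the thread's stack reserve.
double sum_heap_array(std::size_t n) {
    std::vector<int> a(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i);
    }
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s += a[i];
    }
    return s;
}
```

This adds a heap allocation to the measured work, so its timings are not directly comparable with the stack-array version.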

ujos commented

If I allocate a[] as a global static variable, the code compiled by MSVC is two times slower. If I compile the code with GNU C++, that change does not affect performance.
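The global static variant described above might look roughly like this. The names and the 10 MiB size are assumptions, and the thread does not establish why MSVC is slower with this layout, so the sketch only shows the change in storage.

```cpp
#include <cstddef>

// Hypothetical element count: 10 MiB of int (an assumption about "10MB").
constexpr std::size_t kElements = (10u * 1024u * 1024u) / sizeof(int);

// Static storage duration: the array lives in zero-initialized static memory
// (typically the .bss section), not on the stack, so the stack limit no longer applies.
static int a[kElements];

double sum_static_array() {
    for (std::size_t i = 0; i < kElements; ++i) {
        a[i] = static_cast<int>(i);
    }
    double s = 0.0;
    for (std::size_t i = 0; i < kElements; ++i) {
        s += a[i];
    }
    return s;
}
```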