[Benchmark] 4. C/C++ benchmark code is not fair
Opened this issue · 4 comments
The C++ code of the benchmark in "Why learn C++ if I know Python (Toy Example)" is not fair. Because the code operates on local variables that have no aliases, and `a` is a regular C array, the C++ compiler can remove the `for()` loop completely and just set `s` to a value computed at compile time.
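To make the concern concrete, here is a minimal sketch of the kind of loop being described. The benchmark snippet itself is not quoted in this thread, so the function names, the array size, and the loop body below are assumptions for illustration only. The first function has the shape that an optimizer is allowed to simplify aggressively; the second takes its size at run time, which makes a fully precomputed result less likely unless the call is inlined with a constant argument.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the benchmark kernel (the original snippet is not shown here).
// The array is local, never escapes the function, and the trip count is a
// compile-time constant, so the optimizer is allowed to drop the array and
// may even precompute the result.
double sum_local_array() {
    constexpr std::size_t kCount = 1000;  // assumed size, for illustration only
    double a[kCount];
    double s = 0.0;
    for (std::size_t i = 0; i < kCount; ++i) {
        a[i] = static_cast<double>(i);
        s += a[i];
    }
    return s;
}

// Variant whose size arrives as a run-time argument. The compiler can still
// optimize the loop body, but it cannot replace the whole function with a
// constant when the argument is unknown at compile time.
double sum_runtime_sized(std::size_t count) {
    std::vector<double> a(count);
    double s = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        a[i] = static_cast<double>(i);
        s += a[i];
    }
    return s;
}
```

Both functions return the same value for the same element count; only the amount of information available to the optimizer differs. Whether a given compiler actually folds the first loop depends on its version and flags.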
Thanks! There are two things...
- I checked the code snippet "4. C/C++ benchmark" on https://godbolt.org/ using gcc-7.5.0 with the flags -O3 -Wall --std=c++11:
  - The compiler preserves the integer-to-double conversion with CVTSI2SD (on x86_64).
  - But that optimization really does get rid of the stack-allocated flat C array in the final binary.

  So you're correct, it's not fair, at least by 50%, because e.g. Python does not have compiler optimization mechanisms.
- At the same time, there are various optimization tricks that compilers can do:
(https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/resources/mit6_172f18_lec9/)
The absence of a compiler in your programming environment is your problem.
====
In that particular piece of code, the compiler did the following:
- Replaced loaded values with registers.
- Performed all increments in registers.
- Removed the dead code that used the local stack variable (a).
So in that run we benefited from compiler optimization.
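As a rough source-level picture of what those transformations leave behind, the sketch below shows a loop with no array at all. It is not the compiler's actual output and not the benchmark's code; the function name and the loop body are assumptions chosen to match the description above (values kept in registers, the array removed, the integer-to-double conversion still present).

```cpp
// Hypothetical source-level equivalent of the optimized loop.
// There is no array: each value is consumed as soon as it is produced.
double sum_after_optimization(int count) {
    double s = 0.0;                   // accumulator can stay in a register
    for (int i = 0; i < count; ++i) {
        // The int-to-double conversion survives; on x86_64 this is the kind
        // of operation that CVTSI2SD performs.
        s += static_cast<double>(i);
    }
    return s;
}
```

Looking at it this way, the store into the array and the load back out of it are the dead work: nothing else ever reads the array, so removing it does not change the result.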
Conclusion:
I think it would be nice to demonstrate the speed with both "-O0" and "-O3" and explain why there is a difference, and to highlight that there is a point of view that the benchmark is not fair because C++ uses compiler optimization tricks.
At least it is worth noting that C++ can remove `a[]` from the binary, because it can :)
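One possible way to run the "-O0" versus "-O3" comparison is a small timing helper like the sketch below. The kernel, the `Timing` struct, and `time_kernel` are hypothetical names introduced here, not part of the benchmark. The result is stored through a `volatile` variable so that the computed value stays observable and is not discarded as unused.

```cpp
#include <chrono>

// Hypothetical kernel standing in for the benchmark loop.
double kernel(int count) {
    double s = 0.0;
    for (int i = 0; i < count; ++i) {
        s += static_cast<double>(i);
    }
    return s;
}

// Result of one timed run (names are illustrative).
struct Timing {
    double result;
    double milliseconds;
};

// Times a single call to kernel() with a monotonic clock.
Timing time_kernel(int count) {
    auto start = std::chrono::steady_clock::now();
    volatile double sink = kernel(count);  // volatile store keeps the result observable
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    double result = sink;
    return Timing{result, ms};
}
```

Building the same file once with `g++ -O0` and once with `g++ -O3`, then calling `time_kernel` with a count read at run time, should show the gap that comes from optimization alone. A single run is noisy, so repeating the measurement several times is advisable.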
I tried to compile that sample without optimization using MSVC++. The application fails to start because it cannot allocate 10MB on the stack.
If I allocate `a[]` as a global static variable, the code compiled by MSVC is two times slower. If I compile the code with GNU C++, that change does not affect performance.
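For reference, here is a minimal sketch of two ways to keep a 10MB array off the stack: static storage, as described above, and heap storage via `std::vector`. The element type `double`, the constant `kCount`, and the function names are assumptions; the original array declaration is not shown in this thread. The stack failure is expected with MSVC when optimization is off and the array is kept, since the default stack size there is typically 1 MB.

```cpp
#include <cstddef>
#include <vector>

// Assumed: 10 MB worth of doubles (the original element type is not stated).
constexpr std::size_t kCount = 10u * 1024u * 1024u / sizeof(double);

// Static storage: the array is part of the program's static data,
// so it does not consume stack space.
double sum_static() {
    static double a[kCount];
    double s = 0.0;
    for (std::size_t i = 0; i < kCount; ++i) {
        a[i] = static_cast<double>(i);
        s += a[i];
    }
    return s;
}

// Heap storage: std::vector allocates its elements dynamically.
double sum_heap() {
    std::vector<double> a(kCount);
    double s = 0.0;
    for (std::size_t i = 0; i < kCount; ++i) {
        a[i] = static_cast<double>(i);
        s += a[i];
    }
    return s;
}
```

Both variants compute the same sum. Unlike a purely local array, a static or heap array is harder for the compiler to remove entirely, which may be part of why the MSVC build slowed down after the change.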