[Benchmark] 4. C/C++ benchmark code is not fair

Question

Opened this issue 2 years ago · 4 comments

The C++ code of the benchmark in "Why learn C++ if I know Python (Toy Example)" is not fair. Because the code operates only on local variables that have no aliases, and a is a regular C array, the C++ compiler can remove the for() loop completely and just set s to a value computed at compile time.
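To make the point concrete, here is a minimal sketch. It assumes the benchmark has roughly this shape; the function names and the size kN are hypothetical and not taken from the original snippet. The first function stores into a local array and reads it back, the second is a hand-written version without the array. Since nothing outside the function can observe a, an optimizing compiler is allowed to reduce the first form to something like the second, or go further and fold the result into a constant.

```cpp
#include <cstddef>

// Hypothetical size, small enough to fit on any default stack.
constexpr std::size_t kN = 1000;

// Assumed shape of the benchmark: write to a local C array, then read it back.
double sum_with_local_array() {
    double a[kN];
    double s = 0.0;
    for (std::size_t i = 0; i < kN; ++i) {
        a[i] = static_cast<double>(i);  // integer-to-double conversion
        s += a[i];
    }
    return s;
}

// Hand-written equivalent without the array: the stores to a[] were dead,
// so only the conversion and the accumulation remain.
double sum_registers_only() {
    double s = 0.0;
    for (std::size_t i = 0; i < kN; ++i) {
        s += static_cast<double>(i);
    }
    return s;
}
```

Both functions return the same value, which is why the compiler is free to pick the cheaper form.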

Answer 1 · 2022-08-27T20:09:41.000Z

Thanks! There are two things...

Using https://godbolt.org/ with gcc-7.5.0 and the flags -O3 -Wall --std=c++11 on the code snippet "4. C/C++ benchmark", the compiler:

Preserves the integer-to-double conversion with CVTSI2SD (on x86_64).
But that optimization does remove the stack-allocated flat C array from the final binary.

So you're correct: it's not fair, at least by 50%, because, for example, Python does not have compiler optimization mechanisms....

At the same time, there are various optimization tricks that compilers can apply:
(https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/resources/mit6_172f18_lec9/)

The absence of a compiler in your programming environment is your problem.

====

In that particular piece of code, the compiler did the following:

Replaced loaded values with registers.
Performed all increments in registers.
Removed the dead code that used the local stack variable (a).
So in that execution we used compiler optimization.

Conclusion:
I think it would be nice to demonstrate the speed with both "-O0" and "-O3" and explain why there is a difference. It would also be worth highlighting the point of view that the benchmark is not fair because C++ uses compiler optimization tricks.
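A minimal sketch of how such a comparison could be timed, assuming a C++11 compiler. time_ms is a hypothetical helper and not part of the original benchmark. The measured code has to produce a result that is used afterwards (printed, for example), otherwise -O3 may remove it as dead code.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs f() once and returns the elapsed wall time
// in milliseconds, measured with a monotonic clock.
template <typename F>
double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Example use (kernel() is a placeholder for the benchmark loop):
//   double s = 0.0;
//   double ms = time_ms([&] { s = kernel(); });
//   std::printf("s=%f, %f ms\n", s, ms);  // printing s keeps the result live
```

Build the same source twice, for example with g++ -O0 --std=c++11 and g++ -O3 --std=c++11, and compare the reported times.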

Answer 2 · 2022-08-27T20:17:51.000Z

At least it is worth noting that C++ can remove a[] from the binary, because it can :)

Answer 3 · 2022-08-30T08:31:43.000Z

I tried to compile that sample without optimization using MSVC++. The application fails to start because it cannot allocate 10 MB on the stack.
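This is most likely the default stack limit: the MSVC linker reserves 1 MB of stack unless /STACK says otherwise. One way around it, sketched below under the assumption that a holds doubles, is to put the array on the heap with std::vector. The name sum_on_heap is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Same assumed loop shape as the benchmark, but the elements live on the
// heap, so the array size is not limited by the stack.
double sum_on_heap(std::size_t n) {
    std::vector<double> a(n);  // heap storage, zero-initialized
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<double>(i);
        s += a[i];
    }
    return s;
}
```

With this form the program should start even in a build without optimization, since no large object is placed on the stack.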

Answer 4 · 2022-08-30T08:35:07.000Z

If I allocate a[] as a global static variable, the code compiled by MSVC is two times slower. If I compile the code using GNU C++, that change does not affect performance.
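For reference, a sketch of what that variant might look like. The element type, the size kN (1,250,000 doubles, about 10 MB) and the function name are assumptions, since the modified code is not shown here.

```cpp
#include <cstddef>

// Assumed size: 1,250,000 doubles, roughly 10 MB.
constexpr std::size_t kN = 1250000;

// Static storage duration: the array is placed in the program's data
// segment instead of on the stack.
static double a[kN];

// Same assumed loop shape as the benchmark, operating on the static array.
double fill_and_sum_static() {
    double s = 0.0;
    for (std::size_t i = 0; i < kN; ++i) {
        a[i] = static_cast<double>(i);
        s += a[i];
    }
    return s;
}
```

Whether the stores to a[] survive optimization here depends on the compiler, which may be one reason the two toolchains behave differently.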