mosra/magnum

Please help fix the GCC bug

ackelcn opened this issue · 2 comments

I noticed that a commit in this project reported a rounding error: fb51f25

As this may be a GCC bug, I have reported the problem to GCC:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105905

Could you please provide more details at that URL? With more information, the GCC developers can fully resolve the problem.

mosra commented

Hmm, I didn't think this was anything serious, so I didn't bother looking for the root cause in this commit, and just worked around it in the test.

I investigated a bit more, and ... I don't know what is happening, at all. This code:

Vector2 v{Math::sin(37.0_degf), Math::cos(37.0_degf)};

Utility::print("{:.10}\n", v[1]);
Utility::print("{:.10}\n", (v*v.lengthInverted())[1]);
Utility::print("{:.10}\n", (v/v.length())[1]);

prints the following in a debug build (-march=native -g) and in a release build without -march=native (so just -O3):

0.7986354828
0.7986354828
0.7986354828

However, it prints the following in a -march=native -O3 build:

0.7986354828
0.798635602
0.7986355424

Okay, so I thought it's some optimization kicking in, producing a different result, but then I realized that this code:

Vector2 v{Math::sin(37.0_degf), Math::cos(37.0_degf)};

// Utility::print("{:.10}\n", v[1]);
Utility::print("{:.10}\n", (v*v.lengthInverted())[1]);
Utility::print("{:.10}\n", (v/v.length())[1]);

prints

0.7986354828
0.7986354828

even with -march=native -O3. So, ummm, the v[1] in combination with Utility::print() causes that particular optimization to kick in, and if it's not there, nothing gets optimized? If I change Utility::print() to std::printf(), it also stops being strange and prints 0.7986354828 three times. So I suppose there has to be sufficiently complex code around these operations for some optimization to kick in? I tried to look at the disassembly: the "strange" variant has a bunch of FMA instructions and the non-strange variant has none, but those instructions could also belong to some other code, I'm not that good at understanding the compiler output.

I tested with GCC 10 as well, and it has the same weird behavior as 11. Unfortunately I don't remember if I was on GCC 10 or 9 before that commit. Clang always prints 0.7986354828.

For completeness, length() is just std::sqrt(v[0]*v[0] + v[1]*v[1]), and lengthInverted() is 1.0f/length(). The sin/cos calls delegate to std::sin() / std::cos(), multiplying the input by 3.141592654f/180.0f first.
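For anyone who wants to poke at this without Magnum, here is a minimal standalone sketch of the computation as described above. The `Vec2` struct and all helper names are hypothetical stand-ins, not Magnum's API, and the exact grouping of the degree conversion is an assumption. Since switching to std::printf() made the discrepancy disappear in the original test, this sketch may well not reproduce it either; it only pins down which expressions are involved.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical stand-in for Magnum's Vector2, holding only what is needed here.
struct Vec2 {
    float x, y;
};

// Degrees to radians as described in the comment: multiply by pi/180 in float.
// (Assumption: the constant is folded as a single float factor.)
inline float degToRad(float deg) { return deg * (3.141592654f / 180.0f); }

// length() as described: sqrt(x*x + y*y). The x*x + y*y sum is the kind of
// expression a compiler may contract into a fused multiply-add.
inline float length(const Vec2& v) { return std::sqrt(v.x * v.x + v.y * v.y); }

// lengthInverted() as described: 1.0f / length().
inline float lengthInverted(const Vec2& v) { return 1.0f / length(v); }

// Component-wise multiply and divide by a scalar.
inline Vec2 scaled(const Vec2& v, float s) { return {v.x * s, v.y * s}; }
inline Vec2 divided(const Vec2& v, float s) { return {v.x / s, v.y / s}; }

// The vector from the snippet: {sin(37 deg), cos(37 deg)}.
inline Vec2 unitAt37Degrees() {
    const float a = degToRad(37.0f);
    return {std::sin(a), std::cos(a)};
}

// Prints the same three values as the snippet, using std::printf()
// instead of Utility::print().
inline void printAll() {
    const Vec2 v = unitAt37Degrees();
    std::printf("%.10g\n", v.y);
    std::printf("%.10g\n", scaled(v, lengthInverted(v)).y);
    std::printf("%.10g\n", divided(v, length(v)).y);
}
```

Calling `printAll()` from `main()` and building once with -O3 and once with -march=native -O3 gives the two configurations compared above.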

I'm not commenting on the GCC bugtracker, as I don't understand what's happening.

mosra commented

Copying the response from the GCC bugtracker verbatim:

So clang defaults to -ffp-contract=off (maybe on which is actually the same as off for GCC) while GCC defaults to -ffp-contract=fast. And with -march=native, the FMA instruction is enabled which allows GCC to do contractions for some floating point and uses FMA more.

Good to know about this option and the defaults. Might be useful for whoever needs to have the exact same output across different platforms (physics simulation in a networked scenario, for example), but since that's a rather narrow use case I don't see a reason to fiddle with this flag by default.
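To make the contraction effect from the quoted response concrete, here is a small sketch that forces both variants by hand, independent of compiler flags. It is not the code from the test: the function names and the input value are picked purely for illustration, and the volatile store is just one assumed way to keep the compiler from fusing the operations on its own.

```cpp
#include <cmath>

// a*a + c with the product rounded to float before the add (two roundings).
// The volatile store only exists to stop the compiler from contracting the
// multiply and the add into an FMA by itself.
inline float mulThenAdd(float a, float c) {
    volatile float product = a * a;
    return product + c;
}

// a*a + c with a single rounding at the end, which is what an FMA
// instruction computes.
inline float fusedMulAdd(float a, float c) {
    return std::fma(a, a, c);
}

// Illustrative input: 1 + 2^-12. Its exact square, 1 + 2^-11 + 2^-24,
// does not fit into a float and gets rounded to 1 + 2^-11.
inline float sampleInput() { return 1.0f + std::ldexp(1.0f, -12); }
```

With `a = sampleInput()` and `c = -(1.0f + std::ldexp(1.0f, -11))`, `mulThenAdd(a, c)` returns exactly 0 because the low bit of the product was already rounded away, while `fusedMulAdd(a, c)` returns 2^-24 because the product is kept exact until the final rounding. Differences in the last digits like the ones printed above are this kind of effect; per the quoted response, -ffp-contract=off is the GCC option that turns the automatic contraction off.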