Figure out why `test (256)` keeps failing

Question

Closed this issue 4 months ago · 5 comments

Sometimes (only sometimes!) it gets stuck in an infinite loop of printing AddressSanitizer errors. Very concerning, but so far it only seems to happen for the 256-avx case. That build is currently not allowed into the Python wheels because of an earlier mysterious crash, so this isn't urgent, but it sure would be nice to know why these things are happening.

Answer 1 · 2024-03-15T21:49:05.000Z

Never mind, it has now also happened for `test (64)` and is therefore high priority.

Answer 2 · 2024-03-15T23:49:27.000Z

Based on #718, this is a bug in gtest rather than a bug in stim. Reported it in google/googletest#4491.

Answer 3 · 2024-03-17T21:30:15.000Z

I have the same bug (`AddressSanitizer:DEADLYSIGNAL`) in my primecount project, in the code below, which only uses the math functions from the C++ standard library (my project does not use googletest):

  for (int i = 0; i < 100; i++)
  {
    T term = (Li(t) - x) * std::log(t);

    // Not converging anymore
    if (std::abs(term) >= std::abs(old_term))
      break;

    t -= term;
    old_term = term;
  }

My bug only occurs on Ubuntu 22.04 and 23.10 (x64) when running in a virtual machine with the GCC/Clang sanitizers enabled. When I switched my CI test to ubuntu-20.04, the bug disappeared. When I tested with Ubuntu 22.04 and the GCC sanitizers on a real server (no VM or Docker container), it also worked without any issues.

After more than 2 hours of debugging I couldn't figure out the exact cause of the issue, but it looks like it is caused by an Ubuntu >= 22.04 bug or a compiler/sanitizer bug.

UPDATE 18/03/2024: Today I also tested on a Fedora 36 x64 VM using GCC and the same compiler options, but I was not able to reproduce the issue. Hence the issue seems to occur only on Ubuntu x64 VMs (and possibly also on Debian VMs).

Answer 4 · 2024-03-17T22:30:19.000Z

@kimwalisch Thanks, it's very helpful to know that I can work around it by pinning the version of Ubuntu used by CI.
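
For reference, a minimal sketch of what such a pin might look like, assuming the CI is a GitHub Actions workflow. The job name and steps are hypothetical placeholders, not taken from either project; only the `ubuntu-20.04` label comes from the answer above, and it is only usable while the CI provider still offers that image.

```yaml
# Hypothetical workflow fragment; job name and steps are placeholders.
jobs:
  test:
    # Pin a specific runner image instead of a moving label such as ubuntu-latest.
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v4
      - run: echo "build and run the sanitizer-enabled tests here"
```

Pinning hides the failure rather than explaining it, so it is best treated as a temporary workaround until the underlying cause is identified.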

Answer 5 · 2024-03-27T22:45:08.000Z

This seems to have been resolved externally. It hasn't happened in a PR for about a week now, whereas before it was happening multiple times per PR.