kaist-cp/cs431

[HW4] (Thread Sanitizer has been running for 10 minutes)

Closed this issue · 6 comments

My solution passes the test cases run with cargo. With cargo_tsan, however, my previous run went on for more than 30 minutes, printing messages the whole time, so I killed the process and started it again, hoping it would not take as long. This run has also been going for more than 10 minutes already.
I was able to read some warning messages:

WARNING: ThreadSanitizer: data race (pid=1183058)
Atomic write of size 8 at 0x720c00005a58 by thread T35:
Previous write of size 8 at 0x720c00005a58 by thread T32:

  1. How can an atomic write result in a data race?
  2. Why does cargo_tsan run so long? It looked like it was stuck in a loop.
  3. What should I do to fix a data race reported by ThreadSanitizer?

Thanks in advance!

  1. It can be racy if you mix it with non-atomic accesses. The error message suggests that too (note that the address is the same).
  2. TSAN typically does take longer. Not sure why the loop happens, though.
  3. Do you have llvm-symbolizer enabled? It should tell you where the racy accesses happen, and you should look into those.

Regarding question 2, tests taking this long can be a strong sign that your code can deadlock. Even if your code passes the other tests with cargo or cargo_asan, unlucky scheduling under cargo_tsan may result in a deadlock. Note that our model solution takes at most 1 minute to test with cargo_tsan, so testing should not take nearly as long as in your case (>30 min).
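To make a possible deadlock show up quickly instead of after 30 minutes, one option is to wrap a test body in a watchdog. The following is only a minimal sketch: `finishes_within` is a hypothetical helper, not part of the course code or of cargo_tsan, and the time limit you pass is your own choice.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `body` on a separate thread and reports
/// whether it finished within `limit`.
/// Returns false if `body` is still running (for example, deadlocked)
/// when the limit expires, or if it panicked.
fn finishes_within<F>(limit: Duration, body: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        body();
        // Signal completion; the receiver may already have given up.
        let _ = tx.send(());
    });
    // On timeout the helper thread is left running (leaked).
    // That is acceptable in a test that is about to fail anyway.
    rx.recv_timeout(limit).is_ok()
}
```

In a test you could then write something like `assert!(finishes_within(Duration::from_secs(120), || { /* test body */ }))`, picking a limit well above the roughly 1 minute mentioned above, since TSAN slows execution down.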

  1. Yes, I know that they have the same address. It says Atomic write, and in the backtrace the first function was indeed an atomic operation. That is what confuses me.
  2. It is version 14. I ran the command shown in the manual, but I still don't see helpful messages: sudo ln -s /usr/bin/llvm-symbolizer-14 /usr/bin/llvm-symbolizer
This is one of the summaries generated by the sanitizer:

SUMMARY: ThreadSanitizer: data race /home/ubuntu/.rustup/toolchains/nightly-2024-03-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/sync/atomic.rs:3361:23 in core::sync::atomic::atomic_sub::h5c8653b27dead1c1

Is this how it is supposed to print after llvm-symbolizer is enabled?
[Screenshot: Screen Shot 2024-04-11 at 21 10 13]

Yes, that looks correct. As for the data race, here is what I meant, based on the warning you posted:

WARNING: ThreadSanitizer: data race (pid=1183058)
Atomic write of size 8 at 0x720c00005a58 by thread T35:
Previous write of size 8 at 0x720c00005a58 by thread T32:

The two writes by T32 and T35 are racing. Only the write by T35 is labeled "Atomic write", while the one by T32 is just "Previous write", so the write by T32 is probably non-atomic.
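As a rough illustration of the difference, here is a minimal sketch in which every access to a shared counter goes through an atomic operation, so there is nothing for TSAN to report. The function name `decrement_from_threads` is made up for this example and is not the homework code. The comment shows what a mixed, racy access could look like.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Decrements a shared counter once from each of `n` threads and
/// returns the final value. All accesses are atomic, so no data race.
fn decrement_from_threads(start: usize, n: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(start));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // Atomic read-modify-write: the kind of access TSAN
                // labels "Atomic write".
                counter.fetch_sub(1, Ordering::AcqRel);
                // A racy variant would write the same address through a
                // raw pointer instead, for example:
                //     unsafe { *counter.as_ptr() -= 1 };
                // That is a plain (non-atomic) write. If it runs
                // concurrently with the fetch_sub in another thread, it
                // is a data race (undefined behavior), and TSAN would
                // report an atomic write racing with a plain write.
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Acquire)
}
```

For example, `decrement_from_threads(8, 4)` returns 4. In real code the non-atomic access is often less obvious than a raw-pointer write: initializing, dropping, or freeing memory that another thread still accesses atomically counts as a plain write as well.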

Thanks! This helped to debug it!