google-deepmind/mujoco

Threading issue reported by TSAN when `usethread=1`

Closed this issue · 4 comments

Intro

Hello MuJoCo team,

I encountered a threading issue reported by TSAN while parsing an XML file.

My setup

Mujoco 3.2.2 / C Api / Ubuntu 22.04 / x86_64

What's happening? What did you expect?

While parsing an xml file with TSAN enabled, there is a data race warning as following:

==================
WARNING: ThreadSanitizer: data race (pid=4072513)
  Atomic read of size 1 at 0x7b2c00000000 by thread T2:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4240 (libtsan.so.0+0x53908)
    #1 std::__1::mutex::lock() <null> (libmujoco.so.3.2.2+0x105eb5)

  Previous write of size 8 at 0x7b2c00000000 by thread T1:
    #0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:655 (libtsan.so.0+0x31c57)
    #1 operator new(unsigned long) <null> (libmujoco.so.3.2.2+0x102ff3)

  Location is heap block of size 168 at 0x7b2c00000000 allocated by thread T1:
    #0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:655 (libtsan.so.0+0x31c57)
    #1 operator new(unsigned long) <null> (libmujoco.so.3.2.2+0x102ff3)

  Thread T2 (tid=4072516, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 void std::__1::allocator_traits<std::__1::allocator<std::__1::thread> >::construct[abi:v170001]<std::__1::thread, mjCModel::CompileMeshes(mjVFS_ const*)::$_0, void>(std::__1::allocator<std::__1::thread>&, std::__1::thread*, mjCModel::CompileMeshes(mjVFS_ const*)::$_0&&) <null> (libmujoco.so.3.2.2+0x2d3460)
    #2 __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f)

  Thread T1 (tid=4072515, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 void std::__1::allocator_traits<std::__1::allocator<std::__1::thread> >::construct[abi:v170001]<std::__1::thread, mjCModel::CompileMeshes(mjVFS_ const*)::$_0, void>(std::__1::allocator<std::__1::thread>&, std::__1::thread*, mjCModel::CompileMeshes(mjVFS_ const*)::$_0&&) <null> (libmujoco.so.3.2.2+0x2d3460)
    #2 __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f)

SUMMARY: ThreadSanitizer: data race (/home/BOSDYN/jahn/Downloads/mujoco-3.2.2-linux-x86_64/mujoco-3.2.2/lib/libmujoco.so.3.2.2+0x105eb5) in std::__1::mutex::lock()
==================
ThreadSanitizer: reported 1 warnings

Steps for reproduction

  1. Download mujoco-3.2.2-linux-x86_64.tar.gz
  2. Clone https://github.com/bd-jahn/mujoco-binary-tsan
  3. Build main.cpp with TSAN: g++ -o main main.cpp -Ipath_to_header -Lpath_to_libmujoco -lmujoco -fsanitize=thread
  4. Run the binary: LD_LIBRARY_PATH=paht_to_libmujoco ./main
  5. TSAN will report a data race warning.

Minimal model for reproduction

I created a repo to reproduce the issue. Please see the above step to reproduce.

Code required for reproduction

I created a repo to reproduce the issue. Please see the above step to reproduce.

Confirmations

Thanks. I can confirm that following these instructions exactly reproduces the issue.

I think this is a false positive, due to the MuJoCo library not being built with -fsanitize=thread. See the ThreadSanitizer docs:

https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code

We do test MuJoCo with TSAN, and I wasn't able to reproduce the issue when using exactly the same code and model, but building everything with TSAN. Please reopen if you've seen this with MuJoCo compiled with ThreadSanitizer.

Even though this bug report is evidently false positive (since you didn't build from source), let's take a step back. Presumably there was some symptom that made you suspect MuJoCo's threading which went away when you disabled threading? What was it?

@nimrod-gileadi Thanks for investigating this issue. I noticed that when I build MuJoCo from source, the threading error no longer appears. However, I wasn't aware of the false positive issue.

@yuvaltassa Not really. I just tested our code (dynamically linking the MuJoCo binary) with TSAN and found this issue. With TSAN disabled, I haven't noticed any suspicious symptoms so far.