roastduck/FreeTensor

[Regression] Likely data race after #161

roastduck opened this issue · 1 comment

After #161, there seems to be a data race, which makes Driver unable to be constructed concurrently.

P.S. Driver's constructor should be thread-safe, so that we can invoke the backend compiler in parallel. Currently, all the functions called from Driver's constructor are thread-safe.
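For reference, below is a minimal sketch of the compile-and-load pipeline that the constructor has to run safely from many threads at once. This is illustrative, not FreeTensor's actual code: buildAndLoadSketch, the cc invocation, and the kernel file names are stand-ins; the real logic lives in Driver::buildAndLoad() in driver.cc.

// build: c++ -pthread sketch.cc -ldl
#include <dlfcn.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Illustrative stand-in for Driver::buildAndLoad(): fork+exec the backend
// compiler, wait for it, then dlopen the generated shared object.
static void buildAndLoadSketch(const std::string &src, const std::string &so) {
    pid_t pid = fork(); // runs every registered pthread_atfork handler
    if (pid == 0) {     // child: exec the backend compiler
        execlp("cc", "cc", "-shared", "-fPIC", "-o", so.c_str(), src.c_str(),
               (char *)nullptr);
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);
    void *handle = dlopen(so.c_str(), RTLD_NOW); // takes the loader's lock
    if (handle == nullptr)
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
}

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 50; i++) // mirror the 50 concurrent tuning jobs
        workers.emplace_back([i] {
            auto n = std::to_string(i);
            buildAndLoadSketch("kernel" + n + ".c", "./kernel" + n + ".so");
        });
    for (auto &t : workers)
        t.join();
}

Note that fork() runs every registered pthread_atfork handler and dlopen() takes the dynamic loader's internal lock, so any runtime library loaded into the process can interpose on this path.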

To reproduce, change test/30.schedule/test_auto_fuse.py:114 (the second parameter of s.tune_auto_schedule in the test test_tune) from 1 to 50. Run the test, and you will see an error message like

Inconsistency detected by ld.so: dl-open.c: 881: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

This assertion checks that the dynamic loader's debug state (r_state) is RT_CONSISTENT, i.e., that no dlopen or dlclose is in the middle of updating it, so its failure means some data structure inside the dynamic loader got corrupted.

It works at the commit (52b1627) just before #161, but has failed since #161.

Valgrind's thread analyzer suggests that it may be some interaction among fork, dlopen and OpenMP, but there can be other reasons, since Valgrind reports a lot of false positives.

Here is one of the stack traces from Valgrind:

==624918== Thread #79 unlocked lock at 0x402B968 currently held by thread #73
==624918==    at 0x48439F7: mutex_unlock_WRK (hg_intercepts.c:1130)
==624918==    by 0x48473F4: pthread_mutex_unlock (hg_intercepts.c:1151)
==624918==    by 0x4881438: dlsym (dlsym.c:72)
==624918==    by 0x391FE513: __kmp_init_target_mem (in /home/spack/spack/opt/spack/linux-debian11-zen2/clang-13.0.1/llvm-13.0.0-skvgau7fl4xbckqfhnb5wqo3fd55fh2k/lib/libomp.so)
==624918==    by 0x392128C5: __kmp_do_serial_initialize() (in /home/spack/spack/opt/spack/linux-debian11-zen2/clang-13.0.1/llvm-13.0.0-skvgau7fl4xbckqfhnb5wqo3fd55fh2k/lib/libomp.so)
==624918==    by 0x3921C44B: __kmp_serial_initialize (in /home/spack/spack/opt/spack/linux-debian11-zen2/clang-13.0.1/llvm-13.0.0-skvgau7fl4xbckqfhnb5wqo3fd55fh2k/lib/libomp.so)
==624918==    by 0x4A9F292: __run_fork_handlers (register-atfork.c:135)
==624918==    by 0x4AE5547: fork (fork.c:129)
==624918==    by 0x6C7109A: freetensor::Driver::buildAndLoad() (driver.cc:215)
==624918==    by 0x6C728C0: freetensor::Driver::Driver(freetensor::Ref<freetensor::FuncNode> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, freetensor::Ref<freetensor::Device> const&, freetensor::Ref<freetensor::Device> const&, bool) (driver.cc:84)
==624918==    by 0x6D87929: freetensor::Driver::Driver(freetensor::Ref<freetensor::FuncNode> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, freetensor::Ref<freetensor::Device> const&, bool) (driver.h:62)
==624918==    by 0x6D7CF5A: freetensor::Schedule::tuneAutoSchedule(int, int, freetensor::Ref<freetensor::Device> const&, std::vector<freetensor::Ref<freetensor::Array>, std::allocator<freetensor::Ref<freetensor::Array> > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, freetensor::Ref<freetensor::Array>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, freetensor::Ref<freetensor::Array> > > > const&, std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> > const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const (allocator.h:95)
==624918==  Lock at 0x402B968 was first observed
==624918==    at 0x48434E5: mutex_lock_WRK (hg_intercepts.c:942)
==624918==    by 0x48473D3: pthread_mutex_lock (hg_intercepts.c:958)
==624918==    by 0x4013895: _dl_open (dl-open.c:786)
==624918==    by 0x4881257: dlopen_doit (dlopen.c:66)
==624918==    by 0x4B5225F: _dl_catch_exception (dl-error-skeleton.c:208)
==624918==    by 0x4B5231E: _dl_catch_error (dl-error-skeleton.c:227)
==624918==    by 0x4881A64: _dlerror_run (dlerror.c:170)
==624918==    by 0x48812E3: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==624918==    by 0x61B610: ??? (in /usr/bin/python3.9)
==624918==    by 0x61A2C9: ??? (in /usr/bin/python3.9)
==624918==    by 0x5298C3: ??? (in /usr/bin/python3.9)
==624918==    by 0x517B9A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==624918==  Address 0x402b968 is 2312 bytes inside data symbol "_rtld_local"

It seems that dlopen in one thread was holding a lock inside the dynamic loader (_rtld_local), while fork in another thread unlocked it. The unlocking happens in a handler that OpenMP registered into fork via pthread_atfork; that handler initializes the OpenMP runtime and calls dlsym, which manipulates the same loader lock.
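To make that interaction concrete, here is a minimal standalone sketch of the suspected interleaving. This is my reconstruction, not actual libomp code: onForkPrepare stands in for libomp's __kmp_init_target_mem, and libm.so.6 plus the loop counts are arbitrary. RTLD_DEFAULT assumes glibc with _GNU_SOURCE (which g++ defines by default).

// build: c++ -pthread repro.cc -ldl
#include <dlfcn.h>
#include <pthread.h>
#include <unistd.h>
#include <thread>

// Stand-in for the atfork handler seen in the trace: it looks up a symbol
// with dlsym, which locks/unlocks internals of the dynamic loader.
static void onForkPrepare() { (void)dlsym(RTLD_DEFAULT, "malloc"); }

int main() {
    pthread_atfork(onForkPrepare, nullptr, nullptr); // as libomp does
    // One thread keeps the loader lock busy, like the concurrent Driver
    // constructors calling dlopen.
    std::thread loader([] {
        for (int i = 0; i < 100000; i++) {
            if (void *h = dlopen("libm.so.6", RTLD_NOW))
                dlclose(h);
        }
    });
    // Meanwhile, keep forking; each fork first runs onForkPrepare.
    for (int i = 0; i < 1000; i++) {
        if (fork() == 0)
            _exit(0); // child exits immediately
    }
    loader.join();
}

This sketch is not guaranteed to trigger the ld.so assertion; Helgrind flags the cross-thread lock/unlock pattern, but whether the loader's state actually gets corrupted depends on the glibc version.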

The above result was obtained with the OpenMP runtime from LLVM 13. The OpenMP runtime from GCC 10.2.1 works fine.