lhmouse/mcfgthread

Deadlock in exit(0) (LLD)

awson opened this issue · 49 comments

awson commented

First thanks for your wonderful library.

Here is a bit of prehistory.

Still, while LLD built with your gcc successfully completes all parallel_fors, it deadlocks at exit(0).

Thanks for your interest. 😂

Would you please provide some more detailed information (backtrace of each thread for example, which can be obtained using a debugger)?
(I have been keeping an eye on LLVM for a couple of years. Recently they seem to have dropped MinGW support because no documentation about MinGW can be found.)

I posted a comment on SF. The first problem didn't seem reproducible.

awson commented

The first problem manifests itself on vanilla MSys2 gcc only.

Fair enough. I am trying to build LLD in MSYS now.

How did you overcome these errors?

CMake Error: File /e/Desktop/llvm-project/llvm/E:/Desktop/llvm-project/llvm/lib/Target/PowerPC/MCTargetDesc/LLVMBuild.txt does not exist.
CMake Error at /e/Desktop/build/LLVMBuild.cmake:242 (configure_file):
(... hundreds of errors abridged ...)
awson commented

I never built all targets, usually I build X86 and NVPTX only, but to debug LLD on mingw, X86 only would be enough, thus simply add -DLLVM_TARGETS_TO_BUILD=X86 when doing cmake.

Something like this would be OK I believe:
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_TARGETS_TO_BUILD=X86.
(I use shared libs build)

awson commented

Also paths like /e/Desktop/llvm-project/llvm/E:/Desktop/llvm-project... look very suspicious.
I never use anything other than plain vanilla cmd.exe shell to build LLVM.

I followed the instruction here.

Something like this would be OK I believe:
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_TARGETS_TO_BUILD=X86.
(I use shared libs build)

It didn't help. The same errors took place.

awson commented

I suspect the culprit is wrong paths story under MSys shell.
I never had the problems you mentioned when using vanilla command-line.

Usually I build things with simple batch files with (roughly) the following structure:

@echo off
set CXX=g++
set CC=gcc
set PATH=<PATH_TO_GCC_BIN_DIR>;%PATH%
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_INCLUDE_TESTS=OFF -DLLVM_TARGETS_TO_BUILD=X86 -G Ninja LLVM_SRC_PATH
ninja lld

The path problem still exists. But this time it is the Unix-y path that is causing the problem:
(Note that I was using the generator for Unix makefile while you were using the generator for Ninja)

CreateProcess failed: The system cannot find the file specified.

full-log.txt

awson commented

Did you ever tried to use vanilla command prompt (cmd.exe) instead of unixish one?

Yes, neither worked.

Since I am using the MSYS2 cmake, paths are converted to Unix-y ones, so cmake sees the same path in either case:

The C compiler "/mingw32/bin/gcc.exe" is not able to compile a simple test
program.
awson commented

Ah, I use native cmake (mingw or precompiled package from cmake.org), absolutely no problems with it.

There is another error now:

E:\Desktop\build>ninja lld
ninja: error: unknown target 'lld', did you mean 'llc'?

... You just missed -DLLVM_ENABLE_PROJECTS=lld. I added that option and it builds now.

awson commented

Btw, lld on windows doesn't support windows-gnu target linking, thus to reproduce the problem one should either have xxx-windows-msvc object code (that produced by Visual C/C++ compiler or clang with corresponding target) to link. Do you have Visual C/C++ installed?

Perhaps, it would be possible to test lld against elf code linking, but this way one should have elf code to link. I never tried the latter, though.

Yes I have reproduced the problem:

E:\Desktop>cat main.c
#include <stdio.h>

extern void bark();

int main(){
        printf("inside %s\n", __func__);
        bark();
}

E:\Desktop>cat  test.c
#include <stdio.h>

void bark(){
        printf("inside %s\n", __func__);
}

E:\Desktop>cl /nologo /c main.c test.c
main.c
test.c
正在生成代码...

E:\Desktop>build\bin\lld-link.exe main.obj test.obj

Then it hangs.

awson commented

Btw, ExitProcess also deadlocks, only TerminateProcess doesn't.

Btw, ExitProcess also deadlocks, only TerminateProcess doesn't.

That is because ExitProcess() sends all DLLs loaded a DLL_PROCESS_DETACH notification, and the thread that called ExitProcess() hangs up on `NtReleaseKeyedEvent()·.

I attached LLD with a modified version of mcfgthread that would trigger a breakpoint if NtReleaseKeyedEvent() times out in 5 seconds.

Note that inside a DLL, all destructors of static objects are called by DllMain() with the dwReason parameter set to 0 (DLL_PROCESS_DETACH). Here the destructor ThreadPoolExecutor::~ThreadPoolExecutor() called std::condition_variable::notify_all(), which called __MCFCRT_gthread_cond_broadcast(), which thought there were other threads sleeping on the keyed event (the condition variable itself) and called NtReleaseKeyedEvent() and deadlocked.

There should indeed be threads sleeping on the condition variable before exit(), which I believe is exactly why the LLVM people call std::condition_variable::notify_all() inside the destructor. But remember, inside the DllMain() function, all other threads will have been terminated. This is Windows-specific behavior.

Full backtrace:

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 13924.0x3308]
0x768f338e in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
(gdb) bt
#0  0x768f338e in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
#1  0x661023fa in mcfgthread-10!_MCFCRT_Bail () from E:\Desktop\build\bin\mcfgthread-10.dll
#2  0x661071fc in mcfgthread-10!__MCFCRT_OnAssertionFailure () from E:\Desktop\build\bin\mcfgthread-10.dll
#3  0x66103373 in mcfgthread-10!_MCFCRT_BroadcastConditionVariable () from E:\Desktop\build\bin\mcfgthread-10.dll
#4  0x6fee5e2b in  	 (__cond=<optimized out>) at C:/MinGW/MSYS2/mingw32/include/mcfgthread/env/_gthread_inl.h:177
#5  std::condition_variable::notify_all (this=<optimized out>) at ../../../../../gcc/libstdc++-v3/src/c++11/condition_variable.cc:73
#6  0x70e7e793 in lld::internal::ThreadPoolExecutor::~ThreadPoolExecutor (this=0x70ecb640 <lld::internal::getDefaultExecutor()::exec>, __in_chrg=<optimized out>)
    at E:/Desktop/llvm-project/lld/include/lld/Core/Parallel.h:137
#7  0x70e5aed8 in __tcf_0 () at E:/Desktop/llvm-project/lld/include/lld/Core/Parallel.h:173
#8  0x70e411a4 in _CRT_INIT@12 (hDllHandle=hDllHandle@entry=0x70e40000, dwReason=dwReason@entry=0, lpreserved=lpreserved@entry=0x1)
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:144
#9  0x70e412b4 in __DllMainCRTStartup (hDllHandle=hDllHandle@entry=0x70e40000, dwReason=0, lpreserved=lpreserved@entry=0x1)
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:211
#10 0x70e413e7 in DllMainCRTStartup@12 (hDllHandle=0x70e40000, dwReason=0, lpreserved=0x1) at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:171
#11 0x77969364 in ntdll!RtlQueryEnvironmentVariable () from C:\Windows\SysWOW64\ntdll.dll
#12 0x70e40000 in ?? ()
#13 0x77988f68 in ntdll!LdrShutdownProcess () from C:\Windows\SysWOW64\ntdll.dll
#14 0x70e413c4 in __DllMainCRTStartup (hDllHandle=0x148ed18, dwReason=2382904, lpreserved=0x1) at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:201
#15 0x00000000 in ?? ()
(gdb)

Briefly, the following happened:

  1. A thread slept on a condition variable. The 'threads_trapped' counter of the condition variable was incremented.
  2. Another thread called ExitProcess() [Update: I incorrectly referred ExitThread() here], terminating the previous thread while it was sleeping.
  3. The second thread broadcast the condition variable. It read and cleared the 'threads_trapped' counter. Because the counter was non-zero, it called NtReleaseKeyedEvent() to wake up those threads sleeping. Note that NtReleaseKeyedEvent() blocks the caller until a thread is woken up. Since there were no threads to wake up, it effectively resulted in a deadlock.

Possible workaround: If there are no other threads, don't call NtReleaseKeyedEvent() in ReallySignalConditionVariable(), ReallySignalMutex() or RealSetAndSignalOnceFlag().

... But how can we tell there are no other threads?

awson commented

Another thread called ExitThread(), terminating the previous thread while it was sleeping

Don't quite understand this. Why the "previous" thread is terminated? Because it was created by "another" thread or what?

See text following 'When handling DLL_PROCESS_DETACH' in https://msdn.microsoft.com/en-us/library/windows/desktop/ms682583(v=vs.85).aspx .

Oops it was a typo. It was ExitProcess() that the second thread called. I fixed up the comment.

awson commented
  1. This is a problem only when threads are brutally killed by ExitProcess, otherwise we won't get in this situation;
  2. When the process exits anyway, we don't need to bother with releasing resources explicitly;

The let's add a flag indicating that we are just about to exit and add an exit handler (using atexit) with sets this flag on. Now we may call NtReleaseKeyedEvent only if the flag is off.

This won't prevent the deadlock from using ExitProcess directly (which is non-preventable in general AFAIUI), but at least should cure exit or return from main.

Thoughts?

Not a solution.

  1. The mingw-w64 project has their own atexit() implementation, which is module(exe + dll)-wide and not process-wide. Callbacks of a DLL are also called as a result of FreeLibrary().
  2. Handlers registered by atexit() are called in FILO order. We have to register one after the ctor of the thread pool returns, which isn't practically possible without modifying LLVM code.

The FUTEX API does not have such bugs AFAICS but they ain't available on Windows 7. Pity.

Anyway, I personally think it is undefined behavior to call std::exit() while more than one thread exist and it is a bug in LLD code. LLD should call std::quick_exit() instead.

awson commented

This isn't a problem of std::exit(). I've looked into why they introduced it in the first place. It turned out they merely shaved off 2s by not running destructors when linking huge chromium exe.

The program deadlocks also when returns normally. Try to comment exit out. This won't help.

awson commented

I've looked into mingw-w64 code. Indeed, they introduced their own atexit, but interestingly, they don't call ExitProcess directly (as Microsoft does), but call standard exit (from msvcrt), thus using msvcrt's atexit it's possible to set exit handlers for it (didn't think about how to accomplish this, though).

Also, I don't quite understand why we have to register one after the ctor of the thread pool returns.

According to the C++ standard calling std::exit() is equivalent to returning from main() except that automatic objects are not destroyed. In C both are totally equivalent.

I suggest you ask LLD people for their opinion about it. If they are willing to fix their code I will close this issue as WONTFIX. Meanwhile, I am also searching for solutions.

Also, I don't quite understand why we have to register one after the ctor of the thread pool returns.

It is simple: If we register the magic callback before the ctor, it will only get called after the dtor that is causing the deadlock (virtually not called at all).

awson commented

Removing exit doesn't help. LLD deadlocks. Otherwise I wouldn't even bother to open this ticket.

From your explanations I've understood that the deadlock can happen only if sleeping threads are killed by ExitProcess. But all exit handlers are run before ExitProcess call, hence NtReleaseKeyedEvent won't deadlock.

But all exit handlers are run before ExitProcess call, hence NtReleaseKeyedEvent won't deadlock.

Please double check this for sure. In the case of DLL such handlers are run from the entry point function (__DllMainCRTStartup() above) which is a result of ExitProcess().

awson commented

I even tried to put my own exit handler into an LLD code, which simply printed something to the console, and it completed successfully before the deadlock. Thus we have no problems with either mingw-w64 or FILO order.

Actually GCC uses atexit() to register dtors of static objects. If you register a callback after the pool's ctor returns, your callback is called before its dtor. If you register a callback before its ctor returns (including inside its ctor body), your callback is called after its dtor.

Again, it is not possible to register a callback that can be called before the pool's dtor without modifying LLD' source, as it requires registering it after the pool's ctor (for example, inside the main() function.)

awson commented

I am not ready to investigate this right now, but this looks extremely strange.

How could atexit semantics depends on the place where it is called? Only handlers execution order depends on it but all of them should be guaranteed to be executed before the exit proper.

How could atexit semantics depends on the place where it is called?

The C standard doesn't care about dynamic libraries, but it is true on almost every platform where there are dynamic libraries. The reason is simple, too: If you load a dynamic library which registers some exit callbacks and then unload it, those callbacks must be run and removed before the virtual memory of the dynamic library is unmapped. If those callbacks were deferred to the time when you realy call atexit() (directly or by returning from main()), the dynamic library might have been unmapped and you would likely get a segment fault.

Please test the new branch. If this is OK I will port it to both branches.

Thank goodness I found the function RtlDllShutdownInProgress() and it saved us!
It was just an accident, wasn't it? 😂

awson commented

Looks like it does the trick for me! Thanks!

Merged to master and cherry-picked to release-1.x then.

Thanks for sharing you solution. @lhmouse
Is this page described the same problem?
https://support.microsoft.com/zh-cn/help/2582203/a-process-that-is-being-terminated-stops-responding-in-windows-7-or-in

I think it's fixed as a bug, but I can't expect every PC is patched.

And is there any page describes "RtlDllShutdownInProgress" ? It's hard to find discussion about this function

Since I was just reading lld code, if you want to learn more about lld+exit https://reviews.llvm.org/D58246

@wwc7654321 No. If a thread calls exit() or ExitProcess() without waiting for other threads to terminate then it must not call any synchronization function, nor should it perform memory allocation or deallocation, as the process state is not predictable.

@MaskRay std::thread::detach() is nonsense for a hosted program. Don't use it, period.

If you want to see yet more related to LLD exit (using MSVC static run-time), check out my experimental patch here: https://reviews.llvm.org/D40366. I agree that detached threads are not good.

Specifically on Windows, if threads are detached then it is not safe to call std::exit() or std::quick_exit(), or to return from main() which has the same effect as calling std::exit(). The program must call std::_Exit().