sserver: occasional deadlock when releasing a StratumSession from the consume thread
Closed this issue · 5 comments
This happened on two different servers within two days. The symptom is that the TCP connection can be established, but the server does not respond to any request.
It looks like the same problem as the libevent issue "Deadlock when calling bufferevent_free from an other thread":
I'm experiencing a deadlock on libevent-2.0.19 while calling bufferevent_free from thread A, while thread B is in event_base_dispatch.
And a reply:
Ouch. This is a known bug. This fix is going to be hard. I wrote
about it here:
http://archives.seul.org/libevent/users/Feb-2012/msg00053.html
GDB information:
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ff2055b3780 (LWP 19479))]
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
(gdb) bt
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007ff2034b8e42 in __GI___pthread_mutex_lock (mutex=0xf73fc0) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007ff2038e6fac in _bufferevent_incref_and_lock (bufev=bufev@entry=0x124a910) at bufferevent.c:582
#3 0x00007ff2038e805a in bufferevent_writecb (fd=44, event=<optimized out>, arg=0x124a910) at bufferevent_sock.c:212
#4 0x00007ff2038dc4c9 in event_process_active_single_queue (activeq=0xe817d0, base=0xd64f20) at event.c:1350
#5 event_process_active (base=<optimized out>) at event.c:1420
#6 event_base_loop (base=0xd64f20, flags=flags@entry=0) at event.c:1621
#7 0x00007ff2038dd5f7 in event_base_dispatch (event_base=<optimized out>) at event.c:1450
#8 0x000000000043919e in Server::run (this=<optimized out>) at /work/btcpool/src/StratumServer.cc:1021
#9 StratumServer::run (this=<optimized out>) at /work/btcpool/src/StratumServer.cc:792
#10 0x0000000000434557 in main (argc=<optimized out>, argv=<optimized out>) at /work/btcpool/src/sserver/StratumServerMain.cc:173
(gdb) thread 7
[Switching to thread 7 (Thread 0x7ff1f96bf700 (LWP 19485))]
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007ff2036ccfa5 in evthread_posix_cond_wait (_cond=0xd5f5c0, _lock=0xd62310, tv=<optimized out>) at evthread_pthread.c:156
#2 0x00007ff2038da58d in event_del_internal (ev=<optimized out>) at event.c:2220
#3 event_del (ev=ev@entry=0x124a9a8) at event.c:2188
#4 0x00007ff2038e85c7 in be_socket_destruct (bufev=0x124a910) at bufferevent_sock.c:592
#5 0x00007ff2038e7472 in _bufferevent_decref_and_unlock (bufev=0x124a910) at bufferevent.c:622
#6 0x00007ff2038e79cb in bufferevent_free (bufev=0x124a910) at bufferevent.c:681
#7 0x0000000000448627 in StratumSession::~StratumSession (this=0x1155640, __in_chrg=<optimized out>) at /work/btcpool/src/StratumSession.cc:261
#8 0x00000000004398b2 in Server::sendMiningNotifyToAll (this=0xd25258, exJobPtr=std::shared_ptr (count 4, weak 0) 0x7ff1e804f240) at /work/btcpool/src/StratumServer.cc:1054
#9 0x0000000000439a89 in JobRepository::sendMiningNotify (this=this@entry=0xd28a40, exJob=std::shared_ptr (count 4, weak 0) 0x7ff1e804f240) at /work/btcpool/src/StratumServer.cc:316
#10 0x000000000043c2bc in JobRepository::consumeStratumJob (this=this@entry=0xd28a40, rkmessage=rkmessage@entry=0x7ff1f0001df0) at /work/btcpool/src/StratumServer.cc:271
#11 0x000000000043c500 in JobRepository::runThreadConsume (this=0xd28a40) at /work/btcpool/src/StratumServer.cc:176
#12 0x00007ff2031e5c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007ff2034b66ba in start_thread (arg=0x7ff1f96bf700) at pthread_create.c:333
#14 0x00007ff20294b3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:74
#15 0x0000000000000000 in ?? ()
It looks like thread 7 wants to call bufferevent_free() for a session while thread 1 is writing to it.
I now want to know if the new version of libevent has solved this problem. Otherwise, we need to change the code to prevent this problem.
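One way such a code change could look is to never destroy a session on the consume thread, and instead hand it over to the thread that runs event_base_dispatch(). The following is only a minimal sketch under that assumption, not the actual btcpool fix: DeferredReleaseQueue and its push()/drain() methods are hypothetical names, and T stands in for a class such as StratumSession whose destructor calls bufferevent_free().

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper (not part of btcpool or libevent): collects objects
// whose destructors must run on the event-loop thread only.
template <typename T>
class DeferredReleaseQueue {
public:
  // May be called from any thread, e.g. the consume thread.
  // Takes ownership but does NOT destroy the object here.
  void push(std::unique_ptr<T> obj) {
    std::lock_guard<std::mutex> guard(lock_);
    pending_.push_back(std::move(obj));
  }

  // Must be called only from the thread that runs the event loop.
  // Destroys everything queued so far and returns how many objects
  // were destroyed.
  std::size_t drain() {
    std::vector<std::unique_ptr<T>> batch;
    {
      std::lock_guard<std::mutex> guard(lock_);
      batch.swap(pending_);
    }
    const std::size_t count = batch.size();
    batch.clear();  // destructors run here, outside the queue lock
    return count;
  }

private:
  std::mutex lock_;
  std::vector<std::unique_ptr<T>> pending_;
};
```

In a libevent program, drain() could be invoked from a timer or other callback that runs inside event_base_dispatch(), so that bufferevent_free() is never called while another thread is executing a callback for the same bufferevent. Whether this fully avoids the deadlock in sserver is an assumption that would need testing.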
The libevent I used:
ii libevent-2.0-5:amd64 2.0.21-stable-2ubuntu0.16.04.1 amd64 Asynchronous event notification library
ii libevent-core-2.0-5:amd64 2.0.21-stable-2ubuntu0.16.04.1 amd64 Asynchronous event notification library (core)
ii libevent-dbg:amd64 2.0.21-stable-2ubuntu0.16.04.1 amd64 Asynchronous event notification library (debug symbols)
ii libevent-dev 2.0.21-stable-2ubuntu0.16.04.1 amd64 Asynchronous event notification library (development files)
ii libevent-extra-2.0-5:amd64 2.0.21-stable-2ubuntu0.16.04.1 amd64 Asynchronous event notification library (extra)
ii libevent-openssl-2.0-5:amd64 2.0.21-stable-2ubuntu0.16.04.1 amd64 Asynchronous event notification library (openssl)
ii libevent-pthreads-2.0-5:amd64 2.0.21-stable-2ubuntu0.16.04.1 amd64 Asynchronous event notification library (pthreads)
libevent/libevent#512 (comment)
azat commented on 24 May 2017:
Indeed, it looks like the issues mentioned by ploxiln.
Can you please verify your program with the latest libevent (compiled from git, since those patches were merged only recently and there has been no release since then)?
A potential fix is to build libevent from its master branch:
git clone https://github.com/libevent/libevent.git
cd libevent
./autogen.sh
./configure --disable-shared
make && make install
Then remove your libevent-dev package and run cmake .. again. (It may fail until you clean up the cmake cache.)
No deadlock problems have been encountered since our build switched to the master branch of libevent.
Notice: the latest libevent release, release-2.1.8-stable, has been confirmed to have this problem. Do not use it or any earlier release.
I am very sorry that libevent has had no new release in nearly two years.
The issue was temporarily closed due to no recent feedback.
libevent 2.1.9-beta has been released. We recommend using this version.
https://github.com/libevent/libevent/releases/tag/release-2.1.9-beta
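A program could guard against being linked with an older libevent by checking the version string at startup. The sketch below is only an illustration: libeventVersionAtLeast is a hypothetical helper, and it assumes the string has the "major.minor.patch-suffix" shape seen in this thread (for example "2.0.21-stable"). In a real program the input would be the string returned by libevent's event_get_version().

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns true if `version` (e.g. "2.1.9-beta") is at
// least major.minor.patch. Suffixes such as "-stable" or "-beta" are ignored.
// Strings that do not start with three dot-separated numbers return false.
bool libeventVersionAtLeast(const std::string& version,
                            int major, int minor, int patch) {
  int a = 0, b = 0, c = 0;
  if (std::sscanf(version.c_str(), "%d.%d.%d", &a, &b, &c) != 3) {
    return false;
  }
  if (a != major) return a > major;
  if (b != minor) return b > minor;
  return c >= patch;
}
```

For example, libeventVersionAtLeast(event_get_version(), 2, 1, 9) could be checked before starting the server. Note that a build from the master branch made before 2.1.9-beta may still report an older version string, so this check is conservative.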
@YihaoPeng
Hi mate, do you have any idea how to fix it?