baidu/braft

Deadlock on NodeImpl::_mutex

amoxic opened this issue · 2 comments

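braft version

commit id: 3cae30fb67cb9e988650500522c6d64ae609f2aa

Symptoms

All braft internal threads are blocked trying to lock the same NodeImpl::_mutex (address 0x38a4b48). Inspecting the lock owner with gdb shows that it is a deadlock.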
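For readers who want to repeat the owner check, here is a minimal sketch of where that information lives. It assumes Linux with glibc, where `pthread_mutex_t` exposes an internal `__data.__owner` field holding the kernel thread id (the number gdb prints as "LWP") of the current holder. This field is a glibc implementation detail, not a public API, and it may not be set if lock elision is enabled. The helper names `current_lwp`, `mutex_owner_lwp` and `owner_matches_caller` are hypothetical and only for illustration.

```cpp
// Linux + glibc only: __data.__owner is an internal field, not a public API.
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread id of the calling thread; gdb shows this value as "LWP".
static long current_lwp() {
    return static_cast<long>(syscall(SYS_gettid));
}

// LWP of the thread currently holding the mutex, or 0 when it is unlocked.
// This reads the same field that a gdb expression such as
//   p ((pthread_mutex_t*)ADDR)->__data.__owner
// would print for a mutex at address ADDR.
static long mutex_owner_lwp(const pthread_mutex_t* m) {
    return static_cast<long>(m->__data.__owner);
}

// Hypothetical self-check: lock a default mutex, confirm the recorded owner
// is the caller, then confirm the owner is cleared after unlock.
static bool owner_matches_caller() {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&m);
    const bool held = (mutex_owner_lwp(&m) == current_lwp());
    pthread_mutex_unlock(&m);
    const bool released = (mutex_owner_lwp(&m) == 0);
    pthread_mutex_destroy(&m);
    return held && released;
}
```

In a gdb session on the stuck process, the same field can be read by casting the mutex address (0x38a4b48 here) to `pthread_mutex_t*`, assuming that `butil::Mutex` stores its native pthread handle at offset 0 and that debug type information for `pthread_mutex_t` is available. The printed owner can then be matched against the LWP numbers in `info threads`.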
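Additional information:

  1. Application threads are blocked in calls to Node::is_leader_lease_valid, Node::apply, and Node::get_status respectively; internally they are also waiting for this lock
  2. Main stack traces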

Thread 54 (Thread 0x7f5145ffb640 (LWP 67) "worker-0"):
#0 0x00007f5204401560 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007f5204407c22 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x00000000011d3d50 in butil::Mutex::lock (this=0x38a4b48) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/butil/synchronization/lock.h:69
#3 std::unique_lock<butil::Mutex>::lock (this=0x7f5145ff5560) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:139
#4 std::unique_lock<butil::Mutex>::unique_lock (__m=..., this=0x7f5145ff5560) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:69
#5 braft::NodeImpl::apply (this=0x38a47f0, tasks=0x7f5145ff5610, size=1) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:1991
#6 0x00000000011d4416 in braft::NodeImpl::execute_applying_tasks (meta=0x38a47f0, iter=...) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:668
#7 0x0000000001260d9d in bthread::ExecutionQueueBase::_execute(bthread::TaskNode*, bool, int*) ()
#8 0x000000000126311a in bthread::ExecutionQueueBase::start_execute(bthread::TaskNode*) ()
#9 0x00000000011cfc81 in bthread::ExecutionQueue<braft::NodeImpl::LogEntryAndClosure>::execute (handle=0x0, options=0x230c082 <bthread::TASK_OPTIONS_INPLACE>, task=<optimized out>, this=<optimized out>) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/bthread/execution_queue_inl.h:338
#10 0x00000000011736a9 in braft::Node::apply (this=<optimized out>, task=...) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/raft.cpp:182

Thread 23 (Thread 0x7f51c77fe640 (LWP 36) "brpc_worker:11"):
#0 0x00007f5204401560 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007f5204407c22 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x00000000011ded08 in butil::Mutex::lock (this=0x38a4b48) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/butil/synchronization/lock.h:69
#3 std::unique_lock<butil::Mutex>::lock (this=0x7f5052aeac70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:139
#4 std::unique_lock<butil::Mutex>::unique_lock (__m=..., this=0x7f5052aeac70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:69
#5 braft::NodeImpl::handle_append_entries_request (this=0x38a47f0, cntl=cntl@entry=0x7f517c51f780, request=request@entry=0x7f516c068080, response=response@entry=0x7f516c133bb0, done=done@entry=0x7f50700c5740, from_append_entries_cache=from_append_entries_cache@entry=false) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:2357
#6 0x00000000011efaa3 in braft::RaftServiceImpl::append_entries (this=<optimized out>, cntl_base=0x7f517c51f780, request=0x7f516c068080, response=0x7f516c133bb0, done=0x7f50700c5740) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/brpc/closure_guard.h:55
#7 0x00000000012bb155 in brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*) ()
#8 0x00000000012b0557 in brpc::ProcessInputMessage(void*) ()
#9 0x00000000012755af in bthread::TaskGroup::task_runner(long) ()
#10 0x0000000001406e31 in bthread_make_fcontext ()
#11 0x0000000000000000 in ?? ()

Thread 22 (Thread 0x7f51c7fff640 (LWP 35) "brpc_worker:10"):
#0 0x00007f5204401560 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007f5204407c22 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x00000000011ded08 in butil::Mutex::lock (this=0x38a4b48) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/butil/synchronization/lock.h:69
#3 std::unique_lock<butil::Mutex>::lock (this=0x7f50502dec70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:139
#4 std::unique_lock<butil::Mutex>::unique_lock (__m=..., this=0x7f50502dec70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:69
#5 braft::NodeImpl::handle_append_entries_request (this=0x38a47f0, cntl=cntl@entry=0x7f51b81da110, request=request@entry=0x7f51ec067c80, response=response@entry=0x7f51b81464c0, done=done@entry=0x7f50641b5c10, from_append_entries_cache=from_append_entries_cache@entry=false) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:2357
#6 0x00000000011efaa3 in braft::RaftServiceImpl::append_entries (this=<optimized out>, cntl_base=0x7f51b81da110, request=0x7f51ec067c80, response=0x7f51b81464c0, done=0x7f50641b5c10) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/brpc/closure_guard.h:55
#7 0x00000000012bb155 in brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*) ()
#8 0x00000000012b0557 in brpc::ProcessInputMessage(void*) ()
#9 0x00000000012755af in bthread::TaskGroup::task_runner(long) ()
#10 0x0000000001406e31 in bthread_make_fcontext ()
#11 0x0000000000000000 in ?? ()

@amoxic
Where does version 1.1.3 come from? Is it the master code?


@ergesun Sorry, that is a tag we created ourselves; the commit id is 3cae30f