baidu/braft

lock issue

CkTD opened this issue · 1 comment

CkTD commented

int Replicator::_continue_sending(void* arg, int error_code) {
    Replicator* r = NULL;
    bthread_id_t id = { (uint64_t)arg };
    if (bthread_id_lock(id, (void**)&r) != 0) {
        return -1;
    }
    if (error_code == ETIMEDOUT) {
        // Replication is in progress when block timeout, no need to start again
        // this case can happen when
        //     1. pipeline is enabled and
        //     2. disable readonly mode triggers another replication
        if (r->_wait_id != 0) {
            return 0;
        }
        // Send empty entries after block timeout to check the correct
        // _next_index otherwise the replicator is likely to wait in
        // _wait_more_entries and no further logs would be replicated even if the
        // last_index of this follower is less than |next_index - 1|
        r->_send_empty_entries(false);
    } else if (error_code != ESTOP && !r->_is_waiter_canceled) {
        // id is unlocked in _send_entries
        r->_wait_id = 0;
        r->_send_entries();
    } else if (r->_is_waiter_canceled) {
        // The replicator is checking current next index by sending empty entries or
        // install snapshot now. Although the registered waiter will be canceled
        // before the operations, there is still a little chance that LogManager already
        // woke up the waiter, and _continue_sending is waiting to execute.
        BRAFT_VLOG << "Group " << r->_options.group_id
                   << " Replicator=" << id << " canceled waiter";
        bthread_id_unlock(id);
    } else {
        LOG(WARNING) << "Group " << r->_options.group_id
                     << " Replicator=" << id << " stops sending entries";
        bthread_id_unlock(id);
    }
    return 0;
}

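In the block timer path, the branch at line 723 returns without unlocking id (in the snippet above this appears to be the early `return 0` taken when `r->_wait_id != 0`). If that branch is ever hit, it looks like the replicator would get stuck?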
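
To make the concern concrete, here is a minimal sketch of the pattern. It does not use braft or bthread: `IdLock`, `FakeReplicator`, `continue_sending_buggy` and `continue_sending_fixed` are hypothetical names invented for illustration, and the lock is modeled as a plain flag. One difference to keep in mind: a real `bthread_id_lock` would block on a held id, whereas this model returns -1 so the stuck state is observable. The "fixed" variant is only one possible shape of a fix, not necessarily how upstream resolved it.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for the bthread_id lock: a non-reentrant flag.
// It only models "held / not held"; it is not braft's or bthread's API.
class IdLock {
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() { return !held_.exchange(true); }
    void unlock() {
        assert(held_.load());  // unlocking a free lock would be a bug
        held_.store(false);
    }
    bool is_held() const { return held_.load(); }
private:
    std::atomic<bool> held_{false};
};

// Hypothetical model of the replicator state touched by the callback.
struct FakeReplicator {
    IdLock id;
    int wait_id = 0;  // models r->_wait_id
};

// Same shape as the quoted branch: early return while the lock is held.
int continue_sending_buggy(FakeReplicator& r, int error_code) {
    if (!r.id.try_lock()) return -1;
    if (error_code == ETIMEDOUT && r.wait_id != 0) {
        return 0;  // id is still locked here
    }
    r.id.unlock();
    return 0;
}

// One possible fix: release the lock before the early return.
int continue_sending_fixed(FakeReplicator& r, int error_code) {
    if (!r.id.try_lock()) return -1;
    if (error_code == ETIMEDOUT && r.wait_id != 0) {
        r.id.unlock();
        return 0;
    }
    r.id.unlock();
    return 0;
}
```

With `wait_id` set to a non-zero value, calling the buggy variant with `ETIMEDOUT` leaves the lock held, so every later attempt to take it fails (or, with a blocking lock, waits forever). The fixed variant leaves the lock free and can be called again.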

At a glance, this looks like it was fixed here: #430