Deadlock in QueryBehaviour
iand commented
Goroutine 6818 (trace below) is holding the QueryBehaviour lock while trying to Notify a Waiter that an EventGetCloserNodesSuccess was received.
There are no goroutines selecting on the waiter's channel. I would expect such a goroutine to be in Coordinator.waitForQuery, called from Coordinator.QueryMessage.
goroutine 6818 [select, 33 minutes]:
github.com/plprobelab/zikade/internal/coord.(*Waiter[...]).Notify(0xc0018abbe0, {0x2eda480?, 0xc0021febd0}, {0x2ec3ca0, 0xc0034ab2d0})
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:126 +0x105
github.com/plprobelab/zikade/internal/coord.(*PooledQueryBehaviour).Notify(0xc00014df80, {0x2eda480?, 0xc00216cf00?}, {0x2ec3be0?, 0xc00253a000?})
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/query.go:189 +0x109f
github.com/plprobelab/zikade/internal/coord.(*NodeHandler).send(0xc00079d880, {0x2eda480, 0xc00216cf00}, {0x2ecf4d0?, 0xc00262e460?})
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/network.go:165 +0x33e
github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1.1()
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:81 +0x108
created by github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:75 +0x7a
There are 8 goroutines waiting on the lock at this point:
goroutine 6990 [sync.Mutex.Lock, 33 minutes]:
sync.runtime_SemacquireMutex(0x11eeaff?, 0x80?, 0xc001f4a390?)
/home/iand/sdk/go1.20.5/src/runtime/sema.go:77 +0x26
sync.(*Mutex).lockSlow(0xc00014dfd8)
/home/iand/sdk/go1.20.5/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
/home/iand/sdk/go1.20.5/src/sync/mutex.go:90
github.com/plprobelab/zikade/internal/coord.(*PooledQueryBehaviour).Notify(0xc00014df80, {0x2eda480?, 0xc001f4a390?}, {0x2ec3d40?, 0xc0015c87d0?})
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/query.go:152 +0x125
github.com/plprobelab/zikade/internal/coord.(*NodeHandler).send(0xc001293c00, {0x2eda480, 0xc001f4a390}, {0x2ecf4f8?, 0xc001293600?})
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/network.go:186 +0x63b
github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1.1()
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:81 +0x108
created by github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1
/home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:75 +0x7a
Somehow we have lost the select that should be reading from the waiter's channel.
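The shape of the stall can be reproduced in isolation. The following is a minimal sketch, not the zikade code: the `waiter` and `behaviour` types, `notifyLocked`, and `stalledNotifiers` are hypothetical stand-ins, and it assumes the waiter's channel is unbuffered and that Notify selects between the send and context cancellation. A timeout is added only so the sketch terminates; the traces above suggest the real context was never cancelled.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// waiter is a hypothetical stand-in for the Waiter in the trace above.
type waiter struct {
	pending chan string // unbuffered: a send needs a concurrent receiver
}

// notify blocks until a receiver takes ev or ctx is done.
func (w *waiter) notify(ctx context.Context, ev string) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case w.pending <- ev:
		return nil
	}
}

// behaviour is a hypothetical stand-in for the query behaviour.
type behaviour struct {
	mu sync.Mutex
	w  *waiter
}

// notifyLocked delivers ev while holding mu, so a stalled send also
// stalls every other caller queued on the lock.
func (b *behaviour) notifyLocked(ctx context.Context, ev string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.w.notify(ctx, ev)
}

// stalledNotifiers starts n callers with no receiver on the waiter's
// channel and reports how many gave up once the timeout expired.
func stalledNotifiers(n int, timeoutMillis int) int {
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	b := &behaviour{w: &waiter{pending: make(chan string)}}
	var wg sync.WaitGroup
	var countMu sync.Mutex
	stalled := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := b.notifyLocked(ctx, "event"); err != nil {
				countMu.Lock()
				stalled++
				countMu.Unlock()
			}
		}()
	}
	wg.Wait()
	return stalled
}

func main() {
	// One caller holds the lock inside select; the other eight queue on the mutex.
	fmt.Println("stalled callers:", stalledNotifiers(9, 100))
}
```

With nine callers and no reader, one goroutine sits in the select while holding the lock and the other eight wait on the mutex, which matches the pattern in the dumps. None of them makes progress until the context is done.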