Retrieval deadlock under load in @magik6k's RIBS retrieval
Closed this issue · 3 comments
@magik6k is seeing retrievals lock up when using Lassie within RIBS.
He's finding that retrievals lock up after a few blocks. When he turns ConcurrentSpRetrievals up to 100, it runs for longer but locks up again after 100 or so blocks. His goroutine dump, attached below, suggests a lockup in two places in parallelpeerretriever.go: one in the call to PriorityWaitQueue.wait and the other in retrievalShared.sendEvent. This could be related to #343. Either way, I suspect the lockup is causing the ConcurrentSpRetrievals limit to be hit.
magikdump.txt
Slack discussion: https://filecoinproject.slack.com/archives/CP50PPW2X/p1688392651130039
Likely fix: ipfs/go-graphsync#428
Closing due to no further reports of problems and no additional information to guide further investigation.