filecoin-project/lassie

Retrieval deadlock under load in @magik6k's RIBS retrieval

Closed this issue · 3 comments

@magik6k is seeing retrievals lock up when using Lassie within RIBS.

He's finding that retrievals lock up after a few blocks. When he turns ConcurrentSpRetrievals up to 100, it runs for longer but locks up again after 100 or so blocks. His goroutine dump, attached below, suggests a lockup in two places in parallelpeerretriever.go: one in the call to PriorityWaitQueue.wait and the other in retrievalShared.sendEvent. This could be related to #343. Either way, I suspect it's causing the ConcurrentSpRetrievals limit to get hit.
magikdump.txt

rvagg commented

closing due to no further reports of problems and no additional information to guide further investigation