bitswap: long-running session leaks
We've been running long-lived Bitswap sessions and encountered a series of memory leaks that we've identified and fixed in our fork. I plan to start upstreaming the fixes eventually, but I am currently bogged down with a million other things. In the meantime, I want to make you aware of them in case you have the capacity to look into them.
The full branch is located here. Some of the fixes have been merged; others have not. The most notable are:
- fix leak ensuring SessionInterestManager is cleaned up
- fix leak in block presence manager along with
- cleanup toFetch cidqueue
- cleanup peerresponsetracker
- rebroadcast only when there are live CIDs (not a leak; resolves an issue where the timer keeps ticking even when there are no CIDs left to rebroadcast)
All of those fixes have been tested on hundreds of nodes in production, and things have been running smoothly and steadily for 3 weeks. I am happy to answer questions if you start working on this before I get to opening PRs myself.
cc @hsanjuan, @gammazero, @lidel
I will create a separate PR for each of these, then review and merge them into our code base, unless you would rather do this from your fork. We are also resolving memory consumption issues with connections, so it would be great to get all the known memory issues handled at the same time. Please let me know soon.
Thank you.
The first two fixes appear to be incorrect according to the comment in SessionInterestManager:
// Note that once the block is received the session no longer wants
// the block, but still wants to receive messages from peers who have
// the block as they may have other blocks the session is interested in.
@Wondertan Please let me know if this needs further investigation.
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.