Limit bitswap wants by server and/or client backpressure
gammazero opened this issue · 3 comments
This issue captures information from a PR discussion in case it is useful for future Bitswap work. If this no longer applies, then this issue can be closed.
It would be great if Bitswap clients could avoid sending more than a protocol-wide constant by using back-pressure to limit limit the number of wants.
From @aschmahmann:
Here are 3 options, but these might not be correct:
- Having the server backpressure the client on the other side of the wire
- Similar to my comment above, and AFAICT requires some protocol changes to accommodate
- Have the bitswap client internally batch
GetBlocks
calls that are for larger thanmaxWantlistSize
into batches before returning them in case the batch is fully returned before the rebroadcast interval?- Doable if it'll be helpful, although it's a bit gross since A)
maxWantlistSize
shouldn't really be a protocol/network-wide thing and be per-client B) this batching can be done outside of the bitswap client package.
- Doable if it'll be helpful, although it's a bit gross since A)
- The bitswap client backpressuring the caller (e.g. code that's walking a DAG)
- My suspicion is this would best be served by having a streaming version of
GetBlocks
that you could block on which is a separate problem that seems like #121.
- My suspicion is this would best be served by having a streaming version of
From @Wondertan:
My suggestion was some form of 2 and 3. The client should be able to deal with the server-side request rate-limiting; otherwise, the client gets stuck expecting to be served, while the server simply cuts off his wants. The rebroadcasting after one minute helps, but that's still one minute and request spamming overhead over the wire.
Ideally, we would do a protocol change as described in 1, but as it's breaking, we may consider other less clean options, like or similar to 2. Setting protocol-wide maxWantlistSize
is gross, I agree. Another option might be negotiating the limit between the client and the server so the client knows it should never exceed it.
The 3 is complimentary and provides a new, powerful way to interface with a client. However, I just realized that it is not necessary for our case if the client is smart enough. In our case, we have a flat structure, i.e. we don't traverse a DAG where we unpack IPLD nodes to get more CID links to fetch them and unpack again to get to the data. Essentially, we know all the CIDs in advance and could simply ask Bitswap to get all of them over GetBlocks as long the client is smart enough not to get into any limitations of its immediate peers, which we are currently facing with this issue.
From @gammazero:
With the latest changes in PR #629, the main problem should be solved. The problem was that wants that have DONT_HAVE
responses remained on the queue, and caused new incoming messages to be dropped when there is no more free space on the queue. The same failure also prevented new wants from updating existing wants. This is fixed: Incoming wants are now applied to the queue, and any new ones that cannot be handled without increasing the queue over the size limit are handled specially as overflow, where their priority is considered for replacing existing wants.
Having the client limit its message sizes may be useful to avoid oversized wantlists from being truncated and allowing any overflow is handled. The client can group batches of wants by priority if applicable. In the future, priority can be influenced by DAG path length (higher priority for items closer to root).
Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:
- "Priority" labels will show how urgent this is for the team.
- "Status" labels will show if this is ready to be worked on, blocked, or in progress.
- "Need" labels will indicate if additional input or analysis is required.
Finally, remember to use https://discuss.ipfs.io if you just need general support.
@gammazero, i believe your change is useful and should help with the problem, but no completely solve it.
In case server side has all the data client wants, but client wants more than queue size, the wants are gonna be dropped.
This might not be the biggest problem for IPFS due its Dag structure, but is a problem for other use cases like ours. We have a "flat" structure and may request 256*256 items "at once", hitting this issue.
Ideally, we would do some form of backpressure as @aschmahmann suggested, but that's gonna be a large protocol change and we want assess other options before pushing such a change.
@Wondertan Yes, this is unfortunately the case that when client wants more than queue size the wants are dropped. The client must batch requests to be within the queue limit. This is why we need backpressure or flow-control, and why this issues is here. As you said, this will likely be a significant protocol change.