ipfs/boxo

bitswap/server: `WANT_HAVE` requests should use `Blockstore.Has`

Wondertan opened this issue · 12 comments

Currently, WANT_HAVE requests rely on GetSize to check for the presence of the requested block, and the size learnt this way gets discarded.

Basically, the Blockstore has to look up a data size that goes unused. Depending on the Blockstore implementation, this can have different consequences. Intuitively, looking up the data size should be more expensive than checking for data existence, and in practice this is exactly what we see in our case.

Hence the proposal is to add a new method to the blockstore manager that performs only the Has check, and to refactor ReceivedMessage to decouple the processing of WANT_HAVE and WANT_BLOCK messages.
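
For illustration, a minimal sketch of what a Has-only path could look like (the hasBlocks helper and its placement are hypothetical; only the blockstore.Blockstore methods are existing boxo API):

```go
package sketch

import (
	"context"

	"github.com/ipfs/boxo/blockstore"
	"github.com/ipfs/go-cid"
)

// hasBlocks is a hypothetical helper for the blockstore manager: it answers
// WANT_HAVE entries with Blockstore.Has, so no block size is read only to be
// thrown away.
func hasBlocks(ctx context.Context, bs blockstore.Blockstore, wantHaves []cid.Cid) (map[cid.Cid]bool, error) {
	out := make(map[cid.Cid]bool, len(wantHaves))
	for _, c := range wantHaves {
		ok, err := bs.Has(ctx, c) // presence check only, no size lookup
		if err != nil {
			return nil, err
		}
		out[c] = ok
	}
	return out, nil
}
```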

This is relatively urgent for us, and if there is no capacity on your side to implement it, we could submit a patch as well.

@Wondertan Preliminary investigation does not indicate that there is any specific reason GetSize was used over Has, so I will see if it looks reasonable to get this done now.

Triage notes:

  • Blockstore.Has sounds very sensible (a presence check is often cheaper, or billed differently, than reading the size)
  • we were looking into the necessity of getting sizes and plan to make maxBlockSizeReplaceHasWithBlock configurable; @gammazero will look into this
  • we also want to investigate block duplication related to this and other bitswap behavior (ipfs/rainbow#159)

I would also take a look at how cached blockstores (like bloom) would behave with this change.
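
For context, boxo's blockstore package already ships a caching wrapper whose bloom filter can answer negative Has checks without hitting the backing store. A minimal sketch of wiring one up (the in-memory datastore is only for illustration):

```go
package sketch

import (
	"context"

	"github.com/ipfs/boxo/blockstore"
	"github.com/ipfs/go-datastore"
	dssync "github.com/ipfs/go-datastore/sync"
)

// newCachedBlockstore wraps a plain blockstore with boxo's caching layer
// (bloom filter plus in-memory cache), so Has for definitely-missing blocks
// can be answered without touching the underlying store.
func newCachedBlockstore(ctx context.Context) (blockstore.Blockstore, error) {
	ds := dssync.MutexWrap(datastore.NewMapDatastore())
	base := blockstore.NewBlockstore(ds)
	return blockstore.CachedBlockstore(ctx, base, blockstore.DefaultCacheOpts())
}
```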

It is true that Want_Have requests rely on GetSize to check for the presence of the requested block and the size gets discarded. So, using Blockstore.Has for Want_Have requests does offer a slightly faster lookup; however...

it also risks having to do both the Has and GetSize lookups for the same CID, since it is possible for there to be both Want_Have and Want_Block requests for the same CID. This is why all the CIDs from the wantlist are put into a unique set and then the sizes are looked up for all of them. This way there is only one query per CID; they just all happen to be size lookups.
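
Roughly, the current flow looks like this: every wanted CID goes into one deduplicated set and gets exactly one GetSize lookup (a simplified sketch, not the actual server code):

```go
package sketch

import (
	"context"

	"github.com/ipfs/boxo/blockstore"
	"github.com/ipfs/go-cid"
)

// collectWantedSizes deduplicates the CIDs from both Want_Have and Want_Block
// entries and performs a single GetSize lookup per unique CID.
func collectWantedSizes(ctx context.Context, bs blockstore.Blockstore, wantHaves, wantBlocks []cid.Cid) map[cid.Cid]int {
	wanted := cid.NewSet()
	for _, c := range wantHaves {
		wanted.Add(c)
	}
	for _, c := range wantBlocks {
		wanted.Add(c)
	}

	sizes := make(map[cid.Cid]int, wanted.Len())
	for _, c := range wanted.Keys() {
		if size, err := bs.GetSize(ctx, c); err == nil {
			sizes[c] = size // blocks we do not have simply stay out of the map
		}
	}
	return sizes
}
```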

I think there are other improvements that can probably yield a more significant performance gain. Specifically, the use of worker pools and the communication between these goroutines can be streamlined. Additionally, high-frequency metrics updates may also contribute non-trivial overhead. Improvements will be supplied as more profiling is done.

it also risks having to do both the Has and GetSize lookups for the same CID, since it is possible for there to be both Want_Have and Want_Block requests for the same CID.

@gammazero, is it possible for both requests to come for the same CID from a single peer? Or do you mean they are deduplicated across multiple peers?

Gentle ping @gammazero.

We would like to understand the reasoning behind the risk a bit more.

Last ping @gammazero

@Wondertan This is only across different peers, AFAIK. Any single peer should only issue a WantHave or a WantBlock for a CID.

Some things to consider...

A bitswap message may have a number of CID entries. Some are WantHave and some are WantBlock. Some WantHaves may be converted to WantBlock if the blocks are small enough.

Using blockstore.Has for WantHave requests is only doable when automatically converting small WantHave requests to WantBlock is disabled (when e.maxBlockSizeReplaceHasWithBlock is zero). Getting block sizes is necessary to see if a block is small enough to do the conversion.
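
A tiny sketch of that decision rule (the function itself is illustrative; only the maxBlockSizeReplaceHasWithBlock name comes from the engine):

```go
package sketch

// replaceHasWithBlock sketches the conversion rule: a WantHave for a block
// that exists and is no larger than the limit is answered as if it were a
// WantBlock. With the limit set to zero the rule never fires, so a WantHave
// can be served with a size-free Has check.
func replaceHasWithBlock(haveBlock bool, blockSize, maxBlockSizeReplaceHasWithBlock int) bool {
	if maxBlockSizeReplaceHasWithBlock == 0 {
		return false // conversion disabled
	}
	return haveBlock && blockSize <= maxBlockSizeReplaceHasWithBlock
}
```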

Getting block sizes is done as a concurrent batch, which is generally faster for more than a very small number of blocks, although it may also depend on the backend store, cached values, etc. This means that we will want to separate the WantHaves from the WantBlocks in a bitswap message, so that both groups can be handled in concurrent batches.
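
A sketch of what that split could look like, running the two batches concurrently (illustrative only; the real server uses its own worker pools rather than an errgroup):

```go
package sketch

import (
	"context"

	"github.com/ipfs/boxo/blockstore"
	"github.com/ipfs/go-cid"
	"golang.org/x/sync/errgroup"
)

// lookupWants runs the presence checks for WantHaves and the size lookups
// for WantBlocks as two concurrent batches against the blockstore.
func lookupWants(ctx context.Context, bs blockstore.Blockstore, wantHaves, wantBlocks []cid.Cid) (map[cid.Cid]bool, map[cid.Cid]int, error) {
	haves := make(map[cid.Cid]bool, len(wantHaves))
	sizes := make(map[cid.Cid]int, len(wantBlocks))

	g, gctx := errgroup.WithContext(ctx)
	g.Go(func() error {
		for _, c := range wantHaves {
			ok, err := bs.Has(gctx, c)
			if err != nil {
				return err
			}
			haves[c] = ok
		}
		return nil
	})
	g.Go(func() error {
		for _, c := range wantBlocks {
			if size, err := bs.GetSize(gctx, c); err == nil {
				sizes[c] = size // missing blocks are simply omitted
			}
		}
		return nil
	})
	if err := g.Wait(); err != nil {
		return nil, nil, err
	}
	return haves, sizes, nil
}
```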

At a later time the peer client may decide to send a message to retrieve the blocks for which WantHave responses were received. Now the bitswap server needs to get the block sizes for all those blocks, because it did not get the sizes previously. This is where the "extra" query happens.

I coded a prototype corresponding to the above description and did not see a significant performance improvement. However, I may not have been testing at sufficient scale and/or may not have been using a storage backend similar to yours.

Given specific storage backends, it may make a very significant difference in performance and/or cost to avoid reading block sizes. I propose two changes:

  1. Make maxBlockSizeReplaceHasWithBlock configurable; setting it to zero would disable the automatic conversion of WantHave to WantBlock for small blocks (see the sketch after this list).
  2. Offer the above code changes as a PR for you to test-drive and see if they meet your needs.
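
A sketch of what item 1 could look like as a functional option (all names here are hypothetical; the actual option name and placement would be decided in the PR):

```go
package sketch

// engineConfig stands in for the server engine's configuration.
type engineConfig struct {
	// maxBlockSizeReplaceHasWithBlock is the size threshold at or below which
	// a WantHave is automatically answered as a WantBlock; zero disables the
	// conversion entirely.
	maxBlockSizeReplaceHasWithBlock int
}

// withMaxBlockSizeReplaceHasWithBlock is an illustrative functional option
// for making the threshold configurable.
func withMaxBlockSizeReplaceHasWithBlock(n int) func(*engineConfig) {
	return func(cfg *engineConfig) {
		cfg.maxBlockSizeReplaceHasWithBlock = n
	}
}
```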

@Wondertan Here are the prototype and kubo configuration:

@gammazero, that's a great solution and fixes our concern! Thank you ❤️

Should we start testing the prototype now or after your PR gets ready for review?

@Wondertan You can start testing it now, and any feedback will be helpful. We would like to review and test it in our cluster soon, but have not yet, so we cannot make any guarantees about how it works.