kadena-community/kadena.js

[Chainweb-Stream-Client] While reconnecting: allow duplicates or guarrantee only-once?

Takadenoshi opened this issue · 4 comments

This issue is about deciding how to handle an edge case described in a code comment here.

When chainweb-stream-client reconnects, it uses the minHeight parameter to resume from where it left off.

The chains can have a max height difference of up to 3 at any given point. In a a scenario where the leading chain(s) are +3 from the trailing chains, if we have tracked the lastHeight / minHeight of the leading chain, and then we get disconnected and reconnect, we can not use that value as minHeight or we would potentially miss transaction confirmations between [minHeight-3 .. minHeight - 1] from trailing chains. (or even entire transactions if the client reconnects after their update window has been broadcast completely, i.e. we miss the entire confirmation "process" from 0 until $CONFIRMATION_DEPTH)

As such we need to resume from lastHeight - 3, but then it will be emitting duplicate unconfirmed and confirmed events after reconnecting.

For unconfirmed events this is in line with the usage pattern, as the same transaction may be emitted as unconfirmed multiple times with different confirmation depth values, but in the confirmed event case, a consumer would expect that each transaction will fire a confirmed event only once.

If we want to provide an "only-once" guarantee for confirmed events, we will need to track emitted confirmed IDs for the last few heights and de-duplicate when reconnecting.

Is this worth the effort and overhead in the client? Or should we document that this edge case and let consumers handle it?

At the moment, the lastHeight parameter tracked on the client is the max height of all seen transactions.
When confirmations roll in we do not increase this, and neither do we broadcast the current cut from the server

Should change this, or even better, we stream the current cut as seen from the server? Or at least the maxHeight of the cut across all chains.

Pro: otherwise every reconnection would resend events, since we would query ?minheight=(client.lastHeight - 3) and we would at least get the last event back even if we actually confirmed it client-side

Pro: would provide even more data about the state of cw-stream-server and its backing cw-data. a client would be able to determine that the connection to cw-stream-server is active but something in the upstream
backend is not tracking the chain correctly.

Pro: feels more correct

Con: wire would be slightly chattier.

Con: (server-side) streaming cut from cw-data is not available, polling would be needed.

On the server-side, this would unfortunately not be as simple as using the sse endpoint of chainweb-node (which would otherwise be ideal), because our source of truth is chainweb-data which will normally lag slightly behind, so we would have to do a tight poll loop to cw-data to keep track of its tracked state.

Note: If we do do this and the server treament of minheight remains unchanged, we would have to resume with (?minheight=(lastHeight - confirmationDepth - maxChainHeightDiff)

we would at least get the last event back even if we actually confirmed it client-side

This alone would be a common enough occurrence to make addressing this issue worth it. The time elapsed since last event would not change this outcome; a client receiving a duplicate confirmed event minutes or hours after the actual confirmation would not be acceptable.

We will address this by adding a new event type and de-duplication code on the client:

Should change this, or even better, we stream the current cut as seen from the server? Or at least the maxHeight of the cut across all chains.

We will add a new event type max-height from the server that broadcasts the max seen height. The relevant height here is the highest seen on chainweb-data across all 20 chains.

The client will base the min-height re-connection parameter on the aforementioned value, sending min-height=$max_height - $server_confirmation_depth - 3

We will also include a de-duplicating buffer on the client side that filter on {id, confirmation_depth} of all received data events. These will be stored over a sliding window* of [$max-height, $max-height - $confirmation_depth - 3].

*Discarding data outside of this sliding window will reduce the memory overhead. We only need to keep track of these "hot" items and not everything we have received.

We will add a new event type max-height from the server that broadcasts the max seen height. The relevant height here is the highest seen on chainweb-data across all 20 chains.

Note: Draft implementation on server is events which sends all 20 chains' heights, as it may be useful to users beyond this specific use case.

This issue is stale because it is open for 60 days with no activity