waku-org/research

Understanding the data structure for Sync mechanism


This document provides insight into the type of data structure a Sync mechanism should have. Introduction to Sync store can be found here #62

The Sync mechanism should use the messageHash as its central entity so that a node can uniquely identify the Waku message in question. The aim is therefore to synchronize a set of messageHashes, e.g. a list [msgHash1,msgHash2,msgHash3,...]. There is no need for a key-value store where the messageHash is the key and the Waku message is the value, because syncing the messageHashes alone is sufficient for uniqueness.

Since there is no deterministic ordering by default, we impose one. Entries are serialized based on:

a) Waku message timestamp
b) in case of a timestamp conflict, lexicographic order of the messageHash, i.e. 0xaaaa -> 0xabbbb.
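As a minimal sketch of the ordering rule above (field names are illustrative, not from the Waku codebase), the two-level comparison can be expressed as a sort key of `(timestamp, messageHash)`:

```python
# Hypothetical sketch: deterministic ordering of Waku message entries.
# Each entry is (timestamp, message_hash); both names are illustrative.

def sort_key(entry):
    timestamp, message_hash = entry
    # Primary: message timestamp; tie-break: lexicographic hash order.
    return (timestamp, message_hash)

entries = [
    (1700000002, "0xabbb"),
    (1700000001, "0xffff"),
    (1700000002, "0xaaaa"),  # same timestamp as 0xabbb -> hash breaks the tie
]

ordered = sorted(entries, key=sort_key)
# 0xffff comes first (earlier timestamp), then 0xaaaa before 0xabbb
```

Because tuples compare element by element in Python, the hash is only consulted when the timestamps are equal, which matches rules (a) and (b).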

The ordering will help in any of the following solutions we aim to go forward with, such as:

  1. Using some sort of Prolly trees into store #71
  2. Simply swapping the key-value store itself to a synchronised backend, such as GossipLog #68
  3. Spin out some in-house technique to sync entries #

The Sync mechanism should use the messageHash as its central entity so that a node can uniquely identify the Waku message in question.

💯

There is no need for a key-value store where the messageHash is the key and the Waku message is the value, because syncing the messageHashes alone is sufficient for uniqueness.

How so? We could assume that all messages have a valid RLN proof to save some bits, but we need the payload at some point, no?

Since there is no deterministic ordering by default, we impose one. Entries are serialized based on:

a) Waku message timestamp b) in case of a timestamp conflict, lexicographic order of the messageHash, i.e. 0xaaaa -> 0xabbbb.

The ordering of the messages should be (b) to enable binary search. The indexes would allow efficient search by other fields (time, topics, payload?).
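A quick sketch of why hash ordering (option b) enables binary search; the hash values and the `contains` helper are hypothetical:

```python
import bisect

# Hypothetical sketch: with hashes stored in lexicographic order,
# a membership check is O(log n) via binary search.

hashes = sorted(["0xcafe", "0xaaaa", "0xbeef", "0xabbb"])

def contains(sorted_hashes, h):
    # bisect_left finds the insertion point; a match means membership.
    i = bisect.bisect_left(sorted_hashes, h)
    return i < len(sorted_hashes) and sorted_hashes[i] == h

contains(hashes, "0xbeef")  # True
contains(hashes, "0xdead")  # False
```

Ordering by timestamp first (option a) would instead make *time-range* queries cheap, which is why separate indexes per field are mentioned.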

The ordering will help in any of the following solutions we aim to go forward with, such as:

  1. Using some sort of Prolly trees into store #
  2. Simply swapping the key-value store itself to a synchronised backend, such as GossipLog #
  3. Spin out some in-house technique to sync entries #

All of those are very similar.
GossipLog uses Prolly trees under the hood, and the only way I know to do 3. would be to use Prolly trees or MSTs.
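The shared idea behind Prolly trees and MSTs can be sketched: chunk/subtree boundaries are derived deterministically from the content itself, so two nodes holding the same set of hashes build identical structures and can diff them cheaply. A minimal, hypothetical leaf-chunking sketch (the boundary probability and all names are illustrative, not from GossipLog):

```python
import hashlib

# Hypothetical sketch of content-defined chunking, the core trick in
# Prolly trees / MSTs: whether a key closes a chunk depends only on the
# key itself, never on insertion order or neighbors.

BOUNDARY_MASK = 0x0F  # ~1-in-16 chance a key ends a chunk (illustrative)

def is_boundary(key: str) -> bool:
    digest = hashlib.sha256(key.encode()).digest()
    return digest[0] & BOUNDARY_MASK == 0

def chunk(sorted_keys):
    chunks, current = [], []
    for k in sorted_keys:
        current.append(k)
        if is_boundary(k):
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

keys = sorted("0x%04x" % i for i in range(32))
chunks = chunk(keys)  # same keys -> same chunks, on any node
```

Hashing each chunk then yields a tree whose root hash is identical across nodes with the same key set, so sync reduces to comparing subtree hashes top-down.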

How so? We could assume that all messages have a valid RLN proof to save some bits, but we need the payload at some point, no?

Yeah, so basically the aim here is to sync the messageHashes, since with only that info we can later perform operations or filtering on the actual messages that the hashes represent.

Since there is no deterministic ordering by default, we impose one. Entries are serialized based on:
a) Waku message timestamp b) in case of a timestamp conflict, lexicographic order of the messageHash, i.e. 0xaaaa -> 0xabbbb.

The ordering of the messages should be (b) to enable binary search. The indexes would allow efficient search by other fields (time, topics, payload?).

If we order the messageHashes, then we can build a Prolly tree or another sync mechanism; the indexes will help later, once we know which exact messages need to be transported (by querying them from the DB/indexes).

Yeah, so basically the aim here is to sync the messageHashes, since with only that info we can later perform operations or filtering on the actual messages that the hashes represent.

I misunderstood; the Sync mechanism would only use hashes, right?

If we order the messageHashes, then we can build a Prolly tree or another sync mechanism; the indexes will help later, once we know which exact messages need to be transported (by querying them from the DB/indexes).

What I envision:

  1. KV store for hashes and messages.
  2. One Prolly tree based index per message field we need to search.
  3. Sync can be done with any query on any index (it's just a set diff).
  4. Stream the messages to the node that sent the query.
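Steps 3 and 4 above can be sketched as follows; the in-memory dicts and all names are illustrative stand-ins for the real KV store and transport:

```python
# Hypothetical sketch: sync as a set difference over message hashes,
# followed by fetching the missing payloads from the peer's KV store.

local_store = {"0xaaaa": b"msg A", "0xabbb": b"msg B"}
remote_store = {"0xaaaa": b"msg A", "0xcafe": b"msg C"}

def sync_diff(local_hashes, remote_hashes):
    # Hashes the remote has that we lack (step 3: a set diff).
    return sorted(set(remote_hashes) - set(local_hashes))

missing = sync_diff(local_store.keys(), remote_store.keys())

# Step 4: "stream" the missing payloads to the querying node.
for h in missing:
    local_store[h] = remote_store[h]
```

In a real deployment the diff would be computed over the query's index range rather than the whole store, but the shape of the exchange is the same.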