streamingfast/substreams

Implement a `set_sum` updatePolicy for stores

Closed this issue ยท 6 comments

PropellerHeads need this.

Definition of done done:

  • Reference documentation pushed to the official docs
  • An example repository/tutorial exists to demo the new feature.
  • The code has been released for the client, and rolled out server-side by at least StreamingFast for all supported protocols.
  • substreams-rs has been updated and published, to provide the new primitives
  • Proper change logs have been documented too.

By having a single byte prefix in an add store, we could know if someone has set a key, and then the merge operation could either SET of add with the previous segment's keys.

Example flow:

1 key1 add(5)
2 key1 add(5)
3 key1 add(5)

SNAPSHOT:
key1 a15

10 key1 add(5)
11 key1 set(50)
12 key1 add(5)
13 key1 set(25)
14 key1 add(5)

SNAPSHOT:
key1 s30

20 key1 add(5)

SNAPSHOT:
key1 a5  (there was no `set` in here, so we'll add `5` to the previous segment's `30`'s final output.)

This would help certain folks trying to use some set (when the liquidity of something is published in absolute terms), and track with greater precision all the adds and removals that are tracked in relative terms (amounts of burns, mints, etc..)

This feature would not be so complex to build, it would add an updatePolicy here and there.. but all those functions are very simple.

@abourget how would you feel about an "in-place" migration for all existing add stores, where we check the first byte for "a" or "s", and if it is not there, treat thee existing snapshot as an "a" and save the next snapshot accordingly?

@abourget or do you think it would be best to simply have a new store update policy called AddOrSet ?

what about SetSum ? add is a bit unfortunate.. set: and sum: prefixes because they will be visible in StoreDeltas arriving in some modules' inputs.. so a quick separator and a readable term would be good

The set+sum store cannot be parallelize since it is not an associativity operation.

We know that substreams requires associativity of operations to work, since it can "group" the work into shards,
i.e. (a+b)+(c+d) must be equal to (a+b+c) + (d)
given a set of operations on a store that allow "adds" and "sets": add(2) add(5) set(0) add(6)

The first grouping gives 13
= (+2, +5) + (set0, +6)
= +7 +6
= 13

But this grouping gives 6
(+2 +5 set0) + (+6)
= 0 + 6
= 6

(+2, +5) + (set0, +6) -> 6, not 13, because each time we hit a set, we erase whatever was accumulated from the left