ylorph/RandomThoughts

Expectations for a Store: Should explicitly mention subs not missing or reordering events


in https://github.com/ylorph/RandomThoughts/blob/master/2019.08.09_expectations_for_an_event_store.md

There is:

ability to have subscription

But there is nothing that calls out that it should guarantee never to miss an event if you have a live subscription and there are multiple writers.

Until today I thought this should go without saying (and it's great that this doc is pithy and not full of legalese), but it has come to my attention that, in some cases, delivering that guarantee 100% of the time, and/or documenting the likelihood that such a complete guarantee is absent, is in some way debatable.

I believe CosmosDB ChangeFeed, DynamoDB Streams and MessageDB category subscriptions guarantee this and document it as such (some digging may be required; I don't have citations). I believe ESDB should guarantee this, but there are far better qualified people than me to make the claim. For others, it gets more confusing; if this list mentioned it explicitly as being significant, it would elevate the need for stores to consider and/or answer the question.


Clarification re DynamoDB streams:

  • at the time I wrote this, I thought it provided a guarantee that each update under a given Partition would be delivered in the order it happened
  • it turns out that the guarantee is only for the relative order of inserts/updates/deletes to an individual Item, i.e. per item identified by a Sort key within that Partition. Equinox.DynamoStore had a fix applied that compensates for this. I'm not aware of a source-available implementation of an Event Store on DynamoDB that is not affected by this.
ylorph commented

Yes, missing event / out-of-order detection can be done if you have a monotonically increasing number on the streams you read;
it's a matter of keeping track of the last processed event number on the consumer side.

Some message brokers out there claim to guarantee that without that trick.
Though they only guarantee it up to just before the message enters the consumer, so it's a kind of hand-wavy claim (because the consumer might crash).
Especially since most people don't dig that far into those explanations.

So detection is the best you can do, and for that the easiest solution is for each event in a stream to carry that increasing number.
To make it work, both the server and the consumer need to play along and follow some rules.
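
For illustration, a minimal sketch of that consumer-side trick in Python (the event shape and names are made up, not any particular store's API): track the last processed number per stream and treat anything other than the expected next number as a missed or reordered event.

```python
class GapOrReorderDetected(Exception):
    """Raised when a delivered event's number is not the expected next one."""


def check_feed(events, handle):
    """events: an iterable of (stream, number, payload) tuples, where number is
    the store-assigned, gapless, 0-based index within the stream (a hypothetical
    shape, purely for illustration)."""
    last_processed = {}  # stream -> last processed event number
    for stream, number, payload in events:
        expected = last_processed.get(stream, -1) + 1
        if number != expected:
            # number > expected: an event was missed;
            # number < expected: the delivery was reordered or duplicated.
            raise GapOrReorderDetected(
                f"{stream}: expected event {expected}, got {number}")
        handle(stream, number, payload)
        last_processed[stream] = number
```

This only detects the problem, and it only works at all if the store guarantees there are no gaps in the numbers it assigns, which is the point below.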

I agree that every event in a stream needs a monotonically increasing number relative to the other items in the stream

Once you have that, you can describe the guarantees, i.e.:

  1. no gaps, i.e. after event N, I might see many events <= N; eventually I will see N+1; I will NEVER be presented with event N+2 before I've seen N+1
  2. items getting reordered on some bus can cause such gaps (e.g. an event per item fed through DDB Streams)

The point is more about broken impls

  • if you put events into individual items in DynamoDB, DDB Streams specifically does not guarantee the relative order coming out the other end
  • if your SQL store serializes writes at category level, you never need to check for gaps, either in code or in any other way. If, on the other hand, there is only a "good enough" compensation for the possibility, a la SqlStreamStore, there are implications:
    1. it might have gone wrong
    2. if you have a record of the last one you processed, you can ignore anything < the expected next index and fail on anything > the expected next index (sketched just after this list)
    3. if you don't have a record, all you can do is go checking for gaps in sequence numbers, checking logs, etc.
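
A sketch of that compensation (point 2 above) in Python, assuming you persist a checkpoint of the last processed index per stream; the names are illustrative, not any particular store's API:

```python
def handle_with_compensation(stream, index, payload, handle, checkpoints):
    """checkpoints: any dict-like record of the last processed index per stream
    (hypothetical shape). Skips redeliveries, fails fast on a gap."""
    expected = checkpoints.get(stream, -1) + 1
    if index < expected:
        return  # already processed: a redelivery or out-of-order duplicate, ignore it
    if index > expected:
        # A gap: either a transient reordering or a permanent hole (e.g. a
        # phantom write overlap); fail fast so it can be investigated/remediated.
        raise RuntimeError(f"{stream}: expected index {expected}, got {index}")
    handle(stream, index, payload)
    checkpoints[stream] = index
```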

If you are relying on a known broken impl, you can make provision:

  • add a check for gaps and fail fast; figure out some way to remediate if/when it happens
  • determine that your handler will be fine with a gap (either a short-term one due to out-of-order delivery, or a permanent one due to e.g. SQL phantom write overlaps) - see the sketch after this list
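
One illustrative shape for that second option (Python, made-up names): a projection whose writes are idempotent, last-writer-wins upserts keyed by the event index, so duplicates and reordering are ignored and a permanent gap leaves at worst a stale value rather than a corrupted one.

```python
def apply_to_read_model(read_model, stream, index, snapshot):
    """read_model: dict mapping stream -> (last_applied_index, snapshot).
    Assumes each event carries a snapshot of the stream's current state
    (a hypothetical design, purely for illustration)."""
    current_index, _ = read_model.get(stream, (-1, None))
    # Older or duplicate deliveries (index <= current_index) are simply dropped.
    if index > current_index:
        read_model[stream] = (index, snapshot)
```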

If OTOH you have a guarantee (based on reasoning about how the impl works, or something like a Jepsen test), you can:

  1. just write your system, without triple-checking things, writing lots of confusing and long-winded just-in-case code, or falling into programming by coincidence
  2. spend less time analyzing incidents, because you can more quickly rule out missing or out-of-order deliveries - ultimately you might still go and triple-check things in the end, but it will go to the back of a long list of things, because you trust that it should just work

To put it another way: for stores and projection loops backed by MessageDB, EventStoreDB or CosmosDB, I would never write gap checking logic, or let anyone else do it.

I thought I had the same guarantee for Equinox.DynamoStore, but @epNickColeman discovered there was a weakness. That has been fixed, so I would hold the same position of not writing gap-check logic there either.

Each of those stores has a monotonically increasing event index at stream level (and its presence, and there being no gaps in it, should be a universal requirement; it's not a lot to ask).

For other stores with public code that I'm aware of which don't guarantee both delivery and ordering (there are multiple SQL-backed ones and one DynamoDB-backed one I'm aware of; I'm not going to list them - the goal here is that others can discuss it later wrt an agreed baseline expectation definition), I would expect/need:
a) that they can furnish a stream-level position so I can check for a gap / out-of-order delivery (that's pretty normal)
b) that there could be a potential gap or out-of-order delivery

Does the 'spec' demand no gaps in the event sequence numbers at stream level? If not, I think it should, as a) gap detection and/or out-of-order delivery checks require a guarantee of no gaps in the source data, per the above, and b) I don't think there are many stores that don't provide it, and it should always be realizable.
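
For reference, a sketch of one common way that gapless stream-level sequence is realized (an optimistic-concurrency conditional append). This is a toy in-memory Python model with made-up names, not any store's actual API; a real store would perform the same check inside a transaction or conditional write.

```python
class WrongExpectedVersion(Exception):
    pass


class InMemoryStore:
    """Toy append-only store: the conditional append keeps per-stream event
    indices contiguous even when there are multiple writers."""

    def __init__(self):
        self._streams = {}  # stream -> list of events

    def append(self, stream, expected_version, events):
        stored = self._streams.setdefault(stream, [])
        current = len(stored) - 1  # -1 means the stream does not exist yet
        if expected_version != current:
            # Another writer got there first; the caller re-reads and retries,
            # which is what keeps the sequence numbers gapless.
            raise WrongExpectedVersion(
                f"{stream}: expected version {expected_version}, currently at {current}")
        stored.extend(events)
        return len(stored) - 1  # index of the last appended event
```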