zilverline/sequent

Discussion: custom event stream identifier

Closed this issue · 16 comments

I would like to discuss the possibility of supporting custom (i.e. non-UUID) event stream identifiers (read: aggregate ids).

Currently an event stream in Sequent is identified by its aggregate_id (which is of type uuid in stream_records). Would it be possible to also support other types (i.e. any string)?

This can be useful for dealing with uniqueness requirements of AggregateRoots. We essentially have a very long (read: practically infinite) event stream, which we partition (read: create a separate aggregate root) per 15 minutes of data, keeping each stream under 100 events. As a workaround we currently enforce uniqueness with a unique index in the projection, but I don't feel this is the proper solution. Ideally I'd enforce uniqueness within the domain (instead of in the projection) by using an aggregate id that is not a UUID, and I would like to be able to do something like this:

class RegisterPTUMinute < Sequent::Core::BaseCommand
  attrs aggregate_id: String, # contains something like "PTU.2022-01-16T00:00:00" instead of a UUID.
        ptu_minute: PTUMinute
end
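
As a rough illustration of how such a bucketed identifier might be derived, the sketch below floors a timestamp to its 15-minute block and formats it like the example in the comment above. The method name `ptu_stream_key`, the constant `BUCKET_SECONDS`, and the choice of UTC are assumptions made for illustration; they are not part of Sequent.

```ruby
# Length of one partition in seconds (15 minutes, as described above).
BUCKET_SECONDS = 15 * 60

# Hypothetical helper: maps any Time to the key of its 15-minute block.
# Assumes blocks are aligned to UTC quarter hours.
def ptu_stream_key(time)
  epoch = time.to_i
  bucket_start = Time.at(epoch - (epoch % BUCKET_SECONDS)).utc
  "PTU.#{bucket_start.strftime('%Y-%m-%dT%H:%M:%S')}"
end

puts ptu_stream_key(Time.utc(2022, 1, 16, 0, 7, 30)) # PTU.2022-01-16T00:00:00
puts ptu_stream_key(Time.utc(2022, 1, 16, 0, 15, 0)) # PTU.2022-01-16T00:15:00
```

Every timestamp inside the same block yields the same key, which is what would let a uniqueness check on the stream identifier reject a second aggregate for that block.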

The aggregate ID does not have to change after an aggregate has been created. This differs from the uniqueness requirement for, say, account emails, which can change.

I can't really think of a reason why this should not be possible, other than that it complicates the replaying of events. AFAIK the first n characters of aggregate ids are used as bucket identifiers for parallelisation. I suppose this could also be done in another way?

I've based this solution on this article: https://www.eventstore.com/blog/keep-your-streams-short-temporal-modelling-for-fast-reads-and-optimal-data-retention

Any ideas/suggestions?

lvonk commented

Hi Bob, interesting. However, we are on the verge of improving the scalability of Sequent, and that work currently relies on aggregate_id being a UUID. So in the short term this is not something we will work on (or accept as a PR, as it would make that work more difficult).
In the somewhat longer term, the way I see this being supported is by making Sequent more pluggable (this is something we are discussing while redesigning the eventstore). Otherwise it is hard to take advantage of the storage optimization of the native uuid type in Postgres. In this scenario it would be up to the user of Sequent which "event store" implementation they want.

Reading your requirement I am wondering: can't you use a time-based UUID (version 1) for this? I am not sure whether you can use one out of the box; perhaps you will need to roll your own. It may be worth investigating.

That sounds good. Would you consider configuring in what event store an AggregateRoot is stored then?

That might be worth investigating yes. Thanks!

I found Ruby UUIDTools (https://github.com/sporkmonger/uuidtools), which can create a timestamp-based v1 UUID, like so:

irb(main):016:0> time = Time.now
=> 2023-01-17 10:05:14.570297 +0100
irb(main):017:0> UUIDTools::UUID.timestamp_create(time)
=> #<UUID:0x433a0 UUID:0e09063a-9646-11ed-a151-acde48001122>
irb(main):018:0> UUIDTools::UUID.timestamp_create(time)
=> #<UUID:0x433b4 UUID:0e09063a-9646-11ed-a152-acde48001122>

However, this still creates a slightly different UUID based on the same timestamp (the gem claims to be RFC compliant).

How would you see something like this working? The UUIDs are also bound to the MAC address, right? So even if the same timestamp (our 15-minute block timestamp) generated the same UUID on the same MAC address, it would probably be different on different (virtual) machines? And I would probably also have to validate the UUID against the given timestamp to prevent tampering?
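
To make the difference between the two generated UUIDs concrete, here is a small sketch that splits them into the version-1 fields defined by RFC 4122. The helper name `v1_fields` is made up for this example. It only parses the strings printed in the irb session above and does not call UUIDTools.

```ruby
# Hypothetical helper: split a version-1 UUID string into its RFC 4122 fields.
def v1_fields(uuid)
  time_low, time_mid, time_hi, clock_seq, node = uuid.split('-')
  {
    version: time_hi.hex >> 12,
    # 60-bit timestamp in 100 ns ticks since 1582-10-15
    timestamp: ((time_hi.hex & 0x0fff) << 48) | (time_mid.hex << 32) | time_low.hex,
    # 14-bit clock sequence (the two variant bits are masked off)
    clock_sequence: clock_seq.hex & 0x3fff,
    # node id, typically derived from a MAC address
    node: node
  }
end

first  = v1_fields('0e09063a-9646-11ed-a151-acde48001122')
second = v1_fields('0e09063a-9646-11ed-a152-acde48001122')

puts first[:timestamp] == second[:timestamp]               # true
puts first[:node] == second[:node]                         # true
puts second[:clock_sequence] - first[:clock_sequence]      # 1
```

The two values share the timestamp and node fields and differ only in the clock sequence. Since the node field also varies per machine, a version-1 UUID does not look like a good fit for a deterministic identifier.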

lvonk commented

Maybe this helps: https://www.rfc-editor.org/rfc/rfc4122#section-4.3
Name-based UUIDs are worth investigating, where your "name" consists of your unique timestamp and the "name space" identifies that part of the code. I am not sure whether there is a default implementation or whether you have to roll your own.
We are looking into something similar for aggregate_ids for tenants, so they all start with the same n bits for better locality when partitioning. But that is all still work in progress.

Ah thanks, that makes sense.

I've noticed that the UUIDTools library I mentioned has support for this with UUIDTools::UUID.sha1_create(namespace, name) (see https://github.com/sporkmonger/uuidtools/blob/main/lib/uuidtools.rb#L353).

So essentially we need to create a UUID constant for our namespace (read: Sequent::AggregateRoot subclass) and use the attribute(s) that make up the uniqueness as name (the timestamp in ISO 8601 format in our case).

Yeah this looks good:

irb(main):016:0> namespace = SecureRandom.uuid
=> "c493023f-05cb-450b-88e0-96309d35d0aa"
irb(main):017:0> UUIDTools::UUID.sha1_create(UUIDTools::UUID.parse(namespace), "2023-01-19T12:19:25Z")
=> #<UUID:0x433a0 UUID:a9b5f8d5-ab12-5ca3-b984-cf2b687c274d>
irb(main):018:0> UUIDTools::UUID.sha1_create(UUIDTools::UUID.parse(namespace), "2023-01-19T12:19:26Z")
=> #<UUID:0x433b4 UUID:cbd935c9-c173-52bd-a003-479119c15c13>
irb(main):019:0> UUIDTools::UUID.sha1_create(UUIDTools::UUID.parse(namespace), "2023-01-19T12:19:26Z")
=> #<UUID:0x433c8 UUID:cbd935c9-c173-52bd-a003-479119c15c13>

Even when I run this on a different machine (with a different MAC address) I get the same results 👍 .
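
For readers who want to see what happens under the hood, below is a minimal sketch of the name-based (version 5, SHA-1) scheme from RFC 4122 section 4.3, written against the Ruby standard library only. The method name `uuid_v5` and the constant `PTU_NAMESPACE` are assumptions for illustration. This is not the UUIDTools implementation, and in practice a maintained library is probably the safer choice.

```ruby
require 'digest/sha1'

# Hypothetical namespace for one aggregate type; reuses the random UUID
# generated in the irb session above.
PTU_NAMESPACE = 'c493023f-05cb-450b-88e0-96309d35d0aa'

# Sketch of RFC 4122 version 5: hash namespace bytes + name with SHA-1,
# keep the first 16 bytes, then stamp in the version and variant bits.
def uuid_v5(namespace, name)
  namespace_bytes = [namespace.delete('-')].pack('H*')
  bytes = Digest::SHA1.digest(namespace_bytes + name.b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50 # version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80 # RFC 4122 variant
  hex = bytes.pack('C*').unpack1('H*')
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

id_a = uuid_v5(PTU_NAMESPACE, '2023-01-19T12:19:26Z')
id_b = uuid_v5(PTU_NAMESPACE, '2023-01-19T12:19:26Z')
id_c = uuid_v5(PTU_NAMESPACE, '2023-01-19T12:19:25Z')

puts id_a == id_b # true: same namespace and name give the same UUID
puts id_a == id_c # false: a different name gives a different UUID
```

Because the result depends only on the namespace and the name, no MAC address or clock is involved, which is consistent with getting identical results on different machines.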

lvonk commented

So for my understanding, are you then relying on the unique constraint in the event store in order to prevent the same aggregate being created?

That's what I am aiming for then yes. Or is that a wrong thought process?

lvonk commented

No, that makes sense, just checking :-).
I think we can also start using this for our own projects, and it reduces the need for the collection aggregates we have to ensure uniqueness of, for instance, bank accounts for a certain tenant. This can also be enforced by generating a UUID based on the IBAN and tenant. Interesting.

Exactly!

lvonk commented

Would it be helpful to put this somewhere in the documentation, as a sort of "common patterns" section?

Yes that would be useful. I'd first like to actually get it running in production though (even though it conceptually works for me).

I just came across this article about PostgreSQL primary keys (and UUIDs), perhaps interesting as well: https://supabase.com/blog/choosing-a-postgres-primary-key

Apparently there are versions 6, 7 and 8 of UUID as well, which support better lexicographical sorting.

To enforce a uniqueness constraint, it may be more straightforward to add a new table with a unique constraint and insert into this table within the same transaction/command handler where the aggregate is created.

Pseudo-code:

on RegisterUserCommand do |command|
  UserEmails.new(command.email).save!  # Unique constraint will trigger on duplicate email
  User.register(...) 
end

The UserEmails table should not be in the view schema, but can be in the Sequent schema. We could add a more generic version of this as a feature in Sequent, since it is a common problem.

lvonk commented

Yes, after giving this more thought I also realized that if you ever get a clash in the "namespace approach" for whatever reason, your app is broken, since you can't generate another UUID using that approach. I am not sure how realistic that is; it depends on how good the library is at generating unique UUIDs from namespaces and names. So some sort of table, perhaps abstracted behind a service, might indeed be preferable imo.

lvonk commented

Closing since we will not add this to Sequent.