How to check if an aggregate ID exists?
enumag opened this issue · 28 comments
Very often when handling a command I need to check whether an aggregate with the given ID exists. I'm mostly using AggregateRepository::getAggregateRoot() for that purpose, but in my opinion it is not a good fit for this case: I don't need the aggregate itself or any of its events, so getAggregateRoot is doing a lot of useless work.
I was unable to find a more suitable method. Can you tell me how you solve this in your applications? Do you think introducing a new method on AggregateRepository would be a good idea? Any tips on how to implement it? EventStore doesn't seem to have a good method for this either (hasStream doesn't help because I don't use the one-stream-per-aggregate strategy).
Note: Using projections for this seemed like a good idea at first but it causes problems with our import where we need to run thousands of commands as quickly as possible. If every command has to wait until projections are updated to pass an existence check then the import gets really slow. Therefore I'd like to avoid using projections for these checks.
Why not use an array of IDs in memory for the import? Or store them in Memcached, Redis, ...?
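Something as small as this would already cover the import case. This is only a sketch: the class name and the wiring into the import loop are hypothetical, not anything prooph ships.

```php
<?php

// Minimal sketch of the in-memory idea: during an import run, remember every
// aggregate ID you have already created and check against that set instead of
// waiting for projections or querying the event store.
final class KnownAggregateIds
{
    /** @var array<string, true> */
    private $ids = [];

    public function remember(string $aggregateId): void
    {
        $this->ids[$aggregateId] = true;
    }

    public function has(string $aggregateId): bool
    {
        return isset($this->ids[$aggregateId]);
    }
}

// Usage inside the import loop (hypothetical command names):
// $knownIds->remember($contractorId);          // right after dispatching CreateContractor
// if (! $knownIds->has($contractorId)) { ... } // before dispatching CreateContract
```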
@prolic While I could probably do that it feels more like a workaround than a proper solution.
Well, IMHO an aggregate does not exist until there is at least one event generated by it. So maybe count the events for the aggregate or something (sorry, I am not a prooph professional, just subscribed to it out of curiosity :) ).
Does prooph's event storage allow counting events? It would be a simple way to add exists($id) to aggregate repositories.
@enumag Do you have other use cases? If some import script that is running only once is the only use case, then we probably don't need that addition.
/cc @codeliner
@prolic Well, I imagine it might also be a problem for some sagas that need to dispatch a lot of commands in sequence. In my case some monthly or yearly audits will certainly be a thing later on. It would also remove the current problem with the GraphQL API where you can't use the ID of a newly created aggregate in other mutations immediately because it's not yet propagated. GraphQL supports sending multiple mutations in one HTTP request, but that wouldn't work with projection-based ID checks.
Right now I feel tempted to add a service that simply uses the same PDO connection as prooph to check it directly in the database. It would be better to have support for that in prooph though.
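For reference, the direct query I have in mind would look roughly like the sketch below. It assumes a MySQL-backed pdo-event-store with the single-stream strategy, where every event carries `_aggregate_type` and `_aggregate_id` in its JSON metadata column; the stream table name has to come from your own configuration, it is not something prooph exposes for this.

```php
<?php

// Rough sketch of the "same PDO connection" idea, NOT an official prooph API.
// Assumptions: MySQL, a JSON `metadata` column containing `_aggregate_type`
// and `_aggregate_id`, and $streamTable taken from trusted configuration.
function aggregateIdExists(PDO $pdo, string $streamTable, string $aggregateType, string $aggregateId): bool
{
    // The table name cannot be bound as a parameter, so it must come from
    // configuration, never from user input.
    $sql = 'SELECT 1 FROM ' . $streamTable
        . ' WHERE JSON_UNQUOTE(JSON_EXTRACT(metadata, \'$._aggregate_type\')) = :type'
        . ' AND JSON_UNQUOTE(JSON_EXTRACT(metadata, \'$._aggregate_id\')) = :id'
        . ' LIMIT 1';

    $statement = $pdo->prepare($sql);
    $statement->execute(['type' => $aggregateType, 'id' => $aggregateId]);

    return false !== $statement->fetchColumn();
}
```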
How do you handle ID existence in your applications?
Actually I never had a need to check for existence of an ID. I only have 2 cases:
- I want to append some events to an aggregate, hence it exists or I send a 404 if not.
- I want to create a new aggregate (due to uuid-usage, I don't need to check it)
What about cases where you need to add an aggregate that is somehow related to another aggregate and therefore holds its ID? I have such cases literally everywhere... A Contract holds references to the contractors and the apartment the contract is for, and also to an accounting organization which holds some settings that are later used for accounting. I need to check all these IDs to create the contract aggregate. Then there are a lot of fees which need to be connected to a financial account but are sufficiently complex to be separate aggregates.
@prolic What? I don't think you understand... For example, I get a request to create a new Contract aggregate. The request contains the IDs of the contractors and the ID of the apartment (for simplification; there are several more in reality). So before I can create the new Contract aggregate I need to check that there is a Contact aggregate in the database for both of the given contractor IDs and that there is an Apartment aggregate with the given apartment ID. I can't allow a contract with nonexistent IDs to be created, can I?
In this case you have to load the complete aggregate anyway, because it could have state = DELETED or any other case that forbids creation of a contract.
While that certainly applies in some cases, it doesn't actually matter to us most of the time. States that forbid new relations are quite rare in our domain.
ping @codeliner your thoughts?
By the way, this leads me to an off-topic question. In proophessor aggregates never had any getters. Is it a good practice to not have getters on aggregates? If that's the case, then even if I load an aggregate I would not be able to check its state.
Sorry, missed that discussion...
Is it a good practice to not have getters on aggregates?
Yes! I know it is hard to achieve. The illusion of consistency pushes us in the direction of exposing aggregate state, because the aggregate is consistent and the read model is not.
But when exposing aggregate state you couple your aggregate with services, command handlers, ...
And sooner or later you're no longer able to refactor or remove the aggregate, because its getters are used everywhere in the codebase.
You have those relationships, hence the aggregate does exist.
Exactly! If you really want a system that processes thousands of commands in a couple of minutes you have to design your system in a "reactive way". See the Reactive Manifesto.
The Resilient section is especially important here.
I can't allow a contract with nonexistent IDs to be created, can I?
Why not? What happens in that case? Can the business deal with it? A technical exception rejecting the command is not useful. Why are wrong IDs referenced in the first place?
A resilient system accepts the command CreateContract and processes it. Then if you want to make sure that Contractor and Apartment are known in the system you either check that during read model projection or with a process manager. In both cases you access the read model to answer that question. Let's say Contractor cannot be found.
First you can try again in a few minutes, maybe the read model is not up to date. If Contractor is still not in the read model you should send the Contract aggregate a command to mark the contract as invalid. You should also inform support and/or business about the issue so that it can be resolved manually.
I know this is a lot of stuff to code, but that's the way a reactive, elastic, resilient, message-driven system works. Such a system can be extremely robust and handle almost any issue without human intervention. The time you spend building it is saved in the long run when the system runs in production with very few or no issues.
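To make that a bit more concrete, a process manager for this flow could look roughly like the sketch below. Everything in it is hypothetical — ContractCreated, MarkContractAsInvalid, the finders and the command bus stand in for whatever your application defines; it illustrates the idea, not a prooph API.

```php
<?php

// Hypothetical process manager: react to ContractCreated, check the read model
// for the referenced IDs, and if they are still unknown tell the Contract
// aggregate to mark itself as invalid.
final class CheckContractReferencesProcessManager
{
    private $contractorFinder; // read-model lookup, hypothetical
    private $apartmentFinder;  // read-model lookup, hypothetical
    private $commandBus;       // e.g. a prooph/service-bus CommandBus

    public function __construct($contractorFinder, $apartmentFinder, $commandBus)
    {
        $this->contractorFinder = $contractorFinder;
        $this->apartmentFinder = $apartmentFinder;
        $this->commandBus = $commandBus;
    }

    public function onContractCreated(ContractCreated $event): void
    {
        $allKnown = $this->contractorFinder->allExist($event->contractorIds())
            && $this->apartmentFinder->exists($event->apartmentId());

        if ($allKnown) {
            return; // every referenced ID is in the read model, nothing to do
        }

        // The read model might simply be lagging behind. In a real implementation
        // you would re-check after a delay (e.g. via a delayed message) before
        // concluding that the references are really unknown.

        // Still unknown: mark the contract invalid and let support/business resolve it.
        $this->commandBus->dispatch(MarkContractAsInvalid::withReason(
            $event->contractId(),
            'Referenced contractor or apartment could not be found'
        ));
    }
}
```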
In case you don't have the time for the solution described above: use EventStore::load() in your aggregate repository and limit the result to just one event of the aggregate (using a MetadataMatcher). If one event is found, your aggregate exists.
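Roughly, such a check could look like this. It's a sketch against prooph/event-store v7, assuming the single-stream strategy where the repository writes `_aggregate_type` and `_aggregate_id` into each event's metadata; the function name is just an example.

```php
<?php

use Prooph\EventStore\EventStore;
use Prooph\EventStore\Exception\StreamNotFound;
use Prooph\EventStore\Metadata\MetadataMatcher;
use Prooph\EventStore\Metadata\Operator;
use Prooph\EventStore\StreamName;

// Loading at most one matching event is enough to answer "does this aggregate
// exist?" without replaying its whole history.
function aggregateExists(
    EventStore $eventStore,
    StreamName $streamName,
    string $aggregateType,
    string $aggregateId
): bool {
    $metadataMatcher = (new MetadataMatcher())
        ->withMetadataMatch('_aggregate_type', Operator::EQUALS(), $aggregateType)
        ->withMetadataMatch('_aggregate_id', Operator::EQUALS(), $aggregateId);

    try {
        // from event number 1, load at most 1 event matching the metadata
        $streamEvents = $eventStore->load($streamName, 1, 1, $metadataMatcher);
    } catch (StreamNotFound $exception) {
        return false; // the stream itself does not exist yet
    }

    return $streamEvents->valid();
}
```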
We can think about adding an EventStore::countEvents() method as suggested by @alanbem, but that would be a BC break, so it's only an idea for event-store v8.
@codeliner Thanks for the answer. You're mostly repeating what I previously read elsewhere. I definitely don't want to dismiss something I've read from multiple sources written by people more experienced than I am. The thing is that none of the sources I've read so far addressed my concerns about this approach. Can you answer them?
Yes! I know it is hard to achieve. The illusion of consistency pushes us in the direction of exposing aggregate state, because the aggregate is consistent and the read model is not.
But when exposing aggregate state you couple your aggregate with services, command handlers, ...
And sooner or later you're no longer able to refactor or remove the aggregate, because its getters are used everywhere in the codebase.
That's my concern exactly. Any tips on how to solve it? You can't really sacrifice consistency, right?
Why not? What happens in that case? Can the business deal with it? A technical exception rejecting the command is not useful. Why are wrong IDs referenced in the first place?
Well, the IDs are user input, so I can't simply trust that they exist. And when one doesn't, I need to tell the user instead of pretending that everything is OK; otherwise the user would be unaware that he did something wrong. So that exception is not only useful but necessary.
The system is an API, so a technical exception is just fine. The developer who works with the API will know what to do with it. In case the client is a browser, it should translate the exception into something readable by the user (if the error is something he can fix).
A resilient system accepts the command CreateContract and processes it. Then if you want to make sure that Contractor and Apartment are known in the system you either check that during read model projection or with a process manager. In both cases you access the read model to answer that question. Let's say Contractor cannot be found.
First you can try again in a few minutes, maybe the read model is not up to date. If Contractor is still not in the read model you should send the Contract aggregate a command to mark the contract as invalid. You should also inform support and/or business about the issue so that it can be resolved manually.
Disregarding the fact that it's a lot of code to write, the main reason why I dislike it is that the system can't respond to an invalid command with an error. Instead it handles it and only finds out later that the command was wrong. So firstly the user won't know something went wrong, and secondly if someone were to send invalid commands one after another (intentionally or by accident) the system would have to store them instead of rejecting them, which would quickly fill the EventStore with absolutely useless data.
the main reason why I dislike it is that the system can't respond to an invalid command with an error.
Commands do not respond at all.
Instead it handles it and only finds out later that the command was wrong.
Which means the API user sending this command did not check any preconditions correctly.
So firstly the user won't know something went wrong
You can always use a different communication channel or implement a general "error handling" process manager.
(intentionally or by accident)
Intentionally - no need to waste time replying.
By accident - why did the UI allow this then?
As commands are fire-and-forget messages, it is particularly important that an API user (e.g. the UI) checks all preconditions as well as possible. If a command fails due to a business rule constraint, that is usually a bug in the UI (or the API user in general). This pushes a lot of responsibility towards the API user, yet it is usually enough for most cases.
The only thing you'll then need to worry about (except for technical issues) is true concurrency issues. Depending on your domain these are either rare enough to just accept, or you can detect and handle them specifically (e.g. send out an email...).
If you insist on waiting for a positive reply to the command (e.g. that the aggregate has been found) you can do that too, by checking the side channel mentioned before (e.g. insert handled commands into a table). But this pretty much defeats the purpose of CQRS and you might just want to go with event sourcing only.
On the other hand, loading an AR in the command handler is not overly expensive (in particular if you are using snapshotters). We have a few commands that indeed require similar behavior, so we went for this route (a rough code sketch follows at the end of this comment):
- try to load the information from the read model
- when no record is found, load respective aggregate
- if NO aggregate is found, throw a not-found exception (which is NOT returned to the user but published to a notification channel)
- if an aggregate IS found, throw a ConcurrencyException, which makes the message get redelivered and tried again (by then we start at the top again, until the read model has caught up)
For some situations, this is how we handle concurrency.
Yet the two important properties remain as they are: we don't rely on the AR state or on return values from the command handler. (Which is pretty much the cheap solution to what @codeliner suggested.)
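In code, the handler flow listed above looks roughly like this. It's only a sketch: the command, finder, repository and exception names are placeholders for whatever your application uses, and "redelivered" depends entirely on your messaging infrastructure.

```php
<?php

// Sketch of the command-handler flow described above. All names are hypothetical.
final class SomeCommandHandler
{
    private $readModelFinder; // read-model lookup
    private $repository;      // aggregate repository

    public function __construct($readModelFinder, $repository)
    {
        $this->readModelFinder = $readModelFinder;
        $this->repository = $repository;
    }

    public function __invoke(SomeCommand $command): void
    {
        $record = $this->readModelFinder->find($command->aggregateId());

        if (null === $record) {
            if (null === $this->repository->get($command->aggregateId())) {
                // Not found anywhere: this exception is NOT returned to the user,
                // the surrounding infrastructure publishes it to a notification channel.
                throw new AggregateNotFoundException($command->aggregateId());
            }

            // The aggregate exists but the read model has not caught up yet.
            // Throwing here makes the messaging layer redeliver the command,
            // and by the next attempt the read model is usually up to date.
            throw new ConcurrencyException($command->aggregateId());
        }

        // ... normal handling using $record ...
    }
}
```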
@fritz-gerneth I was hoping for a response from @codeliner but ok, I'll react to your post.
So you ARE actually checking that the IDs are valid by loading the aggregates, right? You just don't return the errors in a direct response but through a different channel. That is an important detail for me to note and I do see some value in it, but it doesn't really matter in the context of my concerns.
Anyway, this confuses me even more. @codeliner said to only check whether the ID is valid in a process manager or a projection - either way that is AFTER the event was stored in the EventStore. You, however, are checking them in the command handler, meaning BEFORE the event is stored in the EventStore. Which approach is correct then?
@codeliner Could you respond to my post above please?
Those are two different things; I should have made the limitations I implied clearer:
- I was talking about how to find out whether the AR I plan to execute the command on does not yet exist in the read model. This is solely for handling race/concurrency conditions within the same aggregate instance. If the AR I want to execute the command on is not found, I throw a NotFoundException which is forwarded to the user.
- @codeliner says that an AR should not care if IDs of other ARs actually exist (e.g. through "relations"). For that use a process manager.
Oh ok. Then I can ask you about the main problems.
Firstly, what exactly should I do in the process manager if the AR's ID does not exist? And how should I handle projections? I mean, only events that were already validated by the process manager should be projected, right?
Secondly if I don't check the existence of the related ARs then I have no guarantee that those ARs will be created later. That makes such events pretty much invalid and even storing such events seems useless. ES could easily become full of invalid unneeded data. How should I deal with that?
Thirdly if ES contains such invalid data, can I even consider it to be the "source of truth"?
I think I need better insight into your workflow regarding storing events, checking for related IDs and projecting those events.
@codeliner's point was: make error handling an explicit part of the business process. Ask yourself how this would be handled in an offline world:
- Someone submits a (paper-based) form to a contractor with all his personal information attached.
- The contractor later finds out the information provided is invalid (e.g. it references a non-existent person, or something is not attached).
- The contractor calls the person to clarify the incorrect information, etc.
- If the contractor cannot reach the person, they cancel the contract.
As you can see, there is a small process involved for handling incorrect data. Bring this into an application and you can have something similar. Don't treat those things as application errors. They are business process errors and should be handled as such. In this scenario the error is resolved manually by the contractor.
Another scenario on this:
- someone books a seat at a conference, that seat is reserved
- later on it turns out the provided data is incorrect
- the seat reservation is canceled automatically (e.g. by a regular check / process manager)
Again, initially the seat is booked without validating the relationship. Thinking in strongly enforced relations between ARs is counterproductive; so are explicit relationships between ARs in general.
Secondly if I don't check the existence of the related ARs then I have no guarantee that those ARs will be created later. That makes such events pretty much invalid and even storing such events seems useless.
There is a reason why it is called "eventually consistent". If you are looking for strong consistency, either the architecture is the wrong one (sometimes you just need strong consistency) or your aggregate roots are misaligned (the more common scenario).
Nevertheless, those are not useless events at all. By making error handling part of the business process those events have the same value as any other.
Secondly if I don't check the existence of the related ARs then I have no guarantee that those ARs will be created later.
If they have a strong dependency on those ARs, your AR boundaries are wrong. Forget about relational database models and tables, about objects in OOP... An aggregate root is much more than that; most of all it is a consistency boundary for a particular problem.
For the contract example above - if the personal information indeed must be consistent at the time of the form submission - don't reference it but make it an explicit part of the contract itself. You can find this in real life too:
- Either the required information is to be provided on a form (same "Form AR")
- Or you can hand in the form with the information attached (two ARs with a weak relation).
There is a huge difference between making it part of the form and referencing some other eventually-maybe-existing object.
Thirdly if ES contains such invalid data, can I even consider it to be the "source of truth"?
Yes. Invalid references are not invalid data as explained.
Summary: relations between ARs are weak relations. If you need strong relations, your AR boundaries are wrong. Both have their benefits and downsides; what you need depends on the model.
@enumag I did not answer because I have nothing to add to what @fritz-gerneth said. His answers are very detailed. Listen to him and try to follow the suggestions e.g. the comparison to a paper-based process.
@fritz-gerneth It all comes down to our ARs and bounded contexts most likely being wrong despite our efforts and refactorings. That's not really surprising considering that for most of the team it's the first time building a system like this. Changing error handling to what you described doesn't seem beneficial while the ARs are incorrect, either.
The thing is that I have no idea how to do it better. We've read quite a few articles on the topic but it seems to be more of an experience-based problem. The old system with the relational model has a lot of relations everywhere, so it's difficult to tell where the separations should be made. And I honestly can't tell whether a given relation is "strong" or "weak" because there is no such distinction in a normal relational database - just foreign keys everywhere.
Would you guys be willing to help me find the bounded contexts and aggregates if I describe the core of our model? Of course it's off topic in this issue - should I ask in proophsoftware/es-emergency-call for instance?
Would you guys be willing to help me find the bounded contexts and aggregates if I describe the core of our model? Of course it's off topic in this issue - should I ask in proophsoftware/es-emergency-call for instance?
Yes, let's try that. But it might take some time. Anyway, it could be a nice reference for other teams struggling with the same problem. And yes, please ask in the emergency-call repo.
Alright then, thank you. I'm closing this issue and will open a new one in the emergency repo. It will take some time to write everything down and I also have a vacation coming up, so it might take a week or two for me to open the issue.