Thanks, Tomasz for inventing this kata.
A system delivers around one thousand events per second. Each Event has at least two attributes:
clientId
- we expect up to few events per second for one clientUUID
- globally unique
Consuming one event takes about 10 milliseconds. Design a consumer of such stream that:
- allows processing events in real time
- events related to one client should be processed sequentially and in order, i.e. you can not parallelize events for the same
clientId
- if duplicated
UUID
appeared within 10 seconds, drop it. Assume duplicates will not appear after 10 seconds
There are few important details in these requirements:
- 1000 events/s and 10 ms to consume one event. Clearly we need at least 10 concurrent consumers in order to consume in near real-time.
- Events have natural aggregate ID (
clientId
). During one second we can expect a few events for a given client and we are not allowed to process them concurrently or out of order. - We must somehow ignore duplicated messages, most likely by remembering all unique IDs in last 10 seconds. This gives about 10 thousand UUIDs to keep temporarily.