shiblon/entroq

Document Recipes for strict ordering of heterogeneous tasks


Use case: multiple "users" (tenants, i.e., groups of users) each have strong requirements that requests be handled in precisely the order submitted.

An interesting recipe for this follows, and we should document it with diagrams and other aids to make it clear.

Storage Types Needed

Two storage mechanisms:

  • Ordered storage
  • EntroQ

Ordered storage is anything that has the following capabilities:

  • Append to tail
  • Get head
  • Delete head

So, a Postgres table where tasks are sorted by monotonically increasing object ID or timestamp would work fine for this case.
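For example, here is a minimal sketch of that interface using an in-memory SQLite table standing in for Postgres (the table and function names are illustrative, not part of any real system):

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the three required operations
# (append to tail, get head, delete head) map directly onto a table sorted
# by a monotonically increasing object ID.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonically increasing ID
        tenant TEXT NOT NULL,
        payload TEXT NOT NULL
    )
""")

def append(tenant, payload):
    """Append a task to the tail for this tenant."""
    db.execute("INSERT INTO tasks (tenant, payload) VALUES (?, ?)",
               (tenant, payload))

def get_head(tenant):
    """Get the head (oldest) task for this tenant, or None if empty."""
    return db.execute(
        "SELECT id, payload FROM tasks WHERE tenant = ? ORDER BY id LIMIT 1",
        (tenant,)).fetchone()

def delete_head(tenant):
    """Delete the head task for this tenant, if any."""
    head = get_head(tenant)
    if head is not None:
        db.execute("DELETE FROM tasks WHERE id = ?", (head[0],))

append("tenant1", "step 1")
append("tenant1", "step 2")
delete_head("tenant1")  # head is now "step 2"
```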

You can do this with a single worker type that knows how to accomplish the work for any kind of task, or with a microservices approach, described later on.

Intake

To get tasks created per tenant, the ordered storage mechanism is basically the means of communication between the UI and the backend: the UI creates tasks and pushes them into ordered storage. Since both the UI and the backend manipulate this storage, using something like a database is advantageous over flat files, because flat files would require an external locking mechanism.

Single Worker Solution

Set up a single queue in EntroQ. We'll call it "tenants". When a new tenant comes online, a single task for that tenant is added to this queue. This task lives forever from this point forward. In other words, the queue should always contain one task per valid tenant.

A "tenant worker" type is created that does the following each time a task is successfully claimed from the "tenants" queue:

  • Go to ordered storage for the tenant represented in the claimed task.
  • Get the head item.
  • Perform the work it describes.
  • If successful, delete the head item from ordered storage.
  • Set the claimed task's Arrival Time to "now".

And that's it - you are basically using EntroQ as a lock per tenant. Tenants can all make progress in parallel, but each tenant is guaranteed a strict ordering of task completion.
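The loop above can be sketched with plain in-memory structures standing in for the EntroQ client and ordered storage (all names here are illustrative, not the real API):

```python
# Hypothetical stand-ins: `tenants_queue` plays the role of the EntroQ
# "tenants" queue (one long-lived task per tenant), and `ordered` maps
# tenant -> list of pending work items in ordered storage.
ordered = {"tenant1": ["a", "b"], "tenant2": ["x"]}
tenants_queue = list(ordered)
completed = []

def do_work(tenant, item):
    """Stand-in for actually performing the work the head item describes."""
    completed.append((tenant, item))

def tenant_worker_pass(tenant):
    """One claim cycle: handle at most one head item, then release the lock."""
    head = ordered[tenant][0] if ordered[tenant] else None
    if head is not None:
        do_work(tenant, head)    # do what the head item says
        ordered[tenant].pop(0)   # if successful, delete the head item
    # Setting the claimed task's Arrival Time to "now" releases the
    # per-tenant lock; here that is just returning the task to the queue.
    tenants_queue.append(tenant)

# Drive the loop until all ordered storage is drained.
while any(ordered.values()):
    tenant_worker_pass(tenants_queue.pop(0))
```

Note how tenants interleave (tenant2 runs between tenant1's two items), yet each tenant's own items complete strictly in order.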

NOTE: there are some complexities here around tenant management. If a single process can be responsible for managing tenants in this queue, then you can get away with some otherwise questionable practices, like searching the queue, checking for duplicates, and deleting tasks that are currently claimed. System-level work like this is allowable even when correctness is important, but great care must be taken to ensure that only one process does it, to avoid deletion/addition races.

Microservice Version

There may be good reasons not to build the tenant worker to carry out the work for every possible task type itself. In that case, it can instead know how to send requests for each type to be completed by another service.

If you want to use EntroQ for async comms in this scenario, which is highly recommended for fault tolerance, the recipe has some additional components:

Queues:

  • tenants (still)
  • request_A, request_B, request_C, etc., for task types A, B, C, etc.
  • ephemeral response queues (see below)

Workers:

  • tenant worker
  • A worker, B worker, C worker, etc., for reading from their respective request queues.

Tenant worker flow:

  • Go to ordered storage and get the head task for the currently claimed tenant.
  • Create a response queue name, e.g., "tenant1_response_25829234832234" - note the random suffix; this queue should not already exist.
  • Create an EntroQ task in the appropriate request queue with metadata indicating how to complete it, and package the response queue name within it. We'll call that field "respQ".
  • Claim from respQ - this blocks indefinitely (see below for how to use a timeout in this case).
  • If successful, delete the head item in ordered storage for this tenant.
  • Set AT on the tenant task to "now" (we're all done with the lock).
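The response-queue naming and request packaging can be sketched as follows (`response_queue_name`, `make_request_task`, and the body shape are assumptions for illustration, not part of EntroQ):

```python
import json
import secrets

def response_queue_name(tenant):
    """Build a response queue name with a random suffix so it cannot
    collide with an existing queue (name shape follows the example above)."""
    return f"{tenant}_response_{secrets.randbits(48)}"

def make_request_task(tenant, task_type, payload):
    """Package the work description and the response queue name (the
    "respQ" field from the recipe) into a request-task body."""
    return {
        "queue": f"request_{task_type}",  # e.g., request_A, request_B
        "body": json.dumps({
            "tenant": tenant,
            "payload": payload,
            "respQ": response_queue_name(tenant),
        }),
    }

task = make_request_task("tenant1", "A", "do the thing")
```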

Microservice Worker:

  • Do the claimed task work.
  • Create a response task and push it into the queue specified in respQ (see above).

What does it mean to add a timeout to the "claim respQ" step above? It means you expect the task to be fully completed easily within that time, and that if it isn't, something is wrong and you should throw the whole thing away. In that case, do not delete the head task in ordered storage if you want a retry: just leave it there, release the lock by setting AT to "now", and give up. The system will pick it up again momentarily.
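The whole round trip, including the timeout behavior, can be simulated with plain dicts standing in for EntroQ queues (everything here is an illustrative stand-in, not the real client):

```python
queues = {}

def push(queue, item):
    queues.setdefault(queue, []).append(item)

def claim(queue, timeout_ok):
    """Stand-in for a claim with a timeout: returns the head item if one
    arrived in time, else None."""
    items = queues.get(queue, [])
    return items.pop(0) if items and timeout_ok else None

def tenant_step(tenant, ordered, worker_responds, timeout_ok):
    """One pass of the tenant worker's microservice flow."""
    if not ordered[tenant]:
        return "idle"
    head = ordered[tenant][0]
    resp_q = f"{tenant}_response_12345"   # random suffix elided for the demo
    push("request_A", {"payload": head, "respQ": resp_q})
    if worker_responds:                   # the A worker does the work...
        req = queues["request_A"].pop(0)
        push(req["respQ"], "done")        # ...and pushes a response task
    if claim(resp_q, timeout_ok) is not None:
        ordered[tenant].pop(0)            # success: delete the head item
        return "completed"
    # Timed out: leave the head item in place and release the lock (set AT
    # to "now"); the system will pick it up again momentarily.
    return "retry-later"

ordered = {"tenant1": ["step 1"]}
tenant_step("tenant1", ordered, worker_responds=False, timeout_ok=False)
# Head is still there, so a later pass retries and succeeds:
tenant_step("tenant1", ordered, worker_responds=True, timeout_ok=True)
```

Note that the first (timed-out) request is still sitting in `request_A` afterward: a stale request may get executed twice, which is exactly why the idempotence notes below matter.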

Notes on Idempotence

When writing to a database within a worker, it is best practice to have the task describe "how the database should look after finishing" instead of "how to incrementally change whatever is there". Describe end state, not diffs.
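For example, with a hypothetical balances table (SQLite standing in for the real database), an end-state task can be replayed safely where a diff-style task cannot:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")

# Diff-style (NOT idempotent): replaying this after a retry double-applies it.
#   UPDATE balances SET amount = amount + 10 WHERE account = 'a1'

def apply_task(account, final_amount):
    """End-state style: the task says what the row should look like after
    finishing, so replaying it is harmless."""
    db.execute(
        "INSERT INTO balances (account, amount) VALUES (?, ?) "
        "ON CONFLICT(account) DO UPDATE SET amount = excluded.amount",
        (account, final_amount))

apply_task("a1", 110)
apply_task("a1", 110)  # a retry of the same task changes nothing
```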

When writing files, you want to do the following:

  • Always write to a timestamped file name.
  • Always write to a "partial" file name.
  • Only when finished should the file name look real.
  • Use a separate process to garbage collect old partial files.

Example:

  • Open "my-output-file-20230630-110800.partial" for writing
  • Start writing
  • Finish writing
  • Rename to "my-output-file-20230630-110800"
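A sketch of that write pattern (the file name shape is taken from the example above; the function name is illustrative):

```python
import os
import tempfile
import time

def write_output(directory, body):
    """Write to a timestamped ".partial" name and rename only when finished,
    so a listing always distinguishes finished files from interrupted ones."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    partial = os.path.join(directory, f"my-output-file-{stamp}.partial")
    final = partial[:-len(".partial")]
    with open(partial, "w") as f:
        f.write(body)           # power loss here leaves only a .partial file
    os.rename(partial, final)   # atomic on POSIX: the name now "looks real"
    return final

outdir = tempfile.mkdtemp()
path = write_output(outdir, "results\n")
```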

Now, just by doing a file listing, you can see the state of every file: you can tell which ones are old and corrupt (because maybe the power got cut during a write, which happens more than you might think!) and which ones are finished. You will also never have two workers writing bytes to the same file at the same time, which can easily happen if one worker loses track of EntroQ and another picks up its task while both still have filesystem access. Always write unique file names, and indicate them in the response instead of just assuming they'll be called something standard. It's the only way to really be safe.

@zahmadsaleem FYI