Worker thread refactoring
Closed this issue · 7 comments
Here I will discuss the refactoring of the worker thread system used by Lightbus. I want to refactor this because:
- It has specific cases in which it can fail. I wish to avoid these cases. Specifically
WorkerNotReady
andWorkerDeadlock
errors. - I want to remove the
ThreadSerializedTask
as this constrictsasyncio
(which thread safety will allow us to do) - I want the Lightbus client to be thread safe
Requirements
Any solution must provide the following:
- Thread-safety
- Prevention of deadlock cases
- Minimise other error cases
- Allow user callables to start their own event loops
- Does not complect different system components
How to achieve this
Thread-safety
Thread safety requires that access to global resources is carefully controlled to prevent race conditions. This either requires the removal of global state or – where this is not possible – marshalling of access to it. For example:
- Network connections (e.g Redis connections) – This is global state, and only one thread should be able to read from and write to the connection at any one time. We cannot remove this global state.
- Internal bus client state (e.g. listeners, schema).
Thread safety can be provided though mechanisms such as locking or queues.
Prevention of deadlock cases
Careful system design is the first step here. If A requires B and B requires A, then no amount of locking primitives will prevent that deadlocking. Rather, removing of blocking (in the threading sense) requirements is an important step.
Queues will likely be the preferable coordination mechanism as they can help keep different system components distinct (not complected).
Minimise other error cases
Hopefully this should fall out of a good design
Does not complect different system components
Again, thoughtful systems design, and isolation of system components.
How do we deal with exceptions in sub-tasks?
So far we've been passing around a BusClient
instance and calling the shutdown_server
method upon error. This then puts a flag in the BusClient
s _server_shutdown_queue
. The handler for this stops the loop, which then does the cleanup.
This involves passing the busclient everywhere, which is aweful.
In the test implementation we now have the command/invoker pattern. This passes commands in a single direction. Options:
- Invoker can also pass errors in the opposite direction
- Invoker can un-gradefully handle errors on the grounds that they should be handled outside the invoker. For example, a transport should catch an error and pass it to the
ClientInvoker
, rather than expecting theTrasportInvoker
to work in reverse.
This is ongoing and I’m working on it over the holiday period. The refactoring is largely done and I’m working on getting the tests sorted, plus a number of ‘TODO’s I’ve left in my wake.
Tests now passing. TODO
s next
@apollo13 I've just implemented a first pass at the on_error
parameter (5b0fc13, 89c57df, c8cabbd)
What I didn't say earlier is: I actually think it makes much more sense for this to be a parameter to the listener, rather than a config option. How each service wants to handle errors is a choice for that service. There is no reason that every service should treat event-handling errors the same for a given API. Thanks for the good suggestion 👍
I think this is now largely complete. I'll likely merge an do an 0.10.0 alpha1 release soon.
Merged in commit a8c106d