python-data-acquisition/meta

Common API: Concurrency / async calls

campagnola opened this issue · 9 comments

How should we handle concurrency and asynchronous operation in a common API? Specifically:

(1) Should method calls be thread-safe? Pro: it is often much easier to implement thread safety at a low level when interacting with the device, rather than as an afterthought in a higher layer. Con: implementing thread safety correctly can be a challenge, makes the code more complex, could impact performance if done incorrectly, etc.

(2) Should method calls return futures? (all method calls, or just ones that are more likely to take a long time?) Pro: can make the common API much easier to use. Especially true for systems that need to coordinate action between multiple devices, and even more if you want to do that over IPC. Con: more work to write, more complex code.
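To make the two questions concrete, here is a minimal sketch of what they could look like together, using only the standard library. `FakeStage`, `move_to`, and `get_position` are hypothetical names invented for illustration, not a proposed API; the `time.sleep` stands in for real device I/O.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class FakeStage:
    """Hypothetical motorized stage; the sleep stands in for device I/O."""

    def __init__(self):
        self._lock = threading.Lock()                   # guards all device access
        self._pool = ThreadPoolExecutor(max_workers=1)  # runs slow calls
        self._position = 0.0

    def get_position(self) -> float:
        # Fast call: blocks briefly, safe to call from any thread.
        with self._lock:
            return self._position

    def move_to(self, target: float) -> Future:
        # Slow call: returns immediately with a Future.
        return self._pool.submit(self._move_blocking, target)

    def _move_blocking(self, target: float) -> float:
        with self._lock:
            time.sleep(0.01)  # pretend the hardware is moving
            self._position = target
            return self._position

    def close(self) -> None:
        self._pool.shutdown(wait=True)


stage = FakeStage()
future = stage.move_to(2.5)        # does not block the caller
final = future.result(timeout=5)   # block only when the result is needed
stage.close()
```

In this sketch only the slow call returns a future; whether every method should do so is exactly the open question in (2).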

Would this mean we have to choose which particular libraries to use? Threads, for example, are implemented in several different ways, and it isn't clear that they all play nicely together. For example, will Python's threading library work within the same process as PyQt threads?

If we want to implement thread safety, I would be more inclined to use Python's threading module (I can't think of a good reason for hardware control to depend on Qt at all). This is totally compatible with Qt's threads, though (and should be compatible with any other threads, as far as I know).

If you wrote the GUI in PyQt then it might seem like a good reason, which isn't to say that it is a good reason.

Perhaps the choice of which thread library to use could be left to the end user? So rather than using threading directly, for example, we'd call it indirectly through an intermediary, which would let the user specify which thread library they want to use. This assumes that the different thread implementations all work basically the same way, or similarly enough that their differences could be abstracted out.
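One way such an intermediary might look is a lock factory passed in by the caller. This is only a sketch: `FakeShutter` and the `lock_factory` parameter are hypothetical, and it assumes the supplied lock object can be used in a `with` statement, as `threading.Lock` and `threading.RLock` can.

```python
import threading
from typing import Callable, ContextManager


class FakeShutter:
    """Hypothetical device whose lock type is chosen by the caller."""

    def __init__(self, lock_factory: Callable[[], ContextManager] = threading.RLock):
        # The device never names a concrete lock class itself.
        self._lock = lock_factory()
        self._open = False

    def set_open(self, state: bool) -> None:
        with self._lock:
            self._open = state

    def is_open(self) -> bool:
        with self._lock:
            return self._open


default_shutter = FakeShutter()               # uses threading.RLock
plain_shutter = FakeShutter(threading.Lock)   # caller picks another lock type
plain_shutter.set_open(True)
```

A lock from another library would need a small adapter providing `__enter__` and `__exit__` if it does not already support them.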

I can think of good reasons to use a threading intermediary (in acq4, for example, I use a custom mutex class for debugging deadlocks). However, I think thread safety should be an internal implementation detail; I can't think of a reason that users should care how that works under the hood.
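As an illustration of the kind of debugging mutex mentioned here, the following is a hypothetical sketch, not the class used in acq4. `DebugLock` wraps `threading.Lock` and raises after a timeout instead of hanging forever, reporting which thread held the lock.

```python
import threading


class DebugLock:
    """Hypothetical lock wrapper that fails loudly instead of hanging forever."""

    def __init__(self, name: str, timeout: float = 5.0):
        self._lock = threading.Lock()
        self._name = name
        self._timeout = timeout
        self.owner = None  # name of the thread currently holding the lock

    def __enter__(self):
        if not self._lock.acquire(timeout=self._timeout):
            raise RuntimeError(
                f"possible deadlock: {self._name!r} still held by {self.owner!r}"
            )
        self.owner = threading.current_thread().name
        return self

    def __exit__(self, exc_type, exc, tb):
        self.owner = None
        self._lock.release()
        return False  # never swallow exceptions from the guarded block


lock = DebugLock("camera", timeout=0.05)
message = ""
with lock:
    try:
        with lock:  # non-reentrant, so this second acquire times out
            pass
    except RuntimeError as err:
        message = str(err)
```

Because the class is used only through `with`, code holding it does not need to know whether it is a plain lock or the debugging variant.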

In my experience, Qt threads and Python threading threads work together without a problem. Thus, using the threading Locks might be a good base.

I'd like to point at Lantz as an interesting example of building in asynchronous device calls: https://lantz.readthedocs.io/en/0.3/overview.html#effortless-asynchronous-get-and-set

One immediate benefit is that you can squeeze more performance out of situations where you have multiple ongoing and interdependent device tasks. Another benefit is that an asynchronous API lends itself much more easily to multiprocessing--ideally you can call the same method on either a local device object or a proxy to a remote device object and expect the same behavior. That said, it might be better to keep the low-level drivers as simple as possible and implement async calls at a higher level.
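The local-versus-proxy point can be sketched as follows. Everything here is hypothetical: `LocalLaser` and `LaserProxy` are invented names, and a worker thread stands in for the network round trip that a real remote proxy would make. The point is only that `ramp` does not need to know which kind of object it was given, because both return a `concurrent.futures.Future`.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class LocalLaser:
    """Hypothetical driver object living in this process."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._power = 0.0

    def _apply(self, value: float) -> float:
        self._power = value
        return self._power

    def set_power(self, value: float) -> Future:
        return self._pool.submit(self._apply, value)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


class LaserProxy:
    """Hypothetical stand-in for a remote device; its worker thread plays
    the role of the request/response round trip."""

    def __init__(self, target: LocalLaser):
        self._target = target
        self._pool = ThreadPoolExecutor(max_workers=1)

    def set_power(self, value: float) -> Future:
        # Forward the call and relay the result through our own Future.
        return self._pool.submit(lambda: self._target.set_power(value).result())

    def close(self) -> None:
        self._pool.shutdown(wait=True)
        self._target.close()


def ramp(laser, levels):
    """Caller code that works with either object."""
    return [laser.set_power(v).result(timeout=5) for v in levels]


local = LocalLaser()
proxy = LaserProxy(LocalLaser())
local_result = ramp(local, [0.1, 0.2])
proxy_result = ramp(proxy, [0.1, 0.2])
local.close()
proxy.close()
```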

👍 to Lantz's approach using concurrent futures. From some previous high-level reading, that approach is also what I would try first, as it seems the most elegant and simple to use.

Violating any kind of lean approach to problem solving, I, personally, would go one step further and think about multi-processing safety. If you are acquiring at high frame rates (GigE cameras, for example), threads are going to become a bottleneck very soon. I wouldn't rely on Qt for this, since it is complex machinery that data acquisition does not need per se, and would keep Qt just at the highest level, for building a GUI.

Since drivers are sometimes already given, I build a model on top of them; the model is multi-processing safe, and its methods are thread-safe within that process. I use ZMQ to broadcast information in/out of the encapsulated models. The advantage is that I foresee a future where I can have a daemonic experiment running, and control it from a Jupyter notebook, another programming language, or, why not, another computer, through the network.

PS: Now that I think about this out loud, the concurrent futures would be a nice way of simplifying my job...