python-data-acquisition/meta

Discussion: must-have features for a data acquisition platform

campagnola opened this issue · 4 comments

Although this project began with relatively modest goals (shared low-level hardware drivers), it is often necessary to think about the bigger picture in order to settle on the smaller details. So in this thread let's collect ideas about what an ideal data acquisition platform would look like. I am not interested in applications here, just the ideal infrastructure that would allow one to most easily write any application they wanted (if such a thing can even exist).

We now have a mixed audience here including optical microscopy, robotics, electrophysiology, particle physics, etc. One question we could try to answer here is whether it even makes sense for all of these fields to attempt a shared acquisition platform.

Must-have features for a data acquisition platform (these are things I've seen so far in discussions here; will update this list as needed):

  1. A hardware abstraction layer that provides a standard API for similar device types.
  2. Distributed architecture: should be able to coordinate devices that exist in the same thread, different threads, different processes, or different machines.
    a. Object proxying so that all possibilities in (2) can be used with only minimal changes to the application code.
  3. Coordinate system modeling - keep track of where your hardware is in physical space, and make it easy to transform data between reference frames.
  4. Resource locking / sharing - a way to have multiple synchronous procedures safely accessing the same hardware.
  5. Acquisition engine(s) - systems that automate common workflows for configuring / synchronizing hardware and recording results. It's likely that multiple acquisition engines would be needed to cover a variety of different workflows.
  6. Basic common data / metadata structures like annotated arrays, units, etc. Probably few of these need to be implemented by the platform; we could just agree on a standard set of tools.
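
To make item 2a a bit more concrete, here is a minimal sketch of object proxying for the "different thread" case only. `FakeStage` and `ThreadProxy` are hypothetical names invented for this illustration; they are not part of any existing package, and a real implementation would also need to cover processes and remote machines.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeStage:
    """Hypothetical driver used only for illustration."""

    def __init__(self):
        self.position = 0.0

    def move_to(self, position):
        self.position = float(position)
        return self.position


class ThreadProxy:
    """Forward method calls to a target object via a single worker thread."""

    def __init__(self, target):
        self._target = target
        self._executor = ThreadPoolExecutor(max_workers=1)

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def call(*args, **kwargs):
            # Run the method in the worker thread and block for the result.
            return self._executor.submit(attr, *args, **kwargs).result()

        return call

    def close(self):
        self._executor.shutdown()


# Application code looks the same whether it holds the driver or the proxy.
stage = ThreadProxy(FakeStage())
new_position = stage.move_to(1.5)
stage.close()
```

The point of the sketch is only that the calling code (`stage.move_to(1.5)`) does not change when the object moves behind a proxy; swapping the executor for a socket or pipe transport is where the real work would be.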

Note: these are not features that all users would want; probably most users will only need a subset. The list should be complete, though, to avoid "I can't use this framework because it lacks X critical feature" for the most common use cases. Features should also be independent and optional if possible to avoid bloat when they are not needed.

Continuing the conversation from #2

One of the main things we're trying to accomplish here is a shared library of hardware drivers. A question that keeps coming up is whether the low-level drivers need to know anything about the higher-level infrastructure they will participate in (like, for example, a multiprocess message-passing system).

@VolkerH, @bilderbuchi (and anyone else with an opinion here) I am curious to hear whether you think the low-level drivers need to be written with message-passing in mind, or if that can be just as easily (and perhaps more cleanly) implemented in a higher layer. We've discussed this a bit already here and here.

Good question, and I don't think I have an answer. My gut feeling is that the driver just needs to provide low-level functionality and can be incorporated into a publish/subscribe system using wrappers.
In a sense, that's what one does anyway when working with proprietary devices where you get perhaps a .dll for the device driver: you write a wrapper around it.

I think what a framework in Python could provide is some decorators that make it easy to take a bare-bones driver that has some inputs and outputs and connect those to a publish/subscribe-type message-passing interface.
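
As a rough sketch of that idea (not an existing API), a decorator could wrap a driver method so that its return value is also published on a topic. `MessageBus`, `publishes`, `PowerMeter`, and the topic name are all hypothetical and exist only for this illustration; the bus here is in-process, where a real one would cross process or machine boundaries.

```python
import functools
from collections import defaultdict


class MessageBus:
    """Tiny in-process publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


def publishes(bus, topic):
    """Decorator factory: publish the wrapped callable's return value."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(*args, **kwargs):
            result = method(*args, **kwargs)
            bus.publish(topic, result)
            return result

        return wrapper

    return decorator


class PowerMeter:
    """Bare-bones driver that knows nothing about message passing."""

    def read(self):
        return 0.25  # placeholder; a real driver would query the hardware


bus = MessageBus()
meter = PowerMeter()
# Wrap the bound method from the outside; the driver class is left untouched.
meter.read = publishes(bus, "power_meter/reading")(meter.read)

received = []
bus.subscribe("power_meter/reading", received.append)
value = meter.read()
```

Applying the decorator from outside the class, as done here, keeps the driver itself free of any dependency on the messaging layer.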

I (along with @untzag) have been lurking here a while, while working on our own project: yaq.
Big fan of what you all are doing here.
Your goals here remind me of yaq, so I can't resist highlighting the similarities.
Like (I imagine) many of you I feel somewhat guilty inventing yet another standard, but at the same time I like our technology and we're already invested and growing...

hardware abstraction layer

yaq uses traits to enforce standard apis where possible, but also allows unique methods for individual hardware.
Traits are nothing more than collections of method signatures, configuration parameters, and state variables that are expected to exist.
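
For readers who think in Python, the general idea of a trait as "a set of method signatures expected to exist" can be approximated with `typing.Protocol`. This is only an analogy and is not how yaq defines or enforces its traits; `HasPosition`, `DelayStage`, and `Thermometer` are hypothetical names.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasPosition(Protocol):
    """Hypothetical trait: anything that can report and set a position."""

    def get_position(self) -> float: ...

    def set_position(self, position: float) -> None: ...


class DelayStage:
    """Satisfies the trait and adds a device-specific method."""

    def __init__(self):
        self._position = 0.0

    def get_position(self) -> float:
        return self._position

    def set_position(self, position: float) -> None:
        self._position = position

    def home(self) -> None:
        # Unique method outside the trait, still allowed.
        self._position = 0.0


class Thermometer:
    """Does not satisfy the trait."""

    def get_temperature(self) -> float:
        return 295.0


# isinstance() on a runtime-checkable Protocol checks only that the
# methods exist, not their signatures.
stage_ok = isinstance(DelayStage(), HasPosition)
thermo_ok = isinstance(Thermometer(), HasPosition)
```

Here `stage_ok` is true and `thermo_ok` is false, while `DelayStage.home` shows that a device can still expose methods beyond the trait.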

distributed architecture

yaq is a daemon architecture using a msgpack based RPC over TCP/IP (specification here).
yaq daemons run in different processes, and can be run on different machines if desired.
Daemons are developed and distributed separately (which speaks to concerns in #3).
I'm also excited about writing daemons in different languages from each other and from clients (I have a lot of LabVIEW-for-GUI friends).

coordinate system modeling

yaq has nothing like this, although of course it could be added client-side.

resource locking / sharing

yaq daemons natively support multiple connections.
Locking was considered, and could be implemented if needed, though I am skeptical that it's necessary for the time being.
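
For comparison, here is a minimal sketch of what the resource locking item from the original list could look like within a single process. This is not a yaq feature; `DeviceLock` and `reserve` are hypothetical names, and locking across processes or machines would need a different mechanism.

```python
import threading
from contextlib import contextmanager


class DeviceLock:
    """Hypothetical per-device reservation lock (single process only)."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.RLock()

    @contextmanager
    def reserve(self, timeout=1.0):
        # Fail loudly instead of waiting forever for a busy device.
        if not self._lock.acquire(timeout=timeout):
            raise TimeoutError(f"could not reserve {self.name}")
        try:
            yield self
        finally:
            self._lock.release()


laser_lock = DeviceLock("laser")

with laser_lock.reserve():
    # Only the thread holding the reservation should talk to the device here.
    pass
```

A reentrant lock is used so that a procedure already holding the device can call helper routines that reserve it again without deadlocking.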

acquisition engine(s)

Any variety of acquisition engines could be built on top of yaq.
We do provide a generic Python client which makes using daemons in scripts relatively seamless.
For our instruments we have designed instrument-specific graphical clients in a pretty manual way that, honestly, could be a lot easier.
See for example PyCMDS.
We also hope to build bridges to other projects, including ophyd-yaq (credit @tacawell).

For context: we're chemists, and right now we're using yaq "in production" to drive custom instruments and reactors of different scales throughout our department.

Coordinate system modeling - keep track of where your hardware is in physical space, and make it easy to transform data between reference frames.

I'm not sure this is a universal thing: a reference frame would not be useful to, e.g., a power meter or a function generator. The only device class that I see off the top of my head as making use of this is motors/movers (e.g. translation stages), and those could have a custom attribute (e.g. via a mixin class) for the coordinate system.
I'm wondering, though, whether coordinate system info would be better placed in the setup/station/experiment description. Unlike the translation stage's position value, it is not an inherent property of the motor (driver), but of where the motor is mounted.
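
A minimal sketch of that separation, assuming a one-dimensional frame with only a scale and an offset: the driver reports its raw position, and the mounting information lives in a station description. `Frame`, `station`, and all numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical 1-D reference frame: parent = scale * local + offset."""

    name: str
    scale: float = 1.0
    offset: float = 0.0

    def to_parent(self, value):
        return self.scale * value + self.offset

    def from_parent(self, value):
        return (value - self.offset) / self.scale


# Station description: mounting info lives here, not in the motor driver.
# Assumed example: the stage reports millimetres and its zero sits 0.05 m
# from the lab origin.
station = {"stage_x": Frame("stage_x", scale=1e-3, offset=0.05)}

stage_reading_mm = 2.0  # raw value as the driver would report it
lab_position_m = station["stage_x"].to_parent(stage_reading_mm)
back_to_stage_mm = station["stage_x"].from_parent(lab_position_m)
```

Remounting the stage then means editing one entry in `station`; the driver code does not change.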

Acquisition engine(s)

Maybe this should be separated from the driver part - as you say already there would probably be diverging needs. Roughly, I can think of a division into separate projects/repos along the lines of

  • Acquisition: drivers, I/O, basic interaction, common base for all, what you want to do first here (?)
  • Orchestration: workflows; station, experiment and procedure configuration and execution scheme
  • Operation/Visualisation: drive and interact with the experiment, probably GUI clients, GUI vs. notebook vs. scripts could all be accommodated on this layer

These layers would interact via a defined API/scheme, so custom solutions are also possible.
I hope that if you play things right:

  • the first layer would be universally useful for as many people as possible and would collect a big driver database to attract a critical mass of contributors/maintainers;
  • the second layer would be useful for many; maybe there are a couple of fundamentally different ways to do this (Python scripts vs. markup definition vs. something else) that would still use a common way of interacting with the layers above and below;
  • the third layer is where people would really differentiate into use cases (microscopes vs. particle physics vs. mechanical engineering vs. ...); offering a common GUI building toolkit (e.g. based on Qt) could help standardize things.

  • I would see unit support as pretty important nowadays

I'm not sure [coordinate system modeling] is a universal thing

Agreed; in acq4 this is implemented as a mixin class used by only optical/mechanical devices. I include it here because it's infrastructure; it lives somewhere between the device drivers and the application. We could make the same argument for most of the items in the list, though--some users won't need device locking, or distributed control, etc. There are even cases where a hardware abstraction layer will get in your way. Ideally, each of these features would be optional (maybe even separate packages), and lightweight enough that they don't dissuade new users from adopting the platform. Can't tell you how many times I've heard "this platform is too bloated" only to be followed a few years later by "we present a new flexible, extensible, totally generic framework for doing everything you ever dreamed".

Maybe [acquisition engines] should be separated from the driver part

I should have been more clear: everything on this list I would consider to be independent from the device drivers.

I would see unit support as pretty important nowadays

In acq4 all values are stored as unscaled SI units, so no special help was needed there. But agreed it's common in some domains; I'll add that to the list.
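
As an illustration of the "store everything as unscaled SI" convention, conversions can be confined to the input and display boundaries. This sketch is not acq4 code; `SI_PREFIXES`, `to_si`, and `format_si` are hypothetical helpers, and only a handful of prefixes are covered.

```python
# Hypothetical prefix table; "u" stands in for micro.
SI_PREFIXES = {"k": 1e3, "": 1.0, "m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12}


def to_si(value, prefix=""):
    """Convert a prefixed value (e.g. 150 with prefix 'm') to unscaled SI."""
    return value * SI_PREFIXES[prefix]


def format_si(value, unit):
    """Format an unscaled SI value with the largest prefix giving >= 1."""
    if value == 0:
        return f"0 {unit}"
    for prefix, factor in sorted(SI_PREFIXES.items(), key=lambda kv: -kv[1]):
        if abs(value) / factor >= 1:
            return f"{value / factor:g} {prefix}{unit}"
    # Smaller than the smallest known prefix: fall back to plain notation.
    return f"{value:g} {unit}"


current_amps = to_si(2.5, "m")  # stored internally as amperes
label = format_si(0.0025, "A")
```

Internally every value stays a plain float in base units, so arithmetic between devices needs no unit bookkeeping; only the user-facing edges deal with prefixes.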