Unify algorithms for on-the-fly analysis and hardware control in a repo to facilitate `smart microscopy`
edyoshikun opened this issue · 21 comments
Currently, the different teams at the Biohub have created multiple tools for image-based hardware tuning that do the same task. This creates code duplication and inefficient project development, as there is no need to completely reinvent the wheel. One suggestion we made during our last copylot meeting was to find a repo/place where we can unify these pieces of code so they can be easily discovered (e.g. autofocus, autotracking, autoexposure, centroid detection, plate-map position generation, etc.).
These algorithms should be simple, perform a single task, and break down complex, repetitive workflows so that we can integrate them easily into other projects. For example, one algorithm is finding the focused slice given a 3D volume, and another is dragonfly's plate interpolation: these take a 3D array and return a suggestion, or take a list of positions and return a plane estimate, respectively. In compmicro, we would like to use algorithms like autoexposure and autofocus by integrating them through the pycromanager hook functions, which allow on-the-fly processing and tuning of the microscope hardware.
A proposed solution:
- Unify these algorithms in a common repo (i.e. `copylot/algorithms`) used among most groups at the Biohub. This will hopefully surface pieces of code that are unknown to other groups and reduce code duplication. Additionally, keeping them in one repo facilitates access and code review, and adds momentum for other people to contribute.
- This should also be discussed in the SIGs to see where the best fit is.
What are your thoughts?
@i-jey @talonchandler @ieivanov @rachel-banks @ziw-liu @AhmetCanSolak @YangLiujames @keithchev @JoOkuma @gregcourville
Please tag anyone else who might find this suggestion useful.
I'm all for reusing the infra whenever possible so we don't do the packaging job again if we don't have to. One problem I can think of is dependency management when accumulating scientific/CV python libraries (especially those with compiled extensions like CuPy and OpenCV). This can be mitigated with cautious planning and optional dependencies though.
I am also supportive of this idea. It will be helpful for everyone to have these algorithms in a common repo with a common structure, such that they can be easily reused.
I think we'll need to define which algorithms belong here and which don't. I think we should strive for these algorithms to be independent of the underlying microscope acquisition engine and control software. For example, autofocus algorithms which take a series of images and return the correct focal plane would be a good fit here. This algorithm will be independent of how the images are acquired and will not be responsible for moving the z stage to the correct focal plane - this will be done by the microscope control software based on the output of the autofocus algorithm.
Let's explicitly list out algorithms that may fit here, with their input and output, such that we can as a group decide what belongs and what doesn't. Based on the discussion so far:
- image-based autofocus algorithms
  - input: image stack at different focal planes
  - output: index of the image at the best focal plane
- autoexposure algorithms
  - input: image stack at different exposure settings
  - output: index of the best-exposed image (analogous to above)
- single-shot and iterative autofocus and autoexposure algorithms
  - input: image and associated metadata (z position or exposure time)
  - output: instructions on what to try next (i.e. new z plane or exposure time settings); the user can repeat the algorithm, or quit if the output value (say, z plane) is close enough to the starting one
- autotracking
  - same idea as autofocus, but applied in xy(z)
- FOV selection
  - input: set of images
  - output: score for each image, may be boolean (yes/no) or float (rank best to worst)
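As a concrete sketch of the "image stack in, index out" contract for image-based autofocus, something like the following could live in the shared repo. The function name and the focus metric (gradient variance) are illustrative, not an existing copylot API; real implementations may prefer a Laplacian- or FFT-based sharpness score.

```python
import numpy as np

def best_focus_index(zstack):
    """Return the index of the sharpest slice in a (Z, Y, X) stack.

    Uses variance of the image gradient as a simple focus metric.
    The caller (microscope control software) is responsible for
    acquiring the stack and moving the z stage afterwards.
    """
    scores = []
    for img in zstack:
        gy, gx = np.gradient(img.astype(np.float64))
        scores.append(np.var(gy) + np.var(gx))
    return int(np.argmax(scores))
```

Note that the function never touches hardware: it only scores arrays, which keeps it independent of the acquisition engine as discussed above.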
Algorithms I'm less confident belong here:
- sample (e.g. plate) position interpolation - probably?
  - input: set of known points and set of query points
  - output: interpolation at query points. This seems to be pretty straightforward interpolation; the useful bit on the dragonfly microscope is the interaction between the user and the algorithm, which may be hard to replicate entirely in copylot
- hardware-based autofocus - probably not?
  - I think it would strongly depend on the underlying hardware
- centroid finding - not sure?
  - is this just a helper function for autotracking? how is it better / different than the algorithms in opencv?
Any other algorithms you think may or may not belong here? We're looking for your input. If we house these algorithms in `copylot`, they would integrate nicely with image acquisition and hardware control through copylot. However, they would also integrate nicely with micro-manager / pycromanager-controlled microscopes if we write them to be independent of the acquisition and hardware-control software.
I am also in support of this idea, and agree with what Ivan has listed above.
Sample position interpolation I think would be applicable here.
Centroid finding is just a helper function for autotracking. We have a few helper functions written that are different methods of performing the tracking. I'm not familiar with the opencv version, @ieivanov do you know if it can do 3D? Or only 2D?
I really like the idea.
I think packaging these algorithms into a library, and into a single conda package that includes our other libraries (e.g. `iohub`), would be very useful, because installing Python packages is very complicated, especially on Windows.
IMO, the functions should have a somewhat standardized API, and they could be implemented such that they work with both cupy and numpy without depending on cupy.
How often is opencv used?
From my experience, it is worth the effort to reimplement the opencv functions in Python rather than deal with its installation, especially when using other Qt-based packages (e.g. napari).
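One common pattern for the "works with both cupy and numpy without depending on cupy" idea is to dispatch on the array's backend via `cupy.get_array_module`, falling back to numpy when cupy isn't installed. A minimal sketch (the `normalize` example is illustrative, not an existing function):

```python
import numpy as np

def get_array_module(arr):
    """Return cupy if `arr` is a CuPy array, numpy otherwise.

    Falls back to numpy when cupy is not installed, so cupy
    stays an optional dependency.
    """
    try:
        import cupy
        return cupy.get_array_module(arr)
    except ImportError:
        return np

def normalize(arr):
    """Backend-agnostic min-max normalization of a numpy or cupy array."""
    xp = get_array_module(arr)
    lo, hi = xp.min(arr), xp.max(arr)
    return (arr - lo) / (hi - lo + 1e-12)
```

The same function then runs unchanged on CPU arrays and, when cupy is present, on GPU arrays.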
Thank you all for your input here and in offline conversations. I like that there is general agreement on such a collection of algorithm implementations. Here, I would like to summarize the main points everyone seems to agree on so far (if not, please comment below):
- We want to implement these algorithms agnostic to any acquisition engine and/or hardware, @AhmetCanSolak @keithchev @ieivanov
- Implementations should accept ArrayLikes and other required generic-typed parameters when needed, and should return ArrayLike, @ieivanov , @ziw-liu , @AhmetCanSolak , @JoOkuma
- coPylot packaging won't be compromised by introducing new default dependencies with compiled extensions (open to having such dependencies (cupy, etc.) as optional) @ziw-liu , @AhmetCanSolak , @JoOkuma
- List of initially desired algorithms: @ieivanov , @rachel-banks , @edyoshikun
  - image-based autofocus algorithms
  - autoexposure algorithms
  - single-shot and iterative autofocus and autoexposure algorithms
  - autotracking
  - FOV selection
  - sample (e.g. plate) position interpolation
I am curious about how often OpenCV is used, too. I have similar experience to @JoOkuma 's experience.
It's exciting that we all agree on this.
I mentioned OpenCV as an example of a standard image-analysis library on which some of the algorithms here may depend. I'll let @talonchandler @edyoshikun and @rachel-banks say what dependencies the algorithms they've written may bring along to copylot.
I'm in favor! I think it's a good decision to make these `copylot/algorithms` "arrays in and arrays out" so that they can be used by different hardware and acquisition software.
No major dependencies on my end beyond `numpy`. I support avoiding an `opencv` dependency if possible.
> I am curious about how often OpenCV is used, too. I have similar experience to @JoOkuma 's experience.

I used OpenCV as an example not because I am in favor of including it, but because I know it will cause problems and should be avoided if possible, among other potentially problematic dependencies.
We use opencv only for the template-matching method for autotracking, because it is much faster than the skimage version. If someone (@JoOkuma maybe?) has a suggestion for an alternative, I'm happy to avoid it.
`skimage` is, I think, the only other major dependency for autotracking.
@rachel-banks I don't know any alternatives. From a quick search, it seems they use the same algorithm (depending on what mode you're using on OpenCV).
I would try speeding up the skimage implementation by processing directly in 3D (OpenCV is 2D only), and if a GPU is available it could use cucim (skimage with cupy).
This is exciting! I think the algorithms summarized above cover a lot of "decisions" that need to be made during high throughput imaging, imaging developing embryos, and maintaining focus over long time series.
Another set of algorithms we could unify here is registration algorithms. The methods to compute registration (e.g., phase correlation, template matching) are currently used across microscopes and can benefit from collaborative algorithmic/numerical optimizations. I think that our near term needs are:
- register volumes across time - autotracker.
- register volumes across channels - multiple microscopes (dragonfly, falcon, hummingbird, mantis).
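As a sketch of how a registration method like phase correlation fits the same arrays-in/arrays-out contract, here is a minimal pure-numpy version. It recovers integer shifts only; windowing and subpixel refinement (as in production implementations) are left out, and the function name is illustrative.

```python
import numpy as np

def phase_correlation_shift(ref, mov):
    """Estimate the integer shift s such that mov ~= np.roll(ref, s).

    Works in any number of dimensions (2-D images or 3-D volumes),
    so the same code covers across-time and across-channel registration.
    """
    f_ref = np.fft.fftn(ref)
    f_mov = np.fft.fftn(mov)
    # normalized cross-power spectrum: a pure phase ramp for a pure shift
    cross = np.conj(f_ref) * f_mov
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifftn(cross).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    # wrap peaks past the halfway point back to negative shifts
    size = np.array(ref.shape)
    mask = peak > size // 2
    peak[mask] -= size[mask]
    return peak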
I strongly vote to make a separate repo/module to collect generic analysis algorithms. We should NOT mix generic analysis stuff with hardware control, unless a particular part of the hardware control code requires some very specific analysis.
@rafa-gomez-s and I were talking about this github issue. I agree with his vote that a small analysis library will be easier to build, manage, and reuse across different acquisition repos.
@royerloic @JoOkuma Loic and I discussed a lean library that enables online analysis (PSF estimation and other tasks discussed above) that acquisition engines of different microscopes can reuse. In this thread, we discussed whether such algorithms fit as a module within coPylot. At that time, the consensus was that coPylot should focus on hardware device adapters.
Two ideas for what to do:
- "chunk" the analysis algorithms into a module of coPylot or shrimPy, and use optional dependencies, e.g., `pip install <copylot/shrimPy>[analysis]`.
- "split" the analysis algorithms into a dedicated repo, just like we grouped all file i/o in `iohub`. Loic thought of `iphub` (image processing hub) as a possible name, which I like!
At the operational level, I've observed that chunking related code in a single repo promotes code reuse and regular review of code by peers. So, I naturally lean towards chunking. But, I think the number of methods in an online analysis repo is large enough that splitting also makes sense.
@ieivanov , @edyoshikun, @talonchandler , @JoOkuma, @royerloic your thoughts and vote?
I vote for creating a new repo dedicated to the analysis code - splitting. It's definitely a lot cleaner and makes the code more modular and reusable in different scenarios.
I'm in favor of splitting, but some thought should be put into this.
For example, a lot of image processing functionality already exists in `mantis`, so would all of that be migrated to a new library? Otherwise, there might be a circular dependency, where `online analysis` requires functionality from `mantis` (e.g., deskew) and `mantis` depends on `online analysis`.
What are the good practices we should follow? Especially regarding the API, so `online analysis` or hub functions can be used with minimum effort. If the ergonomics are bad, it might be easier to duplicate the code.
I think both chunking and splitting can be used for the same purpose. For chunking, similar to napari, one could do `pip install napari[all]` vs `pip install napari[pyqt]`, which in our case could be `acquisition` and `analysis`. However, if we go this route, what would the 'required' default packages be? These default packages are the ones that worry future users, because of all the dependencies that are carried over by anyone who depends on our package.
I lean towards the idea of splitting, as @JoOkuma mentions, based on the dependencies. I can also see us using optional dependencies depending on what sort of analysis we are doing and the packages required (e.g. VisCy (torch), Cellpose (torch), StarDist (tensorflow), etc.). I mention these because they are the 'bigger' imports and the ones that cause the most issues in my experience.
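One common way to keep these bigger imports optional is a lazy-import helper that raises an actionable error naming the pip extra. The helper name and the extra names below are illustrative, not an existing API:

```python
import importlib

def optional_import(module_name, extra):
    """Import an optional dependency, with a helpful error if missing.

    `extra` names the (hypothetical) pip extra providing the module,
    e.g. pip install "iphub[torch]".
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name!r} is an optional dependency; "
            f"install it with the [{extra}] extra"
        ) from e
```

A torch-backed workflow would then call `optional_import("torch", "torch")` at use time, so users who never touch it pay no installation cost.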
As for the API, I can see us making at least these kinds of functions:
- `algorithms`: array-in/array-out (e.g. deskew, registration, denoising, segmentation)
- `workflows`: getting from an image to interpretable results that can be used to alter hardware (e.g. stabilization, autoexposure, autofocus). We have these algorithms currently living in `shrimPy`, and I am not sure if they should go to a separate `ipHub`.
- Comp micro's algorithms have been developed to process `czyx` volumes so that we can parallelize over T. Standardizing the shapes has let us have less redundancy in our code.
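The czyx-over-T convention could be captured by a tiny helper like the following (an illustrative sketch, not existing shrimPy code):

```python
import numpy as np

def iter_czyx(tczyx):
    """Yield (t, czyx) pairs from a 5-D TCZYX array.

    Standardizing on TCZYX lets one helper hand per-timepoint CZYX
    volumes to parallel workers, regardless of which algorithm runs.
    """
    if tczyx.ndim != 5:
        raise ValueError(f"expected a 5-D TCZYX array, got {tczyx.ndim}-D")
    for t in range(tczyx.shape[0]):
        yield t, tczyx[t]
```

Each yielded volume can then go to a separate process or GPU stream without any algorithm needing to know about the time axis.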
@JoOkuma I agree that good ergonomics are key to reuse of the library, and circular dependencies are to be avoided!
Let's use `iphub` as a shorthand for now, before we figure out a name.
I like how @edyoshikun structured different types of methods.
It is intuitive to me that algorithms are all part of `iphub`, device drivers are all part of `coPylot`, and smart microscopy workflows are all part of `shrimPy` when the microscope runs on Micro-Manager. The algorithms should expect and return arrays/tensors with dimensions ordered according to a standard, i.e., TCZYX.
@ieivanov @talonchandler your thoughts and votes?
I see a lot of value in an `iphub` repository, and I'm in favor of moving algorithms that are useful across the Biohub from `shrimPy` to the new repository.
It might be useful to decide together which GPU library `iphub` should use. I vote for `torch` since that's what we've been using lately, but I know @JoOkuma might prefer `cupy` for memory-sharing purposes. Is `torch` a concern @JoOkuma?
@talonchandler, yes, being torch-first is a concern for us.
We extensively use `cupy`'s unified memory when we require more space than the GPU RAM, and other features like texture memory. However, we also use torch when we need derivatives, in dexpv2.
I think with additional functionality we could reduce the overhead of managing `torch` vs. `cupy` on the GPU, because they can share the same data by reference. However, it will require some extra care when implementing additional functionality.
In an ideal world, `torch` would be compatible with the array API and we would not have to worry about this, but we can't count on that being done anytime soon.
One important thing we should consider is how much time should be put into this.
Managing packages across different teams and applications adds development overhead.
Having several compatible libraries, rather than trying to have a single library for each purpose, could be an alternative.