This is a small library that uses dask to compute regionprops in parallel.
In addition to parallelization, it adds a few features and specializations on top of the scikit-image regionprops implementation.
dask_regionprops will return a dask dataframe containing the region properties as columns.
- Arrays can be NumPy or dask arrays, as well as xarray DataArrays backed by either array library.
- ND arrays are processed as a sequence of 2D arrays. Typically, we assume that the last two dimensions contain the images and that the leading dimensions will be looped over.
- In the ND case, the result dataframe will have columns that map each label to its index along the leading dimensions (for example, dim-0).
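As a rough illustration of the ND behaviour described in the list, the sketch loops over every index of the leading dimensions, measures each trailing 2D label image, and tags every row with dim-0, dim-1, ... columns. It is not this library's implementation: it runs serially without dask, and to stay dependency-light it computes only area and centroid with NumPy instead of calling scikit-image. The helper names slice_props and nd_regionprops are hypothetical.

```python
import numpy as np
import pandas as pd


def slice_props(labels_2d):
    """Area and centroid for each nonzero label in one 2D label image."""
    rows = []
    for lab in np.unique(labels_2d):
        if lab == 0:  # 0 is treated as background
            continue
        ys, xs = np.nonzero(labels_2d == lab)
        rows.append({
            "label": int(lab),
            "area": int(ys.size),
            "centroid-0": float(ys.mean()),
            "centroid-1": float(xs.mean()),
        })
    return rows


def nd_regionprops(labels):
    """Treat the last two axes as images and loop over all leading axes."""
    records = []
    for idx in np.ndindex(*labels.shape[:-2]):
        for row in slice_props(labels[idx]):
            # Record where in the leading dimensions this region was found
            row.update({f"dim-{i}": v for i, v in enumerate(idx)})
            records.append(row)
    return pd.DataFrame.from_records(records)


# Example: a 4D (Position, Time, Y, X) label array with two regions
labels = np.zeros((2, 3, 8, 8), dtype=int)
labels[0, 0, 0:2, 0:2] = 1
labels[1, 2, 2:4, 2:5] = 5
df = nd_regionprops(labels)
```

For a (Position, Time, Y, X) array, each row then carries dim-0 (position) and dim-1 (time) next to its label, which is what makes per-cell lookups across time possible.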
```shell
pip install dask-regionprops
```
You can use regionprops as a nearly drop-in replacement for scikit-image's regionprops.
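```python
from dask_regionprops import regionprops

# data: a label image as a NumPy/dask array or an xarray DataArray;
# any leading dimensions are looped over as described above
props = regionprops(data)
```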
I wrote this library to help analyze microscopy datasets. After segmentation I typically have a 4D xarray DataArray with dimensions (Position, Time, Y, X). Importantly, I reuse label values between positions but not between time points, so for all of the time points in position S, the region labelled r should refer to the same cell. Hopefully this motivates the decision to return the leading dimensions in the dataframe. For instance, if you want to get the properties of cell 5 in position 2, you could do something like:
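```python
# Properties of cell 5 in position 2 (dim-0 is the Position axis)
single_cell_props = props.loc[(props["dim-0"] == 2) & (props["label"] == 5)]
# props is a dask dataframe, so call .compute() to get a pandas result
single_cell_props = single_cell_props.compute()
```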
If you are a more advanced pandas user and want to do this sort of analysis for many cells, you might consider using the leading dimensions and region labels as a MultiIndex to access the data in this way more efficiently.
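A minimal sketch of the MultiIndex approach, using a small hand-made pandas DataFrame with the same column names as the selection example (the area values are invented for illustration). Dask dataframes do not support a MultiIndex, so on real output you would first call .compute() to get a pandas DataFrame.

```python
import pandas as pd

# Toy stand-in for computed regionprops output (values are made up)
props = pd.DataFrame({
    "dim-0": [0, 0, 2, 2, 2],  # position
    "dim-1": [0, 1, 0, 1, 0],  # time
    "label": [5, 5, 5, 5, 7],
    "area": [100, 104, 90, 95, 40],
})

# Index by (position, label, time); sorting keeps .loc lookups efficient
indexed = props.set_index(["dim-0", "label", "dim-1"]).sort_index()

# All time points of cell 5 in position 2, now indexed by time
cell = indexed.loc[(2, 5)]

# Per-cell summaries for every cell at once
mean_area = indexed.groupby(level=["dim-0", "label"])["area"].mean()
```

Because positions reuse label values, the cell identity is the (dim-0, label) pair, which is why both levels appear in the lookup and in the groupby.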
Finally, a useful downstream application is to use the region properties as features for a classifier or even a clustering algorithm. I have personally used labelled regions and the corresponding fluorescence images as input to identify progression through the cell cycle.
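As a sketch of the clustering idea, assuming a pandas DataFrame of region properties with numeric feature columns (the names area and mean_intensity here are assumptions), the example z-scores the chosen columns and groups regions with a plain NumPy k-means (Lloyd's algorithm). This is a swapped-in, dependency-free stand-in; scikit-learn's KMeans or a supervised classifier would be the more usual choice. kmeans and cluster_regions are hypothetical helpers, not part of this library.

```python
import numpy as np
import pandas as pd


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its rows; keep it if it has none
        new = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return assign


def cluster_regions(props, feature_cols, k):
    X = props[feature_cols].to_numpy(dtype=float)
    # z-score so large-unit features (e.g. area in pixels) don't dominate;
    # a constant column would divide by zero and should be dropped first
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return kmeans(X, k)


# Toy features: three small dim regions and three large bright ones
props = pd.DataFrame({
    "label": [1, 2, 3, 4, 5, 6],
    "area": [10, 12, 11, 200, 210, 205],
    "mean_intensity": [1.0, 1.1, 0.9, 5.0, 5.2, 4.9],
})
props["cluster"] = cluster_regions(props, ["area", "mean_intensity"], k=2)
```

Standardizing first matters here: without it, area (hundreds of pixels) would swamp intensity differences in the distance calculation.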
Please feel free to open an issue or pull-request if you have questions or improvements for this library.