ome/napari-ome-zarr

Handle AnnData regions tables

will-moore opened this issue · 16 comments

With ome/ome-zarr-py#256 (work in progress) we hope to have AnnData tables available to napari-ome-zarr.

I've been using this to read "points" tables and display as tracks:
#81

So now I also want to handle "regions" tables in napari.

I've been looking at the https://github.com/kevinyamauchi/ome-ngff-tables-prototype/blob/main/examples/save_load_squidpy_segment.py example.

The logic as I follow it goes:

  • The labels layer is added first without any table data
  • Read the AnnData as dataframe (X and var) and combine it with the obs to give "features_table"
  • get all the label values from the label.data
  • get the label IDs from the features_table, e.g. cell_id column, and check if we're missing any label values
  • If so, duplicate a table row and assign a cell_id so that ALL label values have a corresponding row in the table
  • Sort the features_table by cell_id, so that the order of rows matches the order of label values in the label layer
  • Pick a random column to use to color the labels - convert to float and use the range to assign colors
  • Pass the features_table dataframe to napari as "features" on a NEW labels layer

Then you can use https://www.napari-hub.org/plugins/napari-properties-viewer to show a table of features for each label:

In this screenshot, I edited the code to use cell_id for color rather than a random column:

Screenshot 2023-03-17 at 10 19 12

I was thinking of doing something similar in napari-ome-zarr with the following differences:

  • Try not to duplicate the labels layer. Just add table info to a single labels layer.
  • Avoid duplicating rows of data. Either create "blank" rows that only have the cell_id OR possibly use a dictionary for features with cell_id as key instead of a dataframe (might be less performant)?
  • Don't pick a random column for color - sometimes this causes an error if a string column is picked from the obs data. Either just pick cell_id as above, or don't add color at all.
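
A minimal sketch of the "blank rows" option above, using only numpy and pandas. The function name align_features_to_labels, the toy arrays, and the assumption that label value 0 is background are illustrative and not taken from napari-ome-zarr. It also assumes cell_id values are unique in the table.

```python
import numpy as np
import pandas as pd


def align_features_to_labels(label_data, features_table, id_column="cell_id"):
    """Return one row per non-zero label value, sorted by label value.

    Hypothetical helper: labels that have no row in the table get a
    "blank" row (NaN in every feature column) rather than a duplicated row.
    """
    # All label values present in the image; 0 is assumed to be background.
    label_values = np.unique(label_data)
    label_values = label_values[label_values != 0]

    # Index by label ID, then reindex against the sorted label values.
    # reindex() inserts NaN rows for IDs that are missing from the table
    # and drops rows whose ID does not occur in the image.
    aligned = (
        features_table.set_index(id_column)
        .reindex(label_values)
        .rename_axis(id_column)
        .reset_index()
    )
    return aligned


# Toy data: label 2 is in the image but not in the table.
labels = np.array([[0, 1, 1], [2, 2, 4], [0, 4, 4]])
table = pd.DataFrame({"cell_id": [4, 1], "area": [3.0, 2.0]})
features = align_features_to_labels(labels, table)
print(features)
```

Because the rows come out in the same order as the sorted label values, the result keeps the row/label ordering described in the steps above without copying data from another row.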

cc @kevinyamauchi @LucaMarconato @giovp

Very cool to see this update coming to napari-ome-zarr!

A few comments on the procedure you described. In napari_spatialdata we follow a similar approach to map the labels (matching the IDs, sorting the rows, etc), but we don't add extra rows for labels not in the table. We simply don't annotate them (for instance if the background is not in the table, then the background will be transparent, and we give a warning for labels that are not background and that are not in the table).
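
The approach described above (match IDs, sort rows, leave unmatched labels unannotated and warn about them) could be sketched roughly as follows. This is not the napari_spatialdata code; the function name match_table_to_labels, the cell_id column and the background value 0 are assumptions made for illustration.

```python
import warnings

import numpy as np
import pandas as pd


def match_table_to_labels(label_data, features_table, id_column="cell_id", background=0):
    """Keep only table rows that match a label; warn about unannotated labels."""
    label_values = np.unique(label_data)
    table_ids = features_table[id_column].to_numpy()

    # Keep rows whose ID occurs in the image, sorted by ID.
    matched = (
        features_table[np.isin(table_ids, label_values)]
        .sort_values(id_column)
        .reset_index(drop=True)
    )

    # Labels without a table row are left unannotated. Warn about them,
    # except for the background label.
    missing = np.setdiff1d(label_values, table_ids)
    missing = missing[missing != background]
    if missing.size:
        warnings.warn(
            f"{missing.size} label(s) have no row in the table: {missing.tolist()}"
        )
    return matched, missing


# Toy data: label 2 has no table row; table row 9 has no label.
labels = np.array([[0, 1, 1], [2, 2, 4], [0, 4, 4]])
table = pd.DataFrame({"cell_id": [4, 1, 9], "area": [3.0, 2.0, 5.0]})
matched, missing = match_table_to_labels(labels, table)
print(matched)
```

No rows are invented here, so the table never contains values that were not in the original AnnData. The trade-off is that the caller has to decide how to display the labels listed in missing.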

You can see this in action by 1) cloning spatialdata-sandbox and the spatialdata branch of napari_spatialdata, 2) going to the mibitof folder, 3) running download.py and then to_zarr.py, and 4) launching python -m napari_spatialdata view data.zarr.

The relevant parts of the code are here (matching the AnnData table with the labels) and here (creating the labels colored layer). One disclaimer: the code in that branch works but is messy. We want to fix it soon, but we don't have time right now.

Another comment on using the table to store other types of geometries: I think that storing points, circles, etc. in tables should still be considered a prototype.

In the ome-ngff-tables-prototype repo we tried to use the table to store points, circles and even (in a hacky way) polygons, as shown in the merfish example, even though the NGFF Table spec doesn't say anything about how to store coordinates in a table. After these tests, we decided to use tables (in spatialdata) only for storing annotations, not spatial coordinates/geometries. We still use tables to annotate features on circles and polygons (so that in a downstream analysis it doesn't matter whether, say, cells are represented as polygons or labels), but we don't allow tables to be in a coordinate system. Instead, we save the coordinates in .parquet files or as ragged representations of polygons saved to Zarr, and we handle IO with libraries like geopandas, dask (dataframes) and dask-geopandas.

In the future we should all share our experiences when discussing how to store coordinates, geometries and ROIs in terms of future NGFF specs.

Hi Luca, I just tried the spatialdata-sandbox/mibitof download and to_zarr.py....

(spatialdata) Williams-MacBook-Pro:spatialdata-sandbox wmoore$ cd mibitof/
(spatialdata) Williams-MacBook-Pro:mibitof wmoore$ python download.py 
(spatialdata) Williams-MacBook-Pro:mibitof wmoore$ python to_zarr.py 
Traceback (most recent call last):
  File "/Users/wmoore/Desktop/SPATIALDATA/spatialdata-sandbox/mibitof/to_zarr.py", line 6, in <module>
    from spatialdata import SpatialData
ImportError: cannot import name 'SpatialData' from 'spatialdata' (unknown location)

I have the following versions:

$ pip freeze | grep spatialdata
-e git+https://github.com/scverse/napari-spatialdata@2716a406f28dcf889474ab60671cde360c7327a2#egg=napari_spatialdata
-e git+ssh://git@github.com/scverse/spatialdata.git@8266a0f4a2d6a3ef7373b4f1f855de4a7b3c184c#egg=spatialdata
-e git+https://github.com/scverse/spatialdata-io@55ca02d52030757da62bc69802e34a15e28aa70d#egg=spatialdata_io
-e git+ssh://git@github.com/scverse/spatialdata-notebooks.git@6c54e94874d96450c7a1453f68d53445a7fb0ea0#egg=spatialdata_notebooks

Previously, before I updated to the latest spatialdata commit, I got:

Traceback (most recent call last):
  File "/Users/wmoore/Desktop/SPATIALDATA/spatialdata-sandbox/mibitof/to_zarr.py", line 7, in <module>
    from spatialdata.transformations import Identity
ModuleNotFoundError: No module named 'spatialdata.transformations'

when at commit:

commit 85a813b285d21fed51405f3fd5cd46f17b3017f9 (HEAD)
Merge: 848e063 72d4732
Author: LucaMarconato <2664412+LucaMarconato@users.noreply.github.com>
Date:   Fri Mar 10 00:35:03 2023 +0100
    Merge pull request #183 from scverse/some_docstrings

Any idea what branch/commit I need to use here for all those repos?

@LucaMarconato I don't know if you've seen ome/ngff#178 which discusses other table formats as part (or NOT) of the NGFF spec, including parquet?

@kkyoda is using NGFF AnnData tables to store tracks (e.g. see openssbd/bdz#2) and I have been looking to handle that in napari-ome-zarr to display tracks in napari - see #81

But if you prefer to store points & tracks in parquet then we're already seeing divergence on this and it would be good to converge so we don't waste time on different solutions.

What are the advantages with parquet for that data, compared with AnnData?
Maybe add to the discussion at ome/ngff#178?
I'm not at all familiar with parquet, and I don't see any maintained JavaScript tools for reading the data, which is a shame but maybe not a blocker.

Any idea what branch/commit I need to use here for all those repos?

Hi, I think the problem is due to the fact that the version in pip is not the current one (we are not updating pip regularly since we haven't released yet). If you do an editable install of the main branch it should work.

Additionally, please mind the following two points:

  • If you install spatialdata-io or napari-spatialdata, it could reinstall spatialdata from pip. I suggest using editable installs for those two repos as well and running pip install -e . in the spatialdata repo last, so that the other installations don't override it with the pip version.
  • In napari-spatialdata, the branch to use is spatialdata. We will have to do a major refactoring in that repo and sync it with main, but we don't have time at the moment. It's likely happening next month. Until the refactoring, performance for large datasets may also suffer; we will address that when working on this issue: scverse/napari-spatialdata#42.

Initially we drafted the table specification to also specify how to store points and other geometries ome/ngff#64 (comment). But later we decided to simplify it and focus solely on how to use tables to store annotations for labels ome/ngff@f8f2fd0.

This is because the table specification would otherwise have had to define transformations and coordinate systems for the coordinates. The transforms specs are still being discussed, so we wanted to stay decoupled from them and iterate later on.

Because the table specification does not say how to store geometries, we selected the storage method that best suited development and runtime efficiency. Currently, we use .parquet files for points and ragged arrays saved to Zarr for circles and polygons. The former makes lazy loading of the data possible; the latter is convenient for IO with geopandas.

After the first iteration of development is complete, and the transformation specification is finalized, we would like to re-discuss how to store points and shapes, agree on a representation and change our io APIs.

I'm not installing anything via pypi.
I have checked-out:

https://github.com/scverse/napari-spatialdata/tree/spatialdata (2716a40 )
https://github.com/scverse/spatialdata-io (23be385)
https://github.com/scverse/spatialdata (8266a0f)
https://github.com/scverse/spatialdata-notebooks (d9cfe01)

So I get:

$ pip freeze | grep spatialdata
-e git+https://github.com/scverse/napari-spatialdata@2716a406f28dcf889474ab60671cde360c7327a2#egg=napari_spatialdata
-e git+ssh://git@github.com/scverse/spatialdata.git@8266a0f4a2d6a3ef7373b4f1f855de4a7b3c184c#egg=spatialdata
-e git+https://github.com/scverse/spatialdata-io@23be3852d650b34cc0ae5f4a91ea67acfec11839#egg=spatialdata_io
-e git+ssh://git@github.com/scverse/spatialdata-notebooks.git@d9cfe01cca258b45a34fad8e6db6dd5f5f594a25#egg=spatialdata_notebooks

I'm on this branch of spatialdata-sandbox:
* b7a728a (HEAD, origin/main, origin/HEAD) finished viz for data

I still see:

$ python to_zarr.py 
Traceback (most recent call last):
  File "/Users/wmoore/Desktop/SPATIALDATA/spatialdata-sandbox/mibitof/to_zarr.py", line 6, in <module>
    from spatialdata import SpatialData
ImportError: cannot import name 'SpatialData' from 'spatialdata' (unknown location)

I get the same ImportError from a different location if I try:

python -m napari_spatialdata view data.zarr

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/spatialdata/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/wmoore/opt/anaconda3/envs/spatialdata/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/Users/wmoore/opt/anaconda3/envs/spatialdata/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/Users/wmoore/Desktop/SPATIALDATA/napari-spatialdata/src/napari_spatialdata/__init__.py", line 15, in <module>
    from napari_spatialdata.interactive import Interactive
  File "/Users/wmoore/Desktop/SPATIALDATA/napari-spatialdata/src/napari_spatialdata/interactive.py", line 17, in <module>
    from spatialdata import SpatialData, get_axis_names
ImportError: cannot import name 'SpatialData' from 'spatialdata' (unknown location)

Strangely, looking at the local code for spatialdata it looks like the SpatialData class should be importable, so I'm not sure what's going on...

OK - so I created a new conda env and reinstalled everything... The installed versions look the same as above, but the import is working this time!

conda create -n spatialdata310 python=3.10
conda activate spatialdata310
cd spatialdata
pip install -e .
cd ../napari-spatialdata/
pip install -e .
cd ../spatialdata-io
pip install -e .
cd ../spatialdata-notebooks/
pip install -e .

 pip freeze | grep spatialdata
-e git+https://github.com/scverse/napari-spatialdata@2716a406f28dcf889474ab60671cde360c7327a2#egg=napari_spatialdata
spatialdata @ git+https://github.com/scverse/spatialdata.git@8266a0f4a2d6a3ef7373b4f1f855de4a7b3c184c
-e git+https://github.com/scverse/spatialdata-io@23be3852d650b34cc0ae5f4a91ea67acfec11839#egg=spatialdata_io
-e git+ssh://git@github.com/scverse/spatialdata-notebooks.git@d9cfe01cca258b45a34fad8e6db6dd5f5f594a25#egg=spatialdata_notebooks

$ python
Python 3.10.10 (main, Mar 21 2023, 13:41:39) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from spatialdata import SpatialData
>>>

But...

$ cd spatialdata-sandbox/mibitof/
$ python download.py
$ python to_zarr.py

$ python -m spatialdata view data.zarr
/Users/wmoore/opt/anaconda3/envs/spatialdata310/lib/python3.10/site-packages/geopandas/_compat.py:123: UserWarning: The Shapely GEOS version (3.11.1-CAPI-1.17.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.4-CAPI-1.16.2). Conversions between both will be slow.
  warnings.warn(
/Users/wmoore/opt/anaconda3/envs/spatialdata310/lib/python3.10/site-packages/spatialdata/__init__.py:9: UserWarning: Geopandas was set to use PyGEOS, changing to shapely 2.0 with:

	geopandas.options.use_pygeos = True

If you intended to use PyGEOS, set the option to False.
  _check_geopandas_using_shapely()
Usage: python -m spatialdata [OPTIONS] COMMAND [ARGS]...
Try 'python -m spatialdata --help' for help.

Error: No such command 'view'.

Solved via chat: it was a typo in the info shown by print(). The correct command is python -m napari_spatialdata view data.zarr

Trying that...

$ python -m napari_spatialdata view data.zarr

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/spatialdata310/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/wmoore/opt/anaconda3/envs/spatialdata310/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/Users/wmoore/opt/anaconda3/envs/spatialdata310/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/Users/wmoore/Desktop/SPATIALDATA/napari-spatialdata/src/napari_spatialdata/__init__.py", line 15, in <module>
    from napari_spatialdata.interactive import Interactive
  File "/Users/wmoore/Desktop/SPATIALDATA/napari-spatialdata/src/napari_spatialdata/interactive.py", line 17, in <module>
    from spatialdata import SpatialData, get_axis_names
ImportError: cannot import name 'get_axis_names' from 'spatialdata' (/Users/wmoore/Desktop/SPATIALDATA/spatialdata/src/spatialdata/__init__.py)

Can you please do import spatialdata; print(spatialdata.__path__)? I think that it will give you a path in site-packages and not the one in which you cloned the repo, because I think that this command

cd ../spatialdata-io
pip install -e .

tries to override the editable installation even if it is there.
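
A slightly more general version of that check, as a sketch using only the standard library. The helper name describe_install is hypothetical, and the site-packages test only compares against the "purelib" directory reported by sysconfig, so a user site directory or other extra install locations would not be detected.

```python
import importlib.util
import sysconfig
from pathlib import Path


def describe_install(package_name):
    """Report where a package would be imported from (hypothetical helper)."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None  # not importable in this environment

    if spec.submodule_search_locations:
        # Regular and namespace packages: take the first package directory.
        location = Path(list(spec.submodule_search_locations)[0])
    elif spec.has_location:
        # Single-file modules.
        location = Path(spec.origin)
    else:
        # Built-in or frozen modules have no file-system location.
        return {"location": spec.origin, "in_site_packages": False}

    location = location.resolve()
    site_packages = Path(sysconfig.get_paths()["purelib"]).resolve()
    return {
        "location": str(location),
        "in_site_packages": site_packages in location.parents,
    }


# "json" is used here only because it is always available;
# in this thread the package of interest would be "spatialdata".
info = describe_install("json")
print(info)
```

For an editable install, the reported location would be expected to point into the cloned repository rather than site-packages.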

Ah! There was a problem introduced by a recent PR. I will fix it. Please note that (at least on my machine), running the commands described in #85 (comment) will reinstall spatialdata from GitHub. To restore the editable install I had to do:

pip uninstall spatialdata
# cd spatialdata repo
pip install -e .

Yes, I had already found that I needed to re-install from source! Strange...

Screenshot 2023-04-03 at 08 25 59

Great that it works! One extra piece of info: we uploaded (and are keeping up-to-date) the various datasets, already converted to NGFF, to S3 storage; you can find the URLs here.

We haven't tested loading the data from the cloud yet (I think we have to fix the consolidated metadata). I'll keep you posted, but if you want to experiment as well, any feedback would be appreciated 😊