ome/napari-ome-zarr

Opening a zarr file, only the lowest pyramid level is loaded

Closed this issue · 12 comments

I have a zarr file made according to the High Content Screening metadata, and each field of view has 5 pyramid levels.
When I open the file in napari it loads just the lowest level, and in the debug output I see these messages, which I think could be useful:

10:56:36 DEBUG img_pyramid_shapes: [(3, 19, 12960, 15360), (3, 19, 6480, 7680), (3, 19, 3240, 3840), (3, 19, 1620, 1920), (3, 19, 810, 960)] 10:56:36 DEBUG target_level: 4 10:56:36 DEBUG get_stitched_grid() level: 4, tile_shape: (3, 19, 810, 960)

From my understanding, it seems that this behavior is related to this https://github.com/ome/ome-zarr-py/blob/6dbd227f9af0cfe78b2b6e28b40c0c9b750d6a05/ome_zarr/reader.py#L502

Am I wrong? Do you have any clue how to solve the problem?
Thank you very much

Hi @mfranzon. I think we've seen this elsewhere recently. Can you try explicitly invoking the napari-ome-zarr plugin?

napari --plugin napari-ome-zarr ...

@joshmoore I already tested that and got the very same result. I paste here the full output of:
napari --plugin napari-ome-zarr <file.zarr> -vvv

16:17:38 DEBUG Created nested FSStore(file.zarr, r, {'dimension_separator': '/', 'normalize_keys': False})
16:17:38 DEBUG 0.4 matches None?
16:17:38 DEBUG 0.3 matches None?
16:17:38 DEBUG 0.2 matches None?
16:17:38 DEBUG V01:None v. 0.1
16:17:38 DEBUG treating file.zarr [zgroup] as Plate
16:17:38 INFO root_attr: plate
16:17:38 DEBUG {'acquisitions': [{'id': 0, 'name': 'file'}], 'columns': [{'name': '03'}], 'rows': [{'name': 'B'}], 'wells': [{'path': 'B/03'}]}
16:17:38 INFO plate_data: {'acquisitions': [{'id': 0, 'name': 'file'}], 'columns': [{'name': '03'}], 'rows': [{'name': 'B'}], 'wells': [{'path': 'B/03'}]}
16:17:38 DEBUG open(ZarrLocation(file.zarr/B/03))
16:17:38 DEBUG Created nested FSStore(file.zarr/B/03, r, {'dimension_separator': '/', 'normalize_keys': False})
16:17:38 DEBUG 0.4 matches None?
16:17:38 DEBUG 0.3 matches None?
16:17:38 DEBUG 0.2 matches None?
16:17:38 DEBUG V01:None v. 0.1
16:17:38 DEBUG treating file.zarr/B/03 [zgroup] as Well
16:17:38 INFO root_attr: well
16:17:38 DEBUG {'images': [{'path': '0'}], 'version': '0.3'}
16:17:38 INFO well_data: {'images': [{'path': '0'}], 'version': '0.3'}
16:17:38 DEBUG open(ZarrLocation(file.zarr/B/03/0))
16:17:38 DEBUG Created nested FSStore(file.zarr/B/03/0, r, {'dimension_separator': '/', 'normalize_keys': False})
16:17:38 DEBUG 0.4 matches 0.3?
16:17:38 DEBUG 0.3 matches 0.3?
16:17:38 WARNING version mismatch: detected:FormatV03, requested:FormatV04
16:17:38 DEBUG Created nested FSStore(file.zarr/B/03/0, r, {'dimension_separator': '/', 'normalize_keys': False})
16:17:38 DEBUG treating file.zarr/B/03/0 [zgroup] as Multiscales
16:17:38 INFO root_attr: multiscales
16:17:38 DEBUG [{'axes': [{'name': 'c', 'type': 'channel'}, {'name': 'z', 'type': 'space', 'unit': 'micrometer'}, {'name': 'y', 'type': 'space'}, {'name': 'x', 'type': 'space'}], 'datasets': [{'path': '0'}, {'path': '1'}, {'path': '2'}, {'path': '3'}, {'path': '4'}], 'version': '0.3'}]
16:17:38 INFO datasets [{'path': '0'}, {'path': '1'}, {'path': '2'}, {'path': '3'}, {'path': '4'}]
16:17:38 INFO resolution: 0
16:17:38 INFO - shape ('c', 'z', 'y', 'x') = (3, 19, 12960, 15360)
16:17:38 INFO - chunks = ['1', '1', '2160', '2560']
16:17:38 INFO - dtype = uint16
16:17:38 INFO resolution: 1
16:17:38 INFO - shape ('c', 'z', 'y', 'x') = (3, 19, 6480, 7680)
16:17:38 INFO - chunks = ['1', '1', '1080', '1280']
16:17:38 INFO - dtype = uint16
16:17:38 INFO resolution: 2
16:17:38 INFO - shape ('c', 'z', 'y', 'x') = (3, 19, 3240, 3840)
16:17:38 INFO - chunks = ['1', '1', '540', '640']
16:17:38 INFO - dtype = uint16
16:17:38 INFO resolution: 3
16:17:38 INFO - shape ('c', 'z', 'y', 'x') = (3, 19, 1620, 1920)
16:17:38 INFO - chunks = ['1', '1', '270', '320']
16:17:38 INFO - dtype = uint16
16:17:38 INFO resolution: 4
16:17:38 INFO - shape ('c', 'z', 'y', 'x') = (3, 19, 810, 960)
16:17:38 INFO - chunks = ['1', '1', '135', '160']
16:17:38 INFO - dtype = uint16
16:17:38 DEBUG open(ZarrLocation(file.zarr/B/03/0/labels))
16:17:38 DEBUG Created nested FSStore(file.zarr/B/03/0/labels, r, {'dimension_separator': '/', 'normalize_keys': False})
16:17:38 WARNING version mismatch: detected:FormatV04, requested:FormatV03
16:17:38 DEBUG Created nested FSStore(file.zarr/B/03/0/labels, r, {'dimension_separator': '/', 'normalize_keys': False})
16:17:38 DEBUG creating lazy_reader. row:0 col:0
16:17:38 DEBUG img_pyramid_shapes: [(3, 19, 12960, 15360), (3, 19, 6480, 7680), (3, 19, 3240, 3840), (3, 19, 1620, 1920), (3, 19, 810, 960)]
16:17:38 DEBUG target_level: 4
16:17:38 DEBUG get_stitched_grid() level: 4, tile_shape: (3, 19, 810, 960)
16:17:38 DEBUG treating file.zarr [zgroup] as ome-zarr
16:17:38 DEBUG returning file.zarr [zgroup]
16:17:38 DEBUG transforming file.zarr [zgroup]
16:17:38 DEBUG node.metadata: {'axes': [{'name': 'c', 'type': 'channel'}, {'name': 'z', 'type': 'space', 'unit': 'micrometer'}, {'name': 'y', 'type': 'space'}, {'name': 'x', 'type': 'space'}], 'metadata': {'plate': {'acquisitions': [{'id': 0, 'name': 'file'}], 'columns': [{'name': '03'}], 'rows': [{'name': 'B'}], 'wells': [{'path': 'B/03'}]}}}
16:17:38 DEBUG Transformed: ([dask.array<from-value, shape=(3, 19, 810, 960), dtype=uint16, chunksize=(3, 19, 810, 960), chunktype=numpy.ndarray>], {'channel_axis': 0, 'metadata': {'plate': {'acquisitions': [{'id': 0, 'name': 'file'}], 'columns': [{'name': '03'}], 'rows': [{'name': 'B'}], 'wells': [{'path': 'B/03'}]}}}, 'image')
16:17:38 DEBUG ImageSlice.__init__
16:17:38 DEBUG ImageSlice.__init__
16:17:38 DEBUG LOADING tile... B/03/0/4 with shape: (3, 19, 810, 960)
16:17:38 DEBUG ImageSlice.__init__

@mfranzon The original idea was to return a virtual (dask) pyramid of resolution levels for the plate, so that you could zoom in with napari and get higher resolution images.
But I never got this working - it always failed with seg-faults, so we settled on picking a single resolution.
There is the variable TARGET_SIZE = 1500, which could be increased so that a higher resolution level is picked. I guess this could be made to read an environment variable to allow configuration, e.g.:

$ export MAX_PLATE_LENGTH=3000
$ napari --plugin napari-ome-zarr <file.zarr>

@will-moore thank you for the explanation. I have tried exporting TARGET_SIZE and MAX_PLATE_LENGTH as environment variables and then running napari as you suggested. Unfortunately the output does not change.
I found this workaround to get the expected result. I changed this section of the reader https://github.com/ome/ome-zarr-py/blob/6dbd227f9af0cfe78b2b6e28b40c0c9b750d6a05/ome_zarr/reader.py#L502 slightly, as follows:

[...]
        # FIXME - if only returning a single stitched plate (not a pyramid)
        # need to decide optimal size. E.g. longest side < 1500
        TARGET_SIZE = 100000
        plate_width = self.column_count * size_x
        plate_height = self.row_count * size_y
        longest_side = max(plate_width, plate_height)
        target_level = []
        for level, shape in enumerate(well_spec.img_pyramid_shapes):
            plate_width = self.column_count * shape[-1]
            plate_height = self.row_count * shape[-2]
            longest_side = max(plate_width, plate_height)
            target_level.append(level)
            if longest_side >= TARGET_SIZE:
                break

        LOGGER.debug(f"target_level: {target_level}")

        pyramid = []

        # This should create a pyramid of levels, but causes seg faults!
        # for level in range(5):
        for level in target_level:

            tile_shape = well_spec.img_pyramid_shapes[level]
            lazy_plate = self.get_stitched_grid(level, tile_shape)
            pyramid.append(lazy_plate)
[...]

Main changes are:

  • increase the target size
  • target_level is now a list, not an int
  • the if statement: if the longest side is greater than or equal to the target size, break
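For illustration, the modified selection loop can be run standalone. This is only a sketch using the pyramid shapes from the debug log above and a 1×1 plate, not the actual reader code:

```python
# Sketch of the modified level-selection loop, outside the reader class.
# row_count/column_count = 1 matches the single-well plate in this thread.
TARGET_SIZE = 100000
row_count, column_count = 1, 1
img_pyramid_shapes = [
    (3, 19, 12960, 15360), (3, 19, 6480, 7680), (3, 19, 3240, 3840),
    (3, 19, 1620, 1920), (3, 19, 810, 960),
]

target_level = []
for level, shape in enumerate(img_pyramid_shapes):
    plate_width = column_count * shape[-1]
    plate_height = row_count * shape[-2]
    longest_side = max(plate_width, plate_height)
    target_level.append(level)
    if longest_side >= TARGET_SIZE:
        break

print(target_level)  # [0, 1, 2, 3, 4] - every level is kept
```

With such a large TARGET_SIZE no plate dimension ever reaches the threshold, so the break never fires and all five levels end up in the pyramid.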

Sorry, to be clear: the MAX_PLATE_LENGTH=3000 etc. environment variable isn't supported yet. That was just an idea for how we could pass such a value to the napari plugin (since we're not calling it directly via the command line, and I don't know if there's another way to pass arguments to napari plugins?).
In the code you'd need something like:

import os

TARGET_SIZE = 1500
max_length = os.getenv('MAX_PLATE_LENGTH')
if max_length is not None:
    TARGET_SIZE = int(max_length)

The previous logic was "starting with the largest resolution, pick the first resolution that is smaller than 1500".

You now have "starting with the largest resolution, pick every resolution until one reaches 100000".

Does this work?
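To make the contrast concrete, here is a standalone sketch of the original rule applied to the single-well plate from the log above (not the actual reader code):

```python
# Original rule: walk from the largest level down and stop at the first
# level whose stitched plate fits under TARGET_SIZE.
TARGET_SIZE = 1500
row_count, column_count = 1, 1  # the single-well plate in this thread
img_pyramid_shapes = [
    (3, 19, 12960, 15360), (3, 19, 6480, 7680), (3, 19, 3240, 3840),
    (3, 19, 1620, 1920), (3, 19, 810, 960),
]

target_level = 0
for level, shape in enumerate(img_pyramid_shapes):
    target_level = level
    longest_side = max(column_count * shape[-1], row_count * shape[-2])
    if longest_side <= TARGET_SIZE:
        break

print(target_level)  # 4, matching "target_level: 4" in the debug log
```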

Hi @will-moore, if I am not wrong, my logic is something like:

  • if a dimension is greater than a certain limit, stop:
    if longest_side >= TARGET_SIZE: break
  • in the original file it is: if the longest side is smaller than a certain limit, stop:
    if longest_side <= TARGET_SIZE: break

I think it is the same thing with a different TARGET_SIZE, so I agree with you about having an environment variable.
I was also asking myself: what happens if we remove this check? Why do we need this threshold?
Probably it is not a smart question, but this deeper look into the napari-ome-zarr plugin helps me to better understand the logic of reading and writing zarr files.

Thank you very much!

The current logic is to pick a resolution level that is small enough to load a "preview" of the plate, without loading too much data (since the data may be remote).
So, for example, for the plate at https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/idr0094A/7751.zarr the longest side has 12 images, whose pyramid levels are 1080, 540, 270, 135 and 67 pixels, so the plate's sizes are 12960, 6480, 3240, 1620 and 804. The size of 804 is the first one that is less than 1500, so the level 4 pyramid is chosen.
You can see that with:

$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/idr0094A/7751.zarr -vvv
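The arithmetic in that example can be checked directly; this small sketch just reproduces the numbers quoted above:

```python
# Plate preview size per pyramid level for the idr0094A/7751.zarr example:
# 12 images along the longest side, per-image sizes 1080..67 pixels.
images_per_side = 12
image_sizes = [1080, 540, 270, 135, 67]
plate_sizes = [images_per_side * s for s in image_sizes]
print(plate_sizes)  # [12960, 6480, 3240, 1620, 804]

# Pick the first level whose plate size drops below the 1500-pixel target.
TARGET_SIZE = 1500
target_level = next(i for i, s in enumerate(plate_sizes) if s < TARGET_SIZE)
print(target_level)  # 4
```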

Screenshot 2022-05-03 at 22 13 20

If the original idea of passing a virtual dask pyramid of resolutions to napari had worked, then the highest-resolution data wouldn't be loaded until needed (by zooming in).
But since that didn't work and I was forced to pick a single resolution, it needs to be not too high.

Hope that makes sense?

This completely makes sense. Thank you for your clear explanation.
Now I have a question: if I understand correctly, the problem is that if we don't set a low threshold, the risk is loading too much data, right? But if we use dask delayed, don't we avoid the problem of holding too much data in memory? It should lazily load the arrays only when you zoom in on the image. Am I wrong?

Thank you for taking the time to help me figure this out!

That was my hope when I first wrote this code: we could generate a dask pyramid for the plate (similar to a pyramid for a big image), so that the higher-resolution tiles are only loaded when you zoom in.
But I was always getting segmentation faults with that code, so I had to limit it to a single resolution.
In that case, even with dask, if you pick the highest resolution, then when you zoom out to show the whole plate in napari, dask will try to load every tile (at full resolution).
That was with a much older version of napari, dask etc., and a dask pyramid might work better now, but I haven't tried.
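For what it's worth, the lazy-pyramid idea can be sketched even without dask, using plain callables. All names below (make_virtual_pyramid, load_tile) are hypothetical illustrations, not the reader's API:

```python
import numpy as np

loaded_tiles = []  # records which tiles were actually read


def load_tile(level, row, col, shape):
    """Stand-in for reading one field of view from the zarr store."""
    loaded_tiles.append((level, row, col))
    return np.zeros(shape, dtype=np.uint16)


def make_virtual_pyramid(pyramid_shapes, rows, cols):
    """One zero-argument callable per level; no pixels are read up front."""
    def stitch(level, tile_shape):
        # Stitch the well grid for one level, like get_stitched_grid().
        return np.concatenate(
            [np.concatenate(
                [load_tile(level, r, c, tile_shape) for c in range(cols)],
                axis=-1)
             for r in range(rows)],
            axis=-2)
    return [lambda lvl=lvl, shp=shp: stitch(lvl, shp)
            for lvl, shp in enumerate(pyramid_shapes)]


pyramid = make_virtual_pyramid([(810, 960), (405, 480)], rows=1, cols=1)
assert loaded_tiles == []            # building the pyramid reads nothing
low_res = pyramid[1]()               # a viewer asks for the lowest level
assert low_res.shape == (405, 480)
assert loaded_tiles == [(1, 0, 0)]   # only that level's tiles were loaded
```

A real dask pyramid works the same way in spirit, except napari drives which level (and which chunks) get materialized as you pan and zoom.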

@will-moore : do you have a sense if that was more an upstream issue? And if so, did it make its way to the napari repos? Just wondering if we should @-mention anyone. Perhaps something has changed in the interim.

Oh wow! I just tried it and things are definitely much better.
With this diff:

$ git diff
diff --git a/ome_zarr/reader.py b/ome_zarr/reader.py
index 9059e24..5afdd7f 100644
--- a/ome_zarr/reader.py
+++ b/ome_zarr/reader.py
@@ -520,9 +520,9 @@ class Plate(Spec):
 
         # This should create a pyramid of levels, but causes seg faults!
         # for level in range(5):
-        for level in [target_level]:
-
-            tile_shape = well_spec.img_pyramid_shapes[level]
+        for level, tile_shape in enumerate(well_spec.img_pyramid_shapes):
+            print("level, tile_shape", level, tile_shape)
+            # tile_shape = well_spec.img_pyramid_shapes[level]
             lazy_plate = self.get_stitched_grid(level, tile_shape)
             pyramid.append(lazy_plate)

Loading the plate as above, I start with low-resolution tiles, and as I zoom in to parts of the plate, it loads higher resolutions with no seg-faults!

Screenshot 2022-05-04 at 09 16 21

I've opened a PR with that fix at ome/ome-zarr-py#195