scverse/spatialdata-io

bugs in xenium io code and reading of additional images

Closed this issue · 8 comments

jwrth commented

Dear all,

I'm working with spatialdata to read xenium data and found two minor bugs in io.xenium():

  1. In line 139, it should be XeniumKeys.MORPHOLOGY_FOCUS_FILE instead of MIP:

    if morphology_focus:
        images["morphology_focus"] = _get_images(
            path,
            XeniumKeys.MORPHOLOGY_MIP_FILE,
            specs,
            imread_kwargs,
            image_models_kwargs,
        )
  2. The np.testing.assert_array_equal() line here also caused problems:

    np.testing.assert_array_equal(metadata.cell_id.astype(str).values, adata.obs_names.values)

    read_parquet() returns the cell IDs as byte strings, so metadata.cell_id.astype(str).values produced values that did not match adata.obs_names and the assertion failed. Replacing it with np.testing.assert_array_equal(metadata.cell_id.str.decode("utf-8").values, adata.obs_names.values) worked for me (see the minimal repro after this list).
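
For context, here is a minimal repro of the mismatch (a sketch with made-up cell IDs; the real values come from the Xenium cell metadata parquet file):

import numpy as np
import pandas as pd

# read_parquet() loads the cell IDs as Python bytes objects, so astype(str)
# yields strings like "b'cell_1'" rather than "cell_1"
metadata = pd.DataFrame({"cell_id": [b"cell_1", b"cell_2"]})
obs_names = np.array(["cell_1", "cell_2"])

print(metadata.cell_id.astype(str).values)          # ["b'cell_1'" "b'cell_2'"]
print(metadata.cell_id.str.decode("utf-8").values)  # ['cell_1' 'cell_2']

# the decoded version passes; the astype(str) version would raise
np.testing.assert_array_equal(metadata.cell_id.str.decode("utf-8").values, obs_names)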

Our dataset also includes IHC and H&E stainings of the sections, which would be great to include in the spatialdata read process. This feature is currently missing, and I think many groups will have additional stainings in the future. Are you planning to include this?

I implemented it in my fork with the following code:

  if additional_images is not None:
      for img_name in additional_images:
          # use the part of the file name before the first dot as the image key
          images[img_name.split(".")[0]] = _get_images(
              path,
              img_name,
              specs,
              imread_kwargs,
              image_models_kwargs,
              rgb=True,
          )

But since these images are RGB rather than grayscale, I had to add an rgb=True/False parameter to _get_images() to bring them into the correct shape ("c", "y", "x"):

def _get_images(
    path: Path,
    file: str,
    specs: dict[str, Any],
    imread_kwargs: Mapping[str, Any] = MappingProxyType({}),
    image_models_kwargs: Mapping[str, Any] = MappingProxyType({}),
    rgb: bool = False,
) -> SpatialImage | MultiscaleSpatialImage:
    image = imread(path / file, **imread_kwargs)
    if rgb:
        # imread returns RGB images with the channel axis last, e.g. (1, y, x, c);
        # move the channel axis to position 1 and drop the singleton axis -> (c, y, x)
        image = moveaxis(image, -1, 1)[0]
    return Image2DModel.parse(
        image, transformations={"global": Identity()}, dims=("c", "y", "x"), **image_models_kwargs
    )
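
To make the axis handling concrete, here is a small self-contained sketch of what the moveaxis line does (assuming imread returns the RGB image with shape (1, y, x, c), which is what the moveaxis/[0] combination implies):

import numpy as np

# dummy RGB image shaped like the imread output: singleton leading axis, channels last
image = np.zeros((1, 512, 1024, 3))

# moveaxis moves the channel axis to position 1 -> (1, c, y, x);
# indexing with [0] then drops the singleton axis -> (c, y, x)
image = np.moveaxis(image, -1, 1)[0]
print(image.shape)  # (3, 512, 1024)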

I didn't fully understand how Image2DModel.parse() works and wanted to check first whether there is a more elegant way to include this, before opening a pull request from my fork.
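
One alternative I wondered about (an assumption on my side, I have not verified that the parser supports this): if Image2DModel.parse can transpose the input based on the dims argument, the manual moveaxis might be unnecessary:

image = imread(path / file, **imread_kwargs)[0]  # drop the singleton axis -> (y, x, c)
return Image2DModel.parse(
    image, transformations={"global": Identity()}, dims=("y", "x", "c"), **image_models_kwargs
)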

Thanks a lot in advance!
Johannes

giovp commented

hi @jwrth
thank you so much for reporting this, and sorry for the late reply

In line 139 it should be XeniumKeys.MORPHOLOGY_FOCUS_FILE instead of MIP

could you submit a PR for this?

The np.testing.assert_array_equal() line here also caused problems

a PR for this would also be very much appreciated

Regarding the image reading, couple of questions:

  1. do the H&E and IHC images come from the Xenium pipeline output, or from a bespoke pipeline + additional measurements?
  2. are these images in the same coordinate system as the Xenium images/points, and if not, is it possible to compute a transformation natively that could be assigned to the appropriate coordinate system?

If the answer to the first question is no, then I'm afraid it would be better to keep the IO for those images outside of the Xenium reader, but I'd be happy to guide you on how best to do it. Can you share the shape/size of the images, the format on disk, and any other info relevant to how they are associated with the native Xenium images?

Thank you!

jwrth commented

Hi @giovp,

I submitted a PR for the first two bugs (#56).

Regarding your two questions: the H&E and IHC images come from the same tissue sections but from additional stainings and measurements, so not directly from the Xenium platform. I have already set up a pipeline to register them to the original DAPI image, so they are in the same coordinate system (same pixel dimensions as the DAPI image that comes out of the Xenium workflow).
As the format I used OME-TIFF, to keep it consistent with the Xenium output files.

LucaMarconato commented

@jwrth thanks for reporting and for the PR. If all the images have the same pixel dimensions and if this is something standard (for instance, if it also appears in other datasets from 10x), I would add a new parameter to the xenium() function to account for this option. Do you know of any such public dataset we could test this on?

For the extra parameter we could use something like

xenium(..., additional_images: list[str] = [])
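
Usage would then look something like this (file names purely illustrative):

sdata = xenium(
    "path/to/xenium_output",
    additional_images=["he_stain.ome.tif", "ihc_cd3.ome.tif"],
)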

WDYT @giovp?

Btw @jwrth, the rgb approach is clean and the use of Image2DModel.parse() is correct.

jwrth commented

@LucaMarconato thanks for looking into it! No, I don't know of any public dataset with additional images. For testing purposes I could share one of our datasets, but it's not published yet and shouldn't be included in tutorials, etc. :)

LucaMarconato commented

Thank you for your willingness to share the dataset, and sure, I can keep it private. I won't have time to add this feature myself, but if you submit a PR and share the dataset, I will be happy to review and merge it. To share the data you can reach me via private message on the Zulip platform.

giovp commented

hi, sorry for the late reply. I would not add an additional_images argument to the xenium reader. I don't think we could make it general enough, and the images could very well be read externally to the xenium() function and added to the SpatialData object independently with the Image2DModel parser (and the right transformation), as in the sketch below. Did you manage to do so @jwrth? If so, I would close this issue, as the other bugs were solved in #56.
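
Something along these lines (an untested sketch; the file name and channel layout are assumptions, and the exact way to attach the image may depend on the spatialdata version):

from dask_image.imread import imread
from numpy import moveaxis
from spatialdata.models import Image2DModel
from spatialdata.transformations import Identity
from spatialdata_io import xenium

sdata = xenium("path/to/xenium_output")

# registered H&E image with channels last, e.g. (1, y, x, 3)
image = imread("he_registered.ome.tif")
image = moveaxis(image, -1, 1)[0]  # reorder to (c, y, x)

# the image was registered to the Xenium DAPI image, so Identity places it
# in the same "global" coordinate system as the other elements
sdata.images["he"] = Image2DModel.parse(
    image, dims=("c", "y", "x"), transformations={"global": Identity()}
)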

jwrth commented

Yes, I agree it might be a bit complicated to make it really general. I haven't looked into it yet, but I will try to do it that way then. Thanks for your help, and yes, I think the issue can be closed.