XMLSchemaValidationError when reading xml/h5 dataset

Question

grinic opened this issue 2 years ago · 12 comments

grinic commented 2 years ago

System and Software

aicsimageio Version: 4.9.2
Python Version: Python 3.9.15
Operating System: Windows 10

Description

I am trying to open the xml/h5 dataset here:
xml
h5

using:

image = AICSImage(xmlfile, reader=BioformatsReader)
img = image.data
meta = image.metadata

and I obtain the following error:

---------------------------------------------------------------------------
XMLSchemaValidationError                  Traceback (most recent call last)
Cell In[2], line 1
----> 1 image.data

File ~\AppData\Roaming\Python\Python39\site-packages\aicsimageio\aics_image.py:529, in AICSImage.data(self)
    516 @property
    517 def data(self) -> np.ndarray:
    518     """
    519     Returns
    520     -------
   (...)
    527     Recommended to use `dask_data` for large mosaic images.
    528     """
--> 529     return self.xarray_data.data

File ~\AppData\Roaming\Python\Python39\site-packages\aicsimageio\aics_image.py:469, in AICSImage.xarray_data(self)
    453 """
    454 Returns
    455 -------
   (...)
    462 Recommended to use `xarray_dask_data` for large mosaic images.
    463 """
    464 if self._xarray_data is None:
    465     if (
    466         # Does the user want to get stitched mosaic
    467         self._reconstruct_mosaic
    468         # Does the data have a tile dim
--> 469         and dimensions.DimensionNames.MosaicTile in self.reader.dims.order
    470     ):
    471         try:
    472             self._xarray_data = (
    473                 self._transform_data_array_to_aics_image_standard(
    474                     self.reader.mosaic_xarray_data
    475                 )
    476             )

File ~\AppData\Roaming\Python\Python39\site-packages\aicsimageio\readers\reader.py:532, in Reader.dims(self)
    525 """
    526 Returns
    527 -------
    528 dims: Dimensions
    529     Object with the paired dimension names and their sizes.
    530 """
    531 if self._dims is None:
--> 532     self._dims = Dimensions(dims=self.xarray_dask_data.dims, shape=self.shape)
    534 return self._dims

File ~\AppData\Roaming\Python\Python39\site-packages\aicsimageio\readers\reader.py:359, in Reader.xarray_dask_data(self)
    352 """
    353 Returns
    354 -------
    355 xarray_dask_data: xr.DataArray
    356     The delayed image and metadata as an annotated data array.
    357 """
    358 if self._xarray_dask_data is None:
--> 359     self._xarray_dask_data = self._read_delayed()
    361 return self._xarray_dask_data

File ~\AppData\Roaming\Python\Python39\site-packages\aicsimageio\readers\bioformats_reader.py:164, in BioformatsReader._read_delayed(self)
    163 def _read_delayed(self) -> xr.DataArray:
--> 164     return self._to_xarray(delayed=True)

File ~\AppData\Roaming\Python\Python39\site-packages\aicsimageio\readers\bioformats_reader.py:202, in BioformatsReader._to_xarray(self, delayed)
    195 with BioFile(
    196     self._path,
    197     series=self.current_scene_index,
    198     **self._bf_kwargs,  # type: ignore
    199 ) as rdr:
    200     image_data = rdr.to_dask() if delayed else rdr.to_numpy()
    201     _, coords = metadata_utils.get_dims_and_coords_from_ome(
--> 202         ome=rdr.ome_metadata,
    203         scene_index=self.current_scene_index,
    204     )
    206 return xr.DataArray(
    207     image_data,
    208     dims=dimensions.DEFAULT_DIMENSION_ORDER_LIST_WITH_SAMPLES
   (...)
    215     },
    216 )

File ~\AppData\Roaming\Python\Python39\site-packages\aicsimageio\readers\bioformats_reader.py:447, in BioFile.ome_metadata(self)
    445 """Return OME object parsed by ome_types."""
    446 xml = metadata_utils.clean_ome_xml_for_known_issues(self.ome_xml)
--> 447 return OME.from_xml(xml)

File ~\AppData\Roaming\Python\Python39\site-packages\ome_types\model\ome.py:154, in OME.from_xml(cls, xml)
    150 @classmethod
    151 def from_xml(cls, xml: Union[Path, str]) -> "OME":
    152     from ome_types import from_xml
--> 154     return from_xml(xml)

File ~\AppData\Roaming\Python\Python39\site-packages\ome_types\_convenience.py:105, in from_xml(xml, parser, validate)
     79 def from_xml(
     80     xml: Union[Path, str, bytes],
     81     *,
     82     parser: Union[Parser, str, None] = None,
     83     validate: Optional[bool] = None,
     84 ) -> OME:
     85     """Generate OME metadata object from XML string or path.
     86 
     87     Parameters
   (...)
    103         ome_types.OME metadata object
    104     """
--> 105     d = to_dict(os.fspath(xml), parser=parser, validate=validate)
    106     return OME(**d)

File ~\AppData\Roaming\Python\Python39\site-packages\ome_types\_convenience.py:72, in to_dict(xml, parser, validate)
     69     else:
     70         raise KeyError("parser string must be one of {'lxml', 'xmlschema'}")
---> 72 d = parser(xml) if validate is None else parser(xml, validate=validate)
     73 for key in list(d.keys()):
     74     if key.startswith(("xml", "xsi")):

File ~\AppData\Roaming\Python\Python39\site-packages\ome_types\_xmlschema.py:231, in xmlschema2dict(xml, schema, converter, validate, **kwargs)
    229 if validate:
    230     schema = schema or get_schema(xml)
--> 231 result = xmlschema.to_dict(xml, schema=schema, converter=converter, **kwargs)
    232 # xmlschema doesn't provide usable access to mixed XML content, so we'll
    233 # fill the XMLAnnotation value attributes ourselves by re-parsing the XML
    234 # with ElementTree and using the Element objects as the values.
    235 tree = None

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\documents.py:289, in to_dict(xml_document, schema, cls, path, validation, process_namespaces, locations, base_url, defuse, timeout, lazy, **kwargs)
    277 """
    278 Decodes an XML document to a Python's nested dictionary. Takes the same arguments
    279 of the function :meth:`iter_decode`, but *validation* mode defaults to 'strict'.
   (...)
    284 ``validation='strict'`` is provided.
    285 """
    286 source, _schema = get_context(
    287     xml_document, schema, cls, locations, base_url, defuse, timeout, lazy
    288 )
--> 289 return _schema.decode(source, path=path, validation=validation,
    290                       process_namespaces=process_namespaces, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\schemas.py:2015, in XMLSchemaBase.decode(self, source, path, schema_path, validation, *args, **kwargs)
   2011 """
   2012 Decodes XML data. Takes the same arguments of the method :meth:`iter_decode`.
   2013 """
   2014 data, errors = [], []
-> 2015 for result in self.iter_decode(source, path, schema_path, validation, *args, **kwargs):
   2016     if not isinstance(result, XMLSchemaValidationError):
   2017         data.append(result)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\schemas.py:2001, in XMLSchemaBase.iter_decode(self, source, path, schema_path, validation, process_namespaces, namespaces, use_defaults, decimal_type, datetime_types, binary_types, converter, filler, fill_missing, keep_unknown, process_skipped, max_depth, depth_filler, value_hook, **kwargs)
   1998             yield schema.validation_error(validation, reason, elem, resource, namespaces)
   1999             return
-> 2001     yield from xsd_element.iter_decode(elem, validation, **kwargs)
   2003 if 'max_depth' not in kwargs:
   2004     yield from self._validate_references(validation=validation, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\elements.py:733, in XsdElement.iter_decode(self, obj, validation, **kwargs)
    730         for error in assertion(obj, **kwargs):
    731             yield self.validation_error(validation, error, **kwargs)
--> 733 for result in content_decoder.iter_decode(obj, validation, **kwargs):
    734     if isinstance(result, XMLSchemaValidationError):
    735         yield self.validation_error(validation, result, obj, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\groups.py:1073, in XsdGroup.iter_decode(self, obj, validation, **kwargs)
   1070         result_list.append((child.tag, func(xsd_element), xsd_element))
   1071     continue
-> 1073 for result in xsd_element.iter_decode(child, validation, **kwargs):
   1074     if isinstance(result, XMLSchemaValidationError):
   1075         yield result

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\elements.py:733, in XsdElement.iter_decode(self, obj, validation, **kwargs)
    730         for error in assertion(obj, **kwargs):
    731             yield self.validation_error(validation, error, **kwargs)
--> 733 for result in content_decoder.iter_decode(obj, validation, **kwargs):
    734     if isinstance(result, XMLSchemaValidationError):
    735         yield self.validation_error(validation, result, obj, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\groups.py:1073, in XsdGroup.iter_decode(self, obj, validation, **kwargs)
   1070         result_list.append((child.tag, func(xsd_element), xsd_element))
   1071     continue
-> 1073 for result in xsd_element.iter_decode(child, validation, **kwargs):
   1074     if isinstance(result, XMLSchemaValidationError):
   1075         yield result

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\elements.py:689, in XsdElement.iter_decode(self, obj, validation, **kwargs)
    687 for result in attribute_group.iter_decode(obj.attrib, validation, **kwargs):
    688     if isinstance(result, XMLSchemaValidationError):
--> 689         yield self.validation_error(validation, result, obj, **kwargs)
    690     else:
    691         attributes = result

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\xsdbase.py:227, in XsdValidator.validation_error(self, validation, error, obj, source, namespaces, **_kwargs)
    224     error = XMLSchemaValidationError(self, obj, error, source, namespaces)
    226 if validation == 'strict' and error.elem is not None:
--> 227     raise error
    228 return error

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\simple_types.py:1388, in XsdAtomicRestriction.iter_decode(self, obj, validation, **kwargs)
   1386 if not isinstance(self.primitive_type, XsdUnion):
   1387     try:
-> 1388         self.patterns(obj)
   1389     except XMLSchemaValidationError as err:
   1390         yield err

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\facets.py:726, in XsdPatternFacets.__call__(self, text)
    724     if all(pattern.match(text) is None for pattern in self.patterns):
    725         reason = _("value doesn't match any pattern of {!r}").format(self.regexps)
--> 726         raise XMLSchemaValidationError(self, text, reason)
    727 except TypeError as err:
    728     raise XMLSchemaValidationError(self, text, str(err)) from None

XMLSchemaValidationError: failed validating 'XMLAnnotation:0' with XsdPatternFacets(['(urn:lsid:([\\w\\-\\.]+\\.[\\w\\-\\.]+)+:Annotation:\\S+)|(Annotatio...']):

Reason: attribute ID='XMLAnnotation:0': value doesn't match any pattern of ['(urn:lsid:([\\w\\-\\.]+\\.[\\w\\-\\.]+)+:Annotation:\\S+)|(Annotation:\\S+)']

Schema:

  <xsd:pattern xmlns:xsd="http://www.w3.org/2001/XMLSchema" value="(urn:lsid:([\w\-\.]+\.[\w\-\.]+)+:Annotation:\S+)|(Annotation:\S+)" />

Instance:

  <XMLAnnotation xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06" ID="XMLAnnotation:0"><Value><SpimData version="0.2">
    <BasePath type="relative">.</BasePath>
    <SequenceDescription>
      <ImageLoader format="bdv.hdf5">
        <hdf5 type="relative">xyzct_16bit__mitosis.h5</hdf5>
      </ImageLoader>
      <ViewSetups>
        <ViewSetup>
          <id>0</id>
          <name>channel 1</name>
          <size>171 196 5</size>
          <voxelSize>
            <unit>µm</unit>
            <size>0.08850000022125 0.08850000022125 1.0</size>
          </voxelSize>
          <attributes>
            <channel>1</channel>
          </attributes>
        </ViewSetup>
        <ViewSetup>
    ...
    ...
  </SpimData></Value></XMLAnnotation>

Path: /OME/StructuredAnnotations/XMLAnnotation

The same error appears when I try to access either image.data or image.metadata.

Expected Behavior

I am not sure whether the problem is on the dataset side (an XMLAnnotation validation error is raised). However, I am able to open the dataset using python-bioformats directly and with the FIJI Bioformats importer.

I am aware of this previous issue, but I am not sure what I should do to solve my problem. I think I already have the latest version of aicsimageio?

Thank you so much for your help!

Reproduction

After downloading the xml/h5 dataset above, the error can be reproduced with:

from aicsimageio import AICSImage
from aicsimageio.readers import BioformatsReader

xmlfile = 'path-to-xml-file'
image = AICSImage(xmlfile, reader=BioformatsReader)
img = image.data

Answer 1 · 2022-12-13T18:07:01.000Z

Sorry to hear you are having an issue! Hopefully we can come up with a fix.

Pinging @tlambert03.

From my view this is a problem with the metadata of the dataset itself, but we have made changes to our metadata processing to allow for "common metadata problems" -- exactly like the micromanager PR you linked. Do you know how this dataset was created?

Answer 2 · 2022-12-14T10:42:16.000Z

Thanks @evamaxfield for the reply.
The dataset was created using FIJI: Plugins › BigDataViewer › Export Current Image as XML/HDF5.
I don't know the file format of the original dataset.

Some additional information that might be useful. I tried:

1. open the xml/h5 dataset with FIJI > Plugins > Bio-Formats > Bio-Formats Importer
2. save image as tif (it then retains all metadata)
3. open the tif image and re-save it using Plugins › BigDataViewer › Export Current Image as XML/HDF5

At this point, using

image = AICSImage('path-to-tif-file', reader=BioformatsReader)
print(image.data.shape)
print(image.physical_pixel_sizes)

works as expected. However:

image = AICSImage('path-to-new-xml-file', reader=BioformatsReader)
print(image.data.shape)
print(image.physical_pixel_sizes)

raises the same XMLSchemaValidationError as before.
I guess this suggests that the problem lies in the way AICSImage reads the output of the BigDataViewer xml/h5 export?

Answer 3 · 2022-12-15T13:04:45.000Z

Yeah, looks like the issue here is that BigDataViewer is outputting xml that somehow doesn't validate against the OME schema. This is more of a problem with ome-types than aicsimageio. We've discussed having more graceful validation error handling there. I think I can probably get this one loading. Give me a few days and I'll try to release an update to ome-types that works here.

Answer 4 · 2022-12-15T13:31:11.000Z

The core problem here is that, somehow/somewhere, the XMLAnnotation is getting written with an ID of XMLAnnotation:0, i.e. <XMLAnnotation ID="XMLAnnotation:0">. An AnnotationID has to validate against both ((urn:lsid:([\w\-\.]+\.[\w\-\.]+)+:\S+:\S+)|(\S+:\S+)) and ((urn:lsid:([\w\-\.]+\.[\w\-\.]+)+:Annotation:\S+)|(Annotation:\S+)), and this value fails the second pattern, which is the one cited in the error. It appears that the ID should be Annotation:0.
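To make the mismatch concrete, here is a minimal sketch that tests both candidate IDs against the pattern quoted in the error message. It relies on the fact that an XSD pattern has to match the whole value, which is why it uses re.fullmatch rather than re.search. The names ANNOTATION_ID and is_valid_annotation_id are illustrative only and are not part of ome-types or xmlschema.

```python
import re

# Pattern copied from the XMLSchemaValidationError message.
ANNOTATION_ID = re.compile(
    r"(urn:lsid:([\w\-\.]+\.[\w\-\.]+)+:Annotation:\S+)|(Annotation:\S+)"
)


def is_valid_annotation_id(value: str) -> bool:
    """Return True if the whole value matches the AnnotationID pattern."""
    return ANNOTATION_ID.fullmatch(value) is not None


# Compare the ID found in the file with the ID the schema seems to expect.
for candidate in ("XMLAnnotation:0", "Annotation:0"):
    print(candidate, is_valid_annotation_id(candidate))
```

With re.search the first ID would appear to pass, because "Annotation:0" occurs inside "XMLAnnotation:0"; the full-value match is what makes it fail.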

I haven't been able to figure out exactly where in the process this is happening yet (i.e. whether it's in the BigDataViewer exporter, or in the Java Bio-Formats reader that merges the xml file with the hdf5 file). But at least it's clear what the problem is, and we could certainly special-case this in clean_ome_xml_for_known_issues.
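As a rough illustration of what such a special case could look like, the sketch below rewrites XMLAnnotation:N IDs to Annotation:N in the raw XML string before it is parsed. This is not the actual change made in any patch: fix_xmlannotation_ids is a hypothetical helper, and it assumes the faulty value only ever appears in ID attributes (including any AnnotationRef elements that point at it).

```python
import re

# Matches ID="XMLAnnotation:<digits>" on any element,
# e.g. on XMLAnnotation itself or on an AnnotationRef.
_BAD_ID = re.compile(r'\bID="XMLAnnotation:(\d+)"')


def fix_xmlannotation_ids(xml: str) -> str:
    """Hypothetical cleanup: rename XMLAnnotation:N IDs to Annotation:N."""
    return _BAD_ID.sub(r'ID="Annotation:\1"', xml)


sample = '<XMLAnnotation ID="XMLAnnotation:0"><Value/></XMLAnnotation>'
print(fix_xmlannotation_ids(sample))
```

A real fix would also have to make sure the renamed ID does not collide with an existing Annotation:N in the same document; this sketch does not check for that.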

Answer 5 · 2022-12-15T13:42:53.000Z

@ctrueden ... if I may bother/summon you :) Maybe you can help me figure this out?

Quick summary: When I load the BigDataViewer h5 file linked in the first post in Fiji with the Bioformats importer, I see this metadata:

Note the ID of the XMLAnnotation, <XMLAnnotation ID="XMLAnnotation:0">, which doesn't validate against the AnnotationID regex in the OME model. The method dumping the xml is loci.formats.ome.OMEPyramidStore.dumpXML, but I haven't been able to determine exactly where in the inheritance chain the faulty ID is getting written.

side-note, I will of course fix stuff on my end to fail more gracefully with minor problems like that... just curious if you had run into that ID before

Answer 6 · 2022-12-15T13:58:23.000Z

patch for this special case in #455

Answer 7 · 2022-12-15T16:08:27.000Z

Thanks a lot for the quick help!

Now I get this new error (only pasting the last XMLSchemaValidationError). Seems like the <XMLAnnotation ID> is completely missing now:

XMLSchemaValidationError: failed validating {} with XsdAttributeGroup(['ID', 'Name', 'SamplesPerPixel', 'IlluminationType', 'PinholeSize', 'PinholeSizeUnit', 'AcquisitionMode', 'ContrastMethod', 'ExcitationWavelength', 'ExcitationWavelengthUnit', 'EmissionWavelength', 'EmissionWavelengthUnit', 'Fluor', 'NDFilter', 'PockelCellSetting', 'Color']):

Reason: missing required attribute 'ID'

Schema:

  <xsd:complexType xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:sequence>
      <xsd:element ref="LightSourceSettings" minOccurs="0" maxOccurs="1" />
      <xsd:element ref="DetectorSettings" minOccurs="0" maxOccurs="1" />
      <xsd:element ref="FilterSetRef" minOccurs="0" maxOccurs="1" />
      <xsd:element ref="AnnotationRef" minOccurs="0" maxOccurs="unbounded">
        <xsd:annotation>
          <xsd:appinfo>
            <xsdfu>
              <manytomany />
            </xsdfu>
          </xsd:appinfo>
        </xsd:annotation>
      </xsd:element>
      <xsd:element ref="LightPath" minOccurs="0" maxOccurs="1" />
    </xsd:sequence>
    <xsd:attribute name="ID" use="required" type="ChannelID" />
    <xsd:attribute name="Name" use="optional" type="xsd:string">
      <xsd:annotation>
        <xsd:documentation>
    ...
    ...
  </xsd:complexType>

Instance:

  <Channel xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
    <id>1</id>
    <name>1</name>
  </Channel>

Path: /OME/StructuredAnnotations/XMLAnnotation/Value/SpimData/SequenceDescription/ViewSetups/Attributes/Channel[1]

Maybe I am doing something wrong? I installed the patch from #455 using the fix-xmlannotation-id branch via:
pip install git+https://github.com/tlambert03/aicsimageio.git@fix-xmlannotation-id
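One possible reading of this second error, judging only from the Path and Instance shown in the message: the <Channel> element belongs to the BigDataViewer SpimData block, but because it sits under the OME default namespace it is validated as an OME Channel, which requires an ID attribute. The sketch below uses a reduced, hand-written XML fragment (not the real file) to show how nested elements inherit the default namespace.

```python
import xml.etree.ElementTree as ET

OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"

# Reduced, hand-written stand-in for the real metadata (an assumption,
# not a copy of the file): SpimData content nested inside an XMLAnnotation.
xml = f"""<OME xmlns="{OME_NS}">
  <StructuredAnnotations>
    <XMLAnnotation ID="Annotation:0">
      <Value>
        <SpimData version="0.2">
          <Channel><id>1</id><name>1</name></Channel>
        </SpimData>
      </Value>
    </XMLAnnotation>
  </StructuredAnnotations>
</OME>"""

root = ET.fromstring(xml)

# Find the nested Channel element regardless of its namespace.
channel = next(el for el in root.iter() if el.tag.endswith("}Channel"))

print(channel.tag)     # the tag carries the OME namespace
print(channel.attrib)  # no ID attribute, only <id>/<name> child elements
```

If this reading is right, the ID fix itself worked and validation simply got further into the document before hitting the next problem.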

Answer 8 · 2022-12-15T16:16:32.000Z

can you show me the exact code that precedes the error?

Answer 9 · 2022-12-15T16:19:32.000Z

Sure, here it is:

---------------------------------------------------------------------------
XMLSchemaValidationError                  Traceback (most recent call last)
Cell In[3], line 1
----> 1 print(image2.data.shape)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\aicsimageio\aics_image.py:529, in AICSImage.data(self)
    516 @property
    517 def data(self) -> np.ndarray:
    518     """
    519     Returns
    520     -------
   (...)
    527     Recommended to use `dask_data` for large mosaic images.
    528     """
--> 529     return self.xarray_data.data

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\aicsimageio\aics_image.py:469, in AICSImage.xarray_data(self)
    453 """
    454 Returns
    455 -------
   (...)
    462 Recommended to use `xarray_dask_data` for large mosaic images.
    463 """
    464 if self._xarray_data is None:
    465     if (
    466         # Does the user want to get stitched mosaic
    467         self._reconstruct_mosaic
    468         # Does the data have a tile dim
--> 469         and dimensions.DimensionNames.MosaicTile in self.reader.dims.order
    470     ):
    471         try:
    472             self._xarray_data = (
    473                 self._transform_data_array_to_aics_image_standard(
    474                     self.reader.mosaic_xarray_data
    475                 )
    476             )

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\aicsimageio\readers\reader.py:532, in Reader.dims(self)
    525 """
    526 Returns
    527 -------
    528 dims: Dimensions
    529     Object with the paired dimension names and their sizes.
    530 """
    531 if self._dims is None:
--> 532     self._dims = Dimensions(dims=self.xarray_dask_data.dims, shape=self.shape)
    534 return self._dims

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\aicsimageio\readers\reader.py:359, in Reader.xarray_dask_data(self)
    352 """
    353 Returns
    354 -------
    355 xarray_dask_data: xr.DataArray
    356     The delayed image and metadata as an annotated data array.
    357 """
    358 if self._xarray_dask_data is None:
--> 359     self._xarray_dask_data = self._read_delayed()
    361 return self._xarray_dask_data

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\aicsimageio\readers\bioformats_reader.py:164, in BioformatsReader._read_delayed(self)
    163 def _read_delayed(self) -> xr.DataArray:
--> 164     return self._to_xarray(delayed=True)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\aicsimageio\readers\bioformats_reader.py:202, in BioformatsReader._to_xarray(self, delayed)
    195 with BioFile(
    196     self._path,
    197     series=self.current_scene_index,
    198     **self._bf_kwargs,  # type: ignore
    199 ) as rdr:
    200     image_data = rdr.to_dask() if delayed else rdr.to_numpy()
    201     _, coords = metadata_utils.get_dims_and_coords_from_ome(
--> 202         ome=rdr.ome_metadata,
    203         scene_index=self.current_scene_index,
    204     )
    206 return xr.DataArray(
    207     image_data,
    208     dims=dimensions.DEFAULT_DIMENSION_ORDER_LIST_WITH_SAMPLES
   (...)
    215     },
    216 )

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\aicsimageio\readers\bioformats_reader.py:447, in BioFile.ome_metadata(self)
    445 """Return OME object parsed by ome_types."""
    446 xml = metadata_utils.clean_ome_xml_for_known_issues(self.ome_xml)
--> 447 return OME.from_xml(xml)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\ome_types\model\ome.py:154, in OME.from_xml(cls, xml)
    150 @classmethod
    151 def from_xml(cls, xml: Union[Path, str]) -> "OME":
    152     from ome_types import from_xml
--> 154     return from_xml(xml)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\ome_types\_convenience.py:105, in from_xml(xml, parser, validate)
     79 def from_xml(
     80     xml: Union[Path, str, bytes],
     81     *,
     82     parser: Union[Parser, str, None] = None,
     83     validate: Optional[bool] = None,
     84 ) -> OME:
     85     """Generate OME metadata object from XML string or path.
     86 
     87     Parameters
   (...)
    103         ome_types.OME metadata object
    104     """
--> 105     d = to_dict(os.fspath(xml), parser=parser, validate=validate)
    106     return OME(**d)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\ome_types\_convenience.py:72, in to_dict(xml, parser, validate)
     69     else:
     70         raise KeyError("parser string must be one of {'lxml', 'xmlschema'}")
---> 72 d = parser(xml) if validate is None else parser(xml, validate=validate)
     73 for key in list(d.keys()):
     74     if key.startswith(("xml", "xsi")):

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\ome_types\_xmlschema.py:238, in xmlschema2dict(xml, schema, converter, validate, **kwargs)
    235 if _XMLSCHEMA_VERSION >= (2,):
    236     kwargs["validation"] = "strict" if validate else "lax"
--> 238 result = xmlschema.to_dict(xml, schema=schema, converter=converter, **kwargs)
    240 if _XMLSCHEMA_VERSION >= (2,) and not validate:
    241     result, _ = result

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\documents.py:289, in to_dict(xml_document, schema, cls, path, validation, process_namespaces, locations, base_url, defuse, timeout, lazy, **kwargs)
    277 """
    278 Decodes an XML document to a Python's nested dictionary. Takes the same arguments
    279 of the function :meth:`iter_decode`, but *validation* mode defaults to 'strict'.
   (...)
    284 ``validation='strict'`` is provided.
    285 """
    286 source, _schema = get_context(
    287     xml_document, schema, cls, locations, base_url, defuse, timeout, lazy
    288 )
--> 289 return _schema.decode(source, path=path, validation=validation,
    290                       process_namespaces=process_namespaces, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\schemas.py:2015, in XMLSchemaBase.decode(self, source, path, schema_path, validation, *args, **kwargs)
   2011 """
   2012 Decodes XML data. Takes the same arguments of the method :meth:`iter_decode`.
   2013 """
   2014 data, errors = [], []
-> 2015 for result in self.iter_decode(source, path, schema_path, validation, *args, **kwargs):
   2016     if not isinstance(result, XMLSchemaValidationError):
   2017         data.append(result)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\schemas.py:2001, in XMLSchemaBase.iter_decode(self, source, path, schema_path, validation, process_namespaces, namespaces, use_defaults, decimal_type, datetime_types, binary_types, converter, filler, fill_missing, keep_unknown, process_skipped, max_depth, depth_filler, value_hook, **kwargs)
   1998             yield schema.validation_error(validation, reason, elem, resource, namespaces)
   1999             return
-> 2001     yield from xsd_element.iter_decode(elem, validation, **kwargs)
   2003 if 'max_depth' not in kwargs:
   2004     yield from self._validate_references(validation=validation, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\elements.py:733, in XsdElement.iter_decode(self, obj, validation, **kwargs)
    730         for error in assertion(obj, **kwargs):
    731             yield self.validation_error(validation, error, **kwargs)
--> 733 for result in content_decoder.iter_decode(obj, validation, **kwargs):
    734     if isinstance(result, XMLSchemaValidationError):
    735         yield self.validation_error(validation, result, obj, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\groups.py:1073, in XsdGroup.iter_decode(self, obj, validation, **kwargs)
   1070         result_list.append((child.tag, func(xsd_element), xsd_element))
   1071     continue
-> 1073 for result in xsd_element.iter_decode(child, validation, **kwargs):
   1074     if isinstance(result, XMLSchemaValidationError):
   1075         yield result

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\elements.py:733, in XsdElement.iter_decode(self, obj, validation, **kwargs)
    730         for error in assertion(obj, **kwargs):
    731             yield self.validation_error(validation, error, **kwargs)
--> 733 for result in content_decoder.iter_decode(obj, validation, **kwargs):
    734     if isinstance(result, XMLSchemaValidationError):
    735         yield self.validation_error(validation, result, obj, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\groups.py:1073, in XsdGroup.iter_decode(self, obj, validation, **kwargs)
   1070         result_list.append((child.tag, func(xsd_element), xsd_element))
   1071     continue
-> 1073 for result in xsd_element.iter_decode(child, validation, **kwargs):
   1074     if isinstance(result, XMLSchemaValidationError):
   1075         yield result

    [... skipping similar frames: XsdElement.iter_decode at line 733 (2 times), XsdGroup.iter_decode at line 1073 (2 times)]

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\wildcards.py:507, in XsdAnyElement.iter_decode(self, obj, validation, **kwargs)
    505 if validation != 'skip' and self.process_contents == 'strict':
    506     yield self.validation_error(validation, reason, obj, **kwargs)
--> 507 yield from self.any_type.iter_decode(obj, validation, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\complex_types.py:740, in XsdComplexType.iter_decode(self, obj, validation, **kwargs)
    738     xsd_element = self.schema.create_element(obj.tag, parent=self, form='unqualified')
    739     xsd_element.type = self
--> 740     yield from xsd_element.iter_decode(obj, validation, **kwargs)
    741 elif isinstance(self.content, XsdSimpleType):
    742     yield from self.content.iter_decode(obj, validation, **kwargs)

    [... skipping similar frames: XsdElement.iter_decode at line 733 (1 times), XsdGroup.iter_decode at line 1073 (1 times)]

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\wildcards.py:507, in XsdAnyElement.iter_decode(self, obj, validation, **kwargs)
    505 if validation != 'skip' and self.process_contents == 'strict':
    506     yield self.validation_error(validation, reason, obj, **kwargs)
--> 507 yield from self.any_type.iter_decode(obj, validation, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\complex_types.py:740, in XsdComplexType.iter_decode(self, obj, validation, **kwargs)
    738     xsd_element = self.schema.create_element(obj.tag, parent=self, form='unqualified')
    739     xsd_element.type = self
--> 740     yield from xsd_element.iter_decode(obj, validation, **kwargs)
    741 elif isinstance(self.content, XsdSimpleType):
    742     yield from self.content.iter_decode(obj, validation, **kwargs)

    [... skipping similar frames: XsdElement.iter_decode at line 733 (2 times), XsdGroup.iter_decode at line 1073 (2 times), XsdAnyElement.iter_decode at line 507 (1 times), XsdComplexType.iter_decode at line 740 (1 times)]

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\wildcards.py:507, in XsdAnyElement.iter_decode(self, obj, validation, **kwargs)
    505 if validation != 'skip' and self.process_contents == 'strict':
    506     yield self.validation_error(validation, reason, obj, **kwargs)
--> 507 yield from self.any_type.iter_decode(obj, validation, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\complex_types.py:740, in XsdComplexType.iter_decode(self, obj, validation, **kwargs)
    738     xsd_element = self.schema.create_element(obj.tag, parent=self, form='unqualified')
    739     xsd_element.type = self
--> 740     yield from xsd_element.iter_decode(obj, validation, **kwargs)
    741 elif isinstance(self.content, XsdSimpleType):
    742     yield from self.content.iter_decode(obj, validation, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\elements.py:733, in XsdElement.iter_decode(self, obj, validation, **kwargs)
    730         for error in assertion(obj, **kwargs):
    731             yield self.validation_error(validation, error, **kwargs)
--> 733 for result in content_decoder.iter_decode(obj, validation, **kwargs):
    734     if isinstance(result, XMLSchemaValidationError):
    735         yield self.validation_error(validation, result, obj, **kwargs)

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\groups.py:1073, in XsdGroup.iter_decode(self, obj, validation, **kwargs)
   1070         result_list.append((child.tag, func(xsd_element), xsd_element))
   1071     continue
-> 1073 for result in xsd_element.iter_decode(child, validation, **kwargs):
   1074     if isinstance(result, XMLSchemaValidationError):
   1075         yield result

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\wildcards.py:489, in XsdAnyElement.iter_decode(self, obj, validation, **kwargs)
    487         reason = f"element {obj.tag!r} not found"
    488     else:
--> 489         yield from xsd_element.iter_decode(obj, validation, **kwargs)
    490         return
    492 if XSI_TYPE in obj.attrib:

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\elements.py:689, in XsdElement.iter_decode(self, obj, validation, **kwargs)
    687 for result in attribute_group.iter_decode(obj.attrib, validation, **kwargs):
    688     if isinstance(result, XMLSchemaValidationError):
--> 689         yield self.validation_error(validation, result, obj, **kwargs)
    690     else:
    691         attributes = result

File ~\Anaconda3\envs\sk-nap-tut\lib\site-packages\xmlschema\validators\xsdbase.py:227, in XsdValidator.validation_error(self, validation, error, obj, source, namespaces, **_kwargs)
    224     error = XMLSchemaValidationError(self, obj, error, source, namespaces)
    226 if validation == 'strict' and error.elem is not None:
--> 227     raise error
    228 return error

XMLSchemaValidationError: failed validating {} with XsdAttributeGroup(['ID', 'Name', 'SamplesPerPixel', 'IlluminationType', 'PinholeSize', 'PinholeSizeUnit', 'AcquisitionMode', 'ContrastMethod', 'ExcitationWavelength', 'ExcitationWavelengthUnit', 'EmissionWavelength', 'EmissionWavelengthUnit', 'Fluor', 'NDFilter', 'PockelCellSetting', 'Color']):

Reason: missing required attribute 'ID'

Schema:

  <xsd:complexType xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:sequence>
      <xsd:element ref="LightSourceSettings" minOccurs="0" maxOccurs="1" />
      <xsd:element ref="DetectorSettings" minOccurs="0" maxOccurs="1" />
      <xsd:element ref="FilterSetRef" minOccurs="0" maxOccurs="1" />
      <xsd:element ref="AnnotationRef" minOccurs="0" maxOccurs="unbounded">
        <xsd:annotation>
          <xsd:appinfo>
            <xsdfu>
              <manytomany />
            </xsdfu>
          </xsd:appinfo>
        </xsd:annotation>
      </xsd:element>
      <xsd:element ref="LightPath" minOccurs="0" maxOccurs="1" />
    </xsd:sequence>
    <xsd:attribute name="ID" use="required" type="ChannelID" />
    <xsd:attribute name="Name" use="optional" type="xsd:string">
      <xsd:annotation>
        <xsd:documentation>
    ...
    ...
  </xsd:complexType>

Instance:

  <Channel xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
    <id>1</id>
    <name>1</name>
  </Channel>

Path: /OME/StructuredAnnotations/XMLAnnotation/Value/SpimData/SequenceDescription/ViewSetups/Attributes/Channel[1]

Answer 10 · 2022-12-15T16:21:55.000Z

Sorry... I'm looking more for what image2 was: which file did you open (xml or h5), how did you open it (bioformats reader vs aicsimage), etc. Also, can you give me the output of:

from aicsimageio.readers import BioformatsReader
BioformatsReader.bioformats_version()

Answer 11 · 2022-12-15T16:27:20.000Z

Ah, sorry I didn't get it.
I am opening the xml file from the first post with AICSImage; the xml and h5 files are in the same folder:

from aicsimageio import AICSImage
from aicsimageio.readers import BioformatsReader
fname1 = 'test_dataset\\xyzct_16bit__mitosis.tif'
image1 = AICSImage(fname1, reader=BioformatsReader)
fname2 = 'test_dataset\\xyzct_16bit__mitosis.xml'
image2 = AICSImage(fname2, reader=BioformatsReader)

In this case, the tif file was generated by opening the xml/h5 dataset with the Bioformats Importer in FIJI and saving the dataset as tif.
print(image1.metadata) gives the expected result, while print(image2.metadata) gives the above error.
The output of the code:

from aicsimageio.readers import BioformatsReader
BioformatsReader.bioformats_version()

is:
'6.12.0-SNAPSHOT'
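When a traceback like this may depend on the environment, it can also help to report the installed versions of the Python packages that appear in it. A minimal sketch using only the standard library; installed_versions is an illustrative helper, and the distribution names are assumed to match the names used on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


# Packages that show up in the traceback.
for name, ver in installed_versions(["aicsimageio", "ome-types", "xmlschema"]).items():
    print(f"{name}: {ver}")
```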

Answer 12 · 2022-12-15T16:34:17.000Z

huh... bummer, I can't reproduce. will need to try windows and python 3.9