ingest.xslt produces invalid ICAT data files
The ICAT data produced by the reference ingest.xslt file added in #123 and used internally in icat.ingest fails to validate against the icatdata-*.xsd XML schema files.
A typical output after transformation (and reformatting for readability) may look like:
<?xml version="1.0"?>
<icatdata>
<data>
<dataset id="Dataset_1">
<complete>false</complete>
<description>Dy01Cp02 at 2.7 K</description>
<endDate>2022-02-03T17:04:22+01:00</endDate>
<name>testingest_inl_1</name>
<startDate>2022-02-03T15:40:12+01:00</startDate>
<investigation ref="_Investigation"/>
<parameters>
<stringValue>neutron</stringValue>
<type name="Probe"/>
</parameters>
<parameters>
<numericValue>5.3</numericValue>
<type name="Reactor power" units="MW"/>
</parameters>
<parameters>
<numericValue>2.74103</numericValue>
<rangeBottom>2.7408</rangeBottom>
<rangeTop>2.7414</rangeTop>
<type name="Sample temperature" units="K"/>
</parameters>
<parameters>
<numericValue>4.1357</numericValue>
<rangeBottom>4.0573</rangeBottom>
<rangeTop>4.1567</rangeTop>
<type name="Magnetic field" units="T"/>
</parameters>
<parameters>
<stringValue>Dy01Cp02</stringValue>
<type name="Comment"/>
</parameters>
<type name="raw"/>
</dataset>
</data>
</icatdata>
Trying to validate that against icatdata-4.4.xsd yields the following error:
$ xmllint --noout --schema doc/icatdata-4.4.xsd -
-:35: element type: Schemas validity error : Element 'type': This element is not expected. Expected is ( parameters ).
- fails to validate
The error is caused by the order of the elements: the XSD imposes a particular order in which all many-to-one relations (e.g. type) need to come before any one-to-many relations (e.g. parameters).
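To make the ordering rule concrete, here is a minimal Python sketch that flags the first child of a dataset element that appears after a one-to-many relation without being one itself. The set ONE_TO_MANY and the function first_misplaced are hypothetical helpers for illustration only; the authoritative order is the xs:sequence in icatdata-*.xsd, and a real check should validate against that schema instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical classification for illustration only: child elements of
# <dataset> that represent one-to-many relations. The real order is
# defined by the xs:sequence in icatdata-*.xsd.
ONE_TO_MANY = {"parameters", "datafiles"}


def first_misplaced(dataset):
    """Return the tag of the first child that follows a one-to-many
    relation but is not itself one, or None if the order is fine."""
    seen_one_to_many = False
    for child in dataset:
        if child.tag in ONE_TO_MANY:
            seen_one_to_many = True
        elif seen_one_to_many:
            # e.g. a many-to-one relation such as <type> after <parameters>
            return child.tag
    return None
```

Applied to the dataset element of the output above, first_misplaced returns "type", which is the same element xmllint complains about.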
Note that this issue may be somewhat nitpicky, because class icat.dumpfile_xml.XMLDumpFileReader, which consumes that input, does not care about the order; that is why the ingest succeeds nevertheless. Still, the XSLT provided with python-icat should generate data that is valid according to python-icat's own schema.
It turns out to be even worse than that: the orders imposed by icatdata-5.0.xsd and by ingest-10.xsd are inconsistent with each other. icatdata-5.0.xsd imposes as subelements of data: …, dataset, datasetTechnique, datasetInstrument, datasetParameter, …, while ingest-10.xsd imposes: dataset, datasetInstrument, datasetTechnique, datasetParameter. That is, the order of datasetTechnique and datasetInstrument is inverted. ingest.xslt keeps the order from the input during the transformation, so the result is invalid here as well.
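The inversion can be spotted mechanically. The following minimal Python sketch compares the two orders quoted above. The lists contain only the excerpts cited in this issue (the elided parts of icatdata-5.0.xsd are left out), and inverted_pairs is a hypothetical helper, not part of python-icat.

```python
from itertools import combinations

# Excerpts of the child order of <data>, as quoted in this issue.
icatdata_50 = ["dataset", "datasetTechnique", "datasetInstrument", "datasetParameter"]
ingest_10 = ["dataset", "datasetInstrument", "datasetTechnique", "datasetParameter"]


def inverted_pairs(a, b):
    """Return pairs (x, y) where x precedes y in a but follows y in b,
    considering only elements that occur in both sequences."""
    pos = {tag: i for i, tag in enumerate(b)}
    common = [tag for tag in a if tag in pos]
    # combinations() keeps the order of `common`, so x precedes y in a.
    return [(x, y) for x, y in combinations(common, 2) if pos[x] > pos[y]]
```

Calling inverted_pairs(icatdata_50, ingest_10) yields [("datasetTechnique", "datasetInstrument")], the single conflicting pair described above.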
We have basically two bad options to fix this:
- fix it on the input, e.g. fix ingest-10.xsd. This is bad because it has an impact on the input accepted by the icat.ingest module and retroactively changes a released file format version. E.g. input files that were valid ingest files version 1.0 according to python-icat 1.1.0 would be invalid in python-icat 1.2.0.
- fix it in the transformation, e.g. change the order generated by ingest.xslt. This would make ingest.xslt needlessly complicated, only to keep compatibility with an inconsistent past.
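To illustrate what the second option amounts to, here is a minimal Python sketch that reorders the children of data into a target order after the fact. It is not how ingest.xslt works: TARGET_ORDER is only the excerpt of the icatdata-5.0.xsd order cited above, and reorder_data is a hypothetical helper. In XSLT, the equivalent would have to encode this order explicitly in the stylesheet.

```python
import xml.etree.ElementTree as ET

# Assumed excerpt of the child order icatdata-5.0.xsd imposes on <data>.
TARGET_ORDER = ["dataset", "datasetTechnique", "datasetInstrument", "datasetParameter"]


def reorder_data(data):
    """Reorder the children of a <data> element in place to follow TARGET_ORDER."""
    rank = {tag: i for i, tag in enumerate(TARGET_ORDER)}
    # sorted() is stable: elements of the same type keep their relative
    # order; tags not listed in TARGET_ORDER are moved to the end.
    children = sorted(data, key=lambda e: rank.get(e.tag, len(rank)))
    for child in list(data):
        data.remove(child)
    data.extend(children)
    return data
```

Moving unknown tags to the end is an arbitrary choice for this sketch; a complete version would need the full order from the schema.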
Given that the whole icat.ingest feature was declared experimental in the python-icat 1.1.0 release, and that I believe it doesn't have many users yet, I tend to go for the first option, even though it is a breaking change.