prjemian/punx

validate: mismatch between NXdata@axes and available fields


BTW, I'm surprised that punx validate did not identify the very real problem with this NeXus file:

    data:NXdata
      @NX_class = "NXdata"
      @axes = ["und"]
      @signal = "XPS_sample"
      @target = "/entry/data"
      EPOCH --> /entry/instrument/bluesky/streams/primary/und_readback/time
      M6_collector --> /entry/instrument/bluesky/streams/primary/M6_collector/value
      XPS_pd --> /entry/instrument/bluesky/streams/primary/XPS_pd/value
      XPS_sample --> /entry/instrument/bluesky/streams/primary/XPS_sample/value
      und_readback --> /entry/instrument/bluesky/streams/primary/und_readback/value

In NXdata, each name given in the axes attribute must match a field that exists in the group (either as an HDF5 dataset or an HDF5 link). That is clearly not true here: the group has no field named und; the closest member is und_readback.

Originally posted by @prjemian in BCDA-APS/apstools#806 (comment)
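
The rule quoted above can be sketched as a small membership check. This is only an illustration, not code from punx: the name missing_axes is hypothetical, and the function assumes the attribute value has already been read as a list of strings. The "." entry is skipped because NeXus uses it as a placeholder for a dimension with no axis.

```python
def missing_axes(axes, field_names):
    """Return the names listed in @axes that are not members of the group.

    axes: names from the NXdata @axes attribute (assumed to be a list of str)
    field_names: names of the datasets and links present in the NXdata group
    """
    present = set(field_names)
    # "." marks a dimension without an axis, so it is not expected to exist
    return [name for name in axes if name != "." and name not in present]


# Members of /entry/data as shown in the tree above
fields = ["EPOCH", "M6_collector", "XPS_pd", "XPS_sample", "und_readback"]
print(missing_axes(["und"], fields))           # prints ['und']
print(missing_axes(["und_readback"], fields))  # prints []
```

With h5py, the two arguments would presumably come from nxdata.attrs["axes"] and nxdata.keys(); depending on the h5py version, the attribute values may need decoding from bytes first.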

This test method:

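import h5py
from punx import validate  # assumed import path; the test module may use a relative import


# tempdir is a pytest fixture that provides a temporary directory
def test_i219(tempdir):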
    h5file = tempdir / "test_file.h5"
    assert not h5file.exists()

    with h5py.File(h5file, "w") as root:
        root.attrs["default"] = "entry"

        nxentry = root.create_group(root.attrs["default"])
        nxentry.attrs["NX_class"] = "NXentry"
        nxentry.attrs["default"] = "data"

        nxdata = nxentry.create_group(nxentry.attrs["default"])
        nxdata.attrs["NX_class"] = "NXdata"

        # these match
        nxdata.attrs["signal"] = "XPS_sample"
        nxdata.create_dataset("XPS_sample", data=[1, 2, 3])
        assert nxdata.attrs["signal"] in nxdata

        # these do not match
        nxdata.attrs["axes"] = ["und"]
        nxdata.create_dataset("und_readback", data=[3, 4, 1])
        for k in nxdata.attrs["axes"]:
            assert k not in nxdata

    assert h5file.exists()

    validator = validate.Data_File_Validator()
    assert isinstance(validator, validate.Data_File_Validator)

    validator.validate(h5file)

    average = validator.finding_score()[-1]
    assert average < -10_000

confirms that validate fails to catch this mismatch:

        average = validator.finding_score()[-1]
>       assert average < -10_000
E       assert 98.84415584415585 < -10000

punx/tests/test_i219_nxdata_mismatch.py:60: AssertionError
=========================== short test summary info ============================
FAILED punx/tests/test_i219_nxdata_mismatch.py::test_i219 - assert 98.8441558...
============================== 1 failed in 1.11s ===============================

With any single test failure, average should be a negative number.

The principal reason is described by this finding from validate:

/entry/data@axes    TODO    attribute value    implement

The handler for @axes is still a stub; it only lists TODO items and then delegates to generic_handler:

def axes_handler(validator, v_item):
    """
    validate @axes
    """
    # TODO: axes_attr = v_item.h5_object
    # if this is not an array, make it axes_attr_array
    # TODO: need to know shape of signal data
    # TODO: compare len(axes_attr_array) with range of signal data
    # TODO: check each value of array that is a validItemName and points to actual local field
    generic_handler(validator, v_item)
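
The first three TODO items could be approached along the following lines. This is a minimal sketch, not the punx implementation: normalize_axes and axes_rank_problem are hypothetical names, the functions take plain values rather than the validator and v_item objects, and "range of signal data" in the TODO is read here as the rank (number of dimensions) of the signal, on the assumption that @axes should carry one entry per signal dimension.

```python
def normalize_axes(axes_attr):
    """Return @axes as a list of str, whether it was stored as a scalar or an array."""
    if isinstance(axes_attr, (str, bytes)):
        axes_attr = [axes_attr]  # a single name becomes a one-element list
    return [a.decode("utf-8") if isinstance(a, bytes) else str(a) for a in axes_attr]


def axes_rank_problem(axes_attr, signal_shape):
    """Return a message if len(@axes) differs from the signal rank, else None."""
    axes = normalize_axes(axes_attr)
    rank = len(signal_shape)
    if len(axes) != rank:
        return f"@axes lists {len(axes)} name(s) but the signal has rank {rank}"
    return None


# The file from this issue: one axis name, one-dimensional signal of length 3
print(axes_rank_problem(["und"], (3,)))  # prints None
# A scalar @axes on a two-dimensional signal
print(axes_rank_problem("x", (4, 5)))
```

Returning a message rather than raising keeps the check easy to map onto a finding later; how that finding would be recorded in the validator is left open here.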