NeurodataWithoutBorders/api-python

Issue with extension links to extension datasets

jonc125 opened this issue · 3 comments

I have tried adding a pixel_time_offsets dataset in TwoPhotonSeries linking to another extension dataset, and get the following validation error when I finish creating the NWB file:

78 datasets defined in extension, but missing attribute schema_id (2 combined):
  1. '/acquisition/timeseries/ROI_#_Green/pixel_time_offsets (#=1-39)'
  2. '/acquisition/timeseries/ROI_#_Red/pixel_time_offsets (#=1-39)'

The relevant portion of the extension definition is:

"<TwoPhotonSeries>/": {
        "description": "Extension to add a pixel_time_offsets dataset.",
        "pixel_time_offsets?": {
            "description": ("The offset from the frame timestamp at which each pixel was acquired."
                            " Note that the offset is not time-varying, i.e. it is the same for"
                            " each frame. These offsets are given in the same units as for the"
                            " timestamps array, i.e. seconds."),
            "link": {"target_type": "pixel_time_offsets", "allow_subclasses": False},
            "data_type": "float64!"
        }
    },
    "<roi_name>/*": {
        "pixel_time_offsets": {
            "description": ("The offset from the frame timestamp at which each pixel in this ROI"
                            " was acquired."
                            " Note that the offset is not time-varying, i.e. it is the same for"
                            " each frame. These offsets are given in the same units as for the"
                            " timestamps array, i.e. seconds."),
            "data_type": "float64!",
            "dimensions": [["y"], ["y", "x"]]
        }
    }

I can work around the problem (i.e. stop the validation error from occurring) by adding the second line below after calling set_dataset, so it looks like the API is failing to fill in the h5attrs dict in this instance. Note that the schema_id attribute does get created in the NWB file and can be seen in HDFView, even without the extra line.

ts.set_dataset('pixel_time_offsets', 'link:'+roi['pixel_time_offsets'].name)
ts.get_node('pixel_time_offsets').h5attrs['schema_id'] = 'pixeltimes:pixel_time_offsets'

Thanks for pointing out this problem. I think it was caused by the validate_nodes function checking whether pixel_time_offsets? (the link, not the dataset) has a schema_id attribute. Links in HDF5 files do not have attributes; only the targets of links do. So validate_nodes should not check links for schema_id, neurodata_type, or any other attribute. The h5attrs dict keeps track of which attributes are actually stored in the HDF5 file, so your direct assignment to it (the extra line above) suppressed the incorrect error in validate_nodes. I think the reason you saw the attribute in HDFView is that the link and its target refer to the same object (a dataset), so you were seeing the schema_id attribute on that dataset. While investigating this, I also discovered that the API was allowing attributes to be set on h5gate Node objects that correspond to links even though, as mentioned, links in HDF5 do not have attributes. These problems will hopefully be fixed in the next update.
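
As a rough illustration of the check described above (not the actual h5gate code), a validator that skips links when looking for schema_id might look like the sketch below. The Node class, its fields, and the missing_schema_id function are hypothetical names invented for this example, and the target path /roi/pixel_time_offsets is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an h5gate node; not the real class."""
    path: str
    defined_in_extension: bool = False
    link_target: Optional[str] = None  # set only when the node is a link
    h5attrs: Dict[str, str] = field(default_factory=dict)

    @property
    def is_link(self) -> bool:
        return self.link_target is not None


def missing_schema_id(nodes: List[Node]) -> List[str]:
    """Return paths of extension-defined nodes that lack schema_id.

    Links are skipped: an HDF5 link carries no attributes of its own,
    so only the target of the link can hold schema_id.
    """
    missing = []
    for node in nodes:
        if node.is_link:
            continue  # attributes belong to the link target, not the link
        if node.defined_in_extension and "schema_id" not in node.h5attrs:
            missing.append(node.path)
    return missing


# The target dataset carries the attribute (path is hypothetical).
target = Node(
    path="/roi/pixel_time_offsets",
    defined_in_extension=True,
    h5attrs={"schema_id": "pixeltimes:pixel_time_offsets"},
)
# The link has no attributes, and should not be reported as an error.
link = Node(
    path="/acquisition/timeseries/ROI_1_Green/pixel_time_offsets",
    defined_in_extension=True,
    link_target=target.path,
)

print(missing_schema_id([target, link]))  # prints []
```

With a check shaped like this, the workaround of assigning to h5attrs on the link node would no longer be needed, since the link is never inspected for attributes.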

This should now be fixed (commit b388f5a).

I can confirm this is fixed in my instances.