AllenInstitute/ipfx

IVSCC pipeline sweep extraction crashing with TypeError at get_nwb_version()

ru57y34nn opened this issue · 2 comments

Describe the bug
All IVSCC experiments have recently started failing sweep extraction with the same error. The following are the last few lines of the traceback from an EPHYS_NWB_STIMULUS_SUMMARY_V3_QUEUE log file:

File "/allen/aibs/technology/conda/production/fx/lib/python3.6/site-packages/ipfx/dataset/create.py", line 72, in get_nwb_version
re.match("^2", nwb_version) or
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/re.py", line 172, in match
return _compile(pattern, flags).match(string)
TypeError: cannot use a string pattern on a bytes-like object

The TypeError is caused by the nwb_version attribute that get_nwb_version() reads from the NWB file to determine the version. The attribute is expected to be a str, but in recent NWB files it is a bytes object, so re.match() raises a TypeError because it is given a string pattern to match against a bytes-like object.
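The failure can be reproduced without an NWB file. The snippet below is only a minimal sketch: the value b"2.2.4" is an assumed stand-in for what the attribute read returns on the affected files, and the pattern "^2" is the one shown in the traceback.

```python
import re

# Assumed stand-in for the value read from a recent (Igor Pro 9) file.
nwb_version = b"2.2.4"

try:
    # Same pattern as in the traceback: a str pattern against a bytes value.
    re.match("^2", nwb_version)
except TypeError as exc:
    message = str(exc)
    print(message)
```

With a str value such as "2.2.4" the same call returns a match object instead of raising.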

This issue seems to have arisen since the IVSCC pipeline upgraded from Igor Pro 8 to Igor Pro 9. This was an unexpected consequence of the upgrade.

To Reproduce
See EPHYS_NWB_STIMULUS_SUMMARY_V3_QUEUE log files for all recent IVSCC experiments since around 11/10/2021.

Expected behavior
The function get_nwb_version() should first check whether nwb_version is a bytes instance and, if so, decode it to a str before passing it to re.match().
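A rough sketch of that check is shown below. It is not the actual ipfx code: the helper names to_text and looks_like_nwb2 are hypothetical, UTF-8 is assumed as the encoding, and only the "^2" pattern from the traceback is reproduced.

```python
import re


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes-like values, pass anything else through."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


def looks_like_nwb2(nwb_version):
    """Hypothetical check: True if the version string starts with "2"."""
    nwb_version = to_text(nwb_version)
    return nwb_version is not None and re.match("^2", nwb_version) is not None


print(looks_like_nwb2("2.2.4"))   # str, as read from the older files
print(looks_like_nwb2(b"2.2.4"))  # bytes, as read from the recent files
```

Decoding once at the point where the attribute is read keeps the rest of the version logic working on str only.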

Actual Behavior
Full traceback from recent EPHYS_NWB_STIMULUS_SUMMARY_V3_QUEUE log file.
Traceback (most recent call last):
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/site-packages/ipfx/bin/run_sweep_extraction.py", line 96, in <module>
main()
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/site-packages/ipfx/bin/run_sweep_extraction.py", line 89, in main
module.args.get("stimulus_ontology_file", None)
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/site-packages/ipfx/bin/run_sweep_extraction.py", line 61, in run_sweep_extraction
ontology=ont
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/site-packages/ipfx/dataset/create.py", line 100, in create_ephys_data_set
nwb_version = get_nwb_version(nwb_file)
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/site-packages/ipfx/dataset/create.py", line 72, in get_nwb_version
re.match("^2", nwb_version) or
File "/allen/aibs/technology/conda/production/fx/lib/python3.6/re.py", line 172, in match
return _compile(pattern, flags).match(string)
TypeError: cannot use a string pattern on a bytes-like object

Environment (please complete the following information):

  • OS & version: windows 10
  • Python version 3.7.10
  • AllenSDK version 2.10.2

Do you want to work on this issue?
Are you willing and able to fix this bug? If so, let us know here (and see the guide). Thank you!

I already have a fix on my own branch, which I have used to read the nwb_version attribute correctly from the recent NWB files that are currently failing. It simply decodes nwb_version if it is a bytes instance, after which QC and feature extraction proceed normally. I will submit a pull request with the fix.

t-b commented

Here is the HDF5 dump output of an IP8 vs. an IP9 nwbv2 file:

HDF5 "E:\projekte\mies-igor\tools\unit-testing\HardwareTests-Copy-compressed-IP8.nwb" {
ATTRIBUTE "//nwb_version" {
   DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SCALAR
   DATA {
      "2.2.4"
   }
}
}

HDF5 "E:\projekte\mies-igor\tools\unit-testing\HardwareTests-Copy-compressed-IP9.nwb" {
ATTRIBUTE "//nwb_version" {
   DATATYPE  H5T_STRING {
         STRSIZE 5;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SCALAR
   DATA {
      "2.2.4"
   }
}
}

The difference is the padding and the string size. From skimming https://docs.h5py.org/en/stable/strings.html#encodings I would say h5py differentiates between variable-length and fixed-length (padded) strings, and this is the source of the bug.

Resolved by the above commits.