AllenInstitute/ipfx

running extract_data_set_features on some PatchSeq datasets fails

Closed · 1 comment

Describe the bug

Some NWB files in the PatchSeq dataset cause extract_data_set_features to crash.

The error seems to originate in SweepSet.align_to_start_of_epoch in ipfx/sweep.py. Some sweeps in the offending files have the epochs 'experiment': None and 'stim': None. When epoch_name='experiment', line 89,

    start_idx, end_idx = sweep.epochs[epoch_name]

tries to unpack None, which raises "cannot unpack non-iterable NoneType object" (reported as a warning) and crashes out of the entire method.
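The failing statement can be reproduced without ipfx. This minimal sketch uses a plain dict as a hypothetical stand-in for sweep.epochs (the real Sweep object may hold more than this) and shows the TypeError behind the warning. It also shows one possible code-side guard, epoch_start_or_none, which is a hypothetical helper and not part of ipfx: it returns None for a missing epoch instead of raising.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for sweep.epochs: epoch name -> (start_idx, end_idx) or None
Epochs = Dict[str, Optional[Tuple[int, int]]]


def unpack_epoch(epochs: Epochs, epoch_name: str) -> Tuple[int, int]:
    # Same unpacking pattern as the failing statement in ipfx/sweep.py
    start_idx, end_idx = epochs[epoch_name]
    return start_idx, end_idx


def epoch_start_or_none(epochs: Epochs, epoch_name: str) -> Optional[int]:
    # Hypothetical guarded variant: report a missing epoch as None instead of raising
    bounds = epochs.get(epoch_name)
    if bounds is None:
        return None
    start_idx, _end_idx = bounds
    return start_idx


good: Epochs = {"experiment": (1000, 5000), "stim": (1200, 4000)}
bad: Epochs = {"experiment": None, "stim": None}

message = ""
try:
    unpack_epoch(bad, "experiment")
except TypeError as err:
    message = str(err)

print(message)                                   # cannot unpack non-iterable NoneType object
print(epoch_start_or_none(good, "experiment"))   # 1000
print(epoch_start_or_none(bad, "experiment"))    # None
```

Whether a guard like this belongs in the library, or whether such sweeps should be filtered out beforehand, is exactly the open question raised in this issue.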

To Reproduce

The code below downloads one of the offending files (four offending files are listed, if you want to try all of them) and runs the feature extraction. I am open to the possibility that this is a problem with the data rather than the code; I'm not sure whether we should expect a sweep in an NWB file to have 'experiment': None.

from ipfx.dataset.create import create_ephys_data_set
from ipfx.data_set_features import extract_data_set_features

import urllib.request
import os

url0 = 'https://girder.dandiarchive.org/api/v1/item/5edb2cb42dace54b6f9b35f6/download'
url1 = 'https://girder.dandiarchive.org/api/v1/item/5ee84eed17a31a38dab096f2/download'
url2 = 'https://girder.dandiarchive.org/api/v1/item/5edb2e902dace54b6f9b3aa1/download'
url3 = 'https://girder.dandiarchive.org/api/v1/item/5ee84cdb17a31a38dab09229/download'

url_to_tmp = []
url_to_tmp.append((url0, 'nwb_tmp0.nwb'))
url_to_tmp.append((url1, 'nwb_tmp1.nwb'))
url_to_tmp.append((url2, 'nwb_tmp2.nwb'))
url_to_tmp.append((url3, 'nwb_tmp3.nwb'))

url, tmp = url_to_tmp[0]

if not os.path.exists(tmp):
    urllib.request.urlretrieve(url, tmp)

dataset = create_ephys_data_set(tmp)
features = extract_data_set_features(dataset)

Expected behavior

I'm not sure, but I had hoped to be able to run extract_data_set_features directly on the PatchSeq datasets to get features for our web app.

Actual Behavior

Here is the traceback I get:

WARNING:root:cannot unpack non-iterable NoneType object
Traceback (most recent call last):
  File "test_ephys_failure.py", line 24, in <module>
    features = extract_data_set_features(dataset)
  File "/Users/scott.daniel/AllenInstitute/miniconda3/envs/allen_sdk/lib/python3.7/site-packages/IPFX-1.0.1-py3.7.egg/ipfx/data_set_features.py", line 378, in extract_data_set_features
    sweep_features[s['sweep_number']]['peak_deflect'] = s['peak_deflect']
KeyError: 4

sweep_features is empty because, I think, the error in SweepSet.align_to_start_of_epoch causes extract_sweep_features to exit early, so the lookup sweep_features[4] raises KeyError: 4.
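The suspected chain (a swallowed per-sweep error leaves the whole feature dict empty, so a later lookup by sweep number fails) can be illustrated with a toy model. extract_sweep_features_toy and the sweep dicts below are hypothetical and do not reproduce ipfx's real code; they only show how a logged TypeError can surface later as KeyError: 4.

```python
import logging
from typing import Any, Dict, List

Sweep = Dict[str, Any]


def extract_sweep_features_toy(sweeps: List[Sweep]) -> Dict[int, dict]:
    # Hypothetical model: one bad sweep is logged and aborts the whole extraction,
    # so the caller receives an empty dict (mirrors the reporter's hypothesis only)
    features: Dict[int, dict] = {}
    try:
        for s in sweeps:
            start_idx, _end_idx = s["epochs"]["experiment"]
            features[s["sweep_number"]] = {"start_idx": start_idx}
    except TypeError as err:
        logging.warning(err)  # prints "WARNING:root:cannot unpack non-iterable NoneType object"
        return {}
    return features


sweeps: List[Sweep] = [
    {"sweep_number": 4, "epochs": {"experiment": (0, 100)}, "peak_deflect": -5.0},
    {"sweep_number": 5, "epochs": {"experiment": None}, "peak_deflect": -6.0},
]

sweep_features = extract_sweep_features_toy(sweeps)

missing = None
try:
    # Same pattern as data_set_features.py in the traceback above
    for s in sweeps:
        sweep_features[s["sweep_number"]]["peak_deflect"] = s["peak_deflect"]
except KeyError as err:
    missing = err.args[0]

print(sweep_features)  # {}
print(missing)         # 4
```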

Environment:

  • OS & version: OSX 10.15.5
  • Python version: 3.7.9
  • AllenSDK version: 2.2.0

Do you want to work on this issue?

I am willing to work on this, but I am not sure what the code should do in this case (i.e., is the data ill-formed, or should this code handle it?).

Sorry. Running the dataset through ipfx.utilities.drop_failed_sweeps() before calling extract_data_set_features resolves this issue.
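The workaround follows a "filter first, then extract" order. ipfx.utilities.drop_failed_sweeps() operates on the real dataset object and applies ipfx's own QC criteria; the sketch below is not that function. It only models the idea on plain dicts, assuming a hypothetical failure criterion (a sweep is dropped if its 'experiment' or 'stim' epoch is None) and a hypothetical helper name, drop_sweeps_missing_epochs.

```python
from typing import Any, Dict, List, Sequence, Tuple

Sweep = Dict[str, Any]


def drop_sweeps_missing_epochs(
    sweeps: List[Sweep],
    required: Sequence[str] = ("experiment", "stim"),
) -> Tuple[List[Sweep], List[int]]:
    # Hypothetical criterion: keep a sweep only if every required epoch is present
    kept: List[Sweep] = []
    dropped: List[int] = []
    for s in sweeps:
        epochs = s.get("epochs") or {}
        if all(epochs.get(name) is not None for name in required):
            kept.append(s)
        else:
            dropped.append(s["sweep_number"])
    return kept, dropped


sweeps: List[Sweep] = [
    {"sweep_number": 4, "epochs": {"experiment": (0, 100), "stim": (10, 90)}},
    {"sweep_number": 5, "epochs": {"experiment": None, "stim": None}},
]

kept, dropped = drop_sweeps_missing_epochs(sweeps)
print([s["sweep_number"] for s in kept])  # [4]
print(dropped)                            # [5]
```

Filtering up front keeps the extraction code simple, at the cost of silently excluding sweeps; returning the dropped sweep numbers makes that exclusion visible.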