AllenInstitute/ipfx

Failed and dropped sweeps are being used in Feature Extraction

Closed this issue · 9 comments

Describe the bug
I have been noticing that most of our data that has feature extraction performed will usually fail the longsquare, and ramp feature extraction with the "fail_fx_message" "Given time (1.500000) is outside of time range (-0.100000, 0.500245)", or fail the shortsquare feature extraction with the "fail_fx_message" "index 120200 is out of bounds for axis 0 with size 120050", and also sometimes I will see the error message "unsupported operand type(s) for +: 'NoneType' and 'NoneType'". These error messages occur when at least one sweep in the given sweep_set (longsquare, shortsquare, or ramp sweep_set), contains a sweep that was stopped before the start of the stimulus epoch.

This is occurring because the functions extract_cell_short_square_features, extract_cell_long_square_features, and extract_cell_ramp_features each call get_stim_characteristics from stim_features.py to find the start of the stimulus epoch from the first sweep in the sweep_set. If any sweep after that was stopped before the start of the start of the stimulus epoch, then the given time, which is the time point of the start of the stimulus epoch, will be out of range for that given sweep, that sweep will throw an AssertionError in the case of the ramp and longsquare sweeps (because of find_time_index in time_series_utils.py), and it will throw an IndexError in the case of the shrotsquares (because of estimate_adjusted_detection_parameters in spike_features.py), which will then get picked up by the function decorators @record_errors and @fallback_on_error which decorate the extract_cell_short_square_features, extract_cell_long_square_features, and extract_cell_ramp_features, causing the whole sweep_set to be tagged and setting the "failed_fx" to True. The error message "unsupported operand type(s) for +: 'NoneType' and 'NoneType'" occurs when the first sweep in the sweep_set was stopped before the stimulus epoch finished so the start and duration of the stimulus is returned as None from get_stim_characteristics.

The main issue here is the fact that these sweeps that were stopped before the start of the experiment epoch, and were tagged with "Recording stopped before completing the experiment epoch" and "Stimulus contains NaN values" are making it into feature extraction. Here are some example screen shots of the stimulus_summary json, qc_output json, and fx_input and fx_output jsons for the experiment located here:

  • /allen/programs/celltypes/production/mousecelltypes/prod3135/Ephys_Roi_Result_1098114063/

As can be seen in the following screenshot of the EPHYS_STIMULUS_SUMMARY_V3_QUEUE_input.json, most sweeps are tagged and will be dropped for sweep qc; however, with the caveat that most of these tagged sweeps were incorrectly tagged due to an issue that I identified in #508 and I have a fix for in #511.
For this experiment, only sweeps 34 and 35 were not tagged with either "Recording stopped before completing the experiment epoch" or "Stimulus contains NaN values". As can be seen in the screenshot, only sweeps 34 and 35 are not tagged:

stim_summary_json

In this next image of the EPHYS_QC_V3_QUEUE_output.json we can see that there are now only the 2 sweeps, 34 and 35, that were not tagged, and they are the only sweeps left in the qc output json file and sweep34 actually fails with the reason "post-noise: 0.072 exceeded qc threshold: 0.070" :

sweep_qc_json

Now in the EPHYS_FEATURE_EXTRACTION_V3_QUEUE_input.json we would expect to see only sweep 35 since sweep 34 failed qc and all other sweeps were dropped due to tags, but we see that all of the sweeps are present in the EPHYS_FEATURE_EXTRACTION_V3_QUEUE_input.json and even sweep 34 shows up with the "passed" status set to True.

fx_input_json

Finally, in the EPHYS_FEATURE_EXTRACTION_V3_QUEUE_output.json we still see all of the sweeps, even dropped and failed ones, and we also see that in the "feature_states" the long_squares_state and short_squares_state are set to True for "failed_fx" with the "failed_fx_messages" "Unsupported operand type(s) for +: 'NoneType' and 'NoneType'" and ""index 120200 is out of bounds for axis 0 with size 120080" respectively. The short_square_state is getting this fail message due to sweep 38, which did not complete the stimulus epoch, and the long_square_state is getting this fail message I believe because of sweeps 5 and 9, both of which did not complete the stimulus epoch. All of these sweeps were tagged at the stimulus summary stage and dropped for sweep qc but have somehow returned for feature extraction and are causing these issues.

fx_output_json

What I find interesting is when I execute run_pipeline() in ipfx.bin.run_pipeline.py I get the expected output, which for this experiment is that only sweep 35 gets past sweep qc and into feature extraction. I have included a notebook as well with some examples of with this same experiment and demonstrate how extract_cell_short_square_features, extract_cell_long_square_features, and extract_cell_ramp_features are throwing errors because of sweeps stopped short of the stimulus epoch, and also how feature extraction will work when those sweeps are removed. I also demonstrate running run_pipeline(), and all of the steps in that and how I get the expected result of only the sweeps passing sweep qc getting to feature extraction. I am still looking into the cause of the issue that we are seeing with all of out production data in LIMS but it seems like when I process these off-line I get the expected results. Any input from the IPFX developers would be greatly appreciated, thanks.

To Reproduce
I have attached screenshots of json files generated by the pipeline as well as a notebook that contains examples here:
fx_errors_testing.zip

Expected behavior
I expected only the sweeps that were not failed or dropped because of tags to show up in the feature extraction input json and to be processed for feature extraction

Actual Behavior
See screenshots of stimulus summary, sweep qc, and feature extraction json files.

Environment (please complete the following information):
All screenshots are from json files produced by the pipeline and the files can be found on the network in this directory:

  • /allen/programs/celltypes/production/mousecelltypes/prod3135/Ephys_Roi_Result_1098114063/
    The notebook I ran on my machine:
  • OS & version: Windows 10 version 1909
  • Python version: 3.7.10
  • AllenSDK version: 2.10.2

Additional context
This issue is partially related to #508 and #511, where I have identified an issue that is causing most sweeps to be incorrectly tagged and dropped (as in this case, most of these sweeps should not have been tagged with "Stimulus contains NaN values" and "Recording stopped before completing experiment epoch", and should have not been dropped), and where I have also submitted a potential fix for this issue. This issue may also be related to this issue #507 where something similar seems to be happening with failed and dropped sweeps showing up in LIMS as "auto-passed". In fact, for this experiment sweep 34 failed sweep qc in the EPHYS_QC_V3_QUEUE_ouput.json but it shows up in the Ephys sweeps of the specimens table in LIMS as "auto-passed".

Do you want to work on this issue?
I can continue to investigate this issue and work on a fix but if there are any IPFX developers out there that might have an idea of what is going on here I would greatly appreciate the input. Please let me know if I have been unclear about anything or if there are any questions about this issue, I am to be a helpful resource in getting this issues fixed for the IVSCC pipeline, thanks.

@wbwakeman and @sgratiy I may need some help determining the cause of this issue where sweeps that were tagged and dropped or failed during sweep qc are still being used in feature extraction, especially because when I run the run_pipeline.py script I am getting the expected result of only passing sweeps making it into feature extraction. Also, it would be nice if someone could look at #508 and the corresponding pull request #511 because we will need to address the issue of sweeps incorrectly getting tagged and dropped before addressing this issue, thanks.

Looks like an issue with a LIMS strategy since off-pipeline it works as expected. @ru57y34nn Thanks for such a detailed explanation. We are investigating...

I am seeing that the ephys_qc_v3_strategy.rb has changed.
I have briefly identified a commit ec97841 where the logic worked correctly inside the sweep loop. You can see the change
like so (when on HEAD is on dev):
git diff ec97841 HEAD ephys_qc_v3_strategy.rb

 
-            if s.ephys_sweep_tags.present? # auto_fail sweeps with tags
-              s.workflow_state = "auto_failed"
-            else
-              s.workflow_state = "auto_passed"
-            end
+            s.workflow_state = "auto_passed"
             s.save!
           end

I think if the original logic is restored, the issue should be fixed

        EphysSweep.includes(:ephys_sweep_tags, :ephys_stimulus).where(:specimen_id => roi_result.specimens.first.id).each do |s|
          sweep_state  = sweep_states[s.sweep_number]
          if s.auto_passed? || s.auto_failed? || s.unknown? # do not change state for manual calls, but only for auto
            if s.ephys_sweep_tags.empty? # only sweeps without tags run through qc
              if !sweep_state['passed']
                sweep_state['reasons'].each do |reason|
                  tag = EphysSweepTag.find_or_create_by(name: reason.strip)
                  s.ephys_sweep_tags << tag unless s.ephys_sweep_tags.include?(tag)
                end
              end
            end

            if s.ephys_sweep_tags.present? # auto_fail sweeps with tags
              s.workflow_state = "auto_failed"
            else
              s.workflow_state = "auto_passed"
            end
            s.save!
          end
        end # sweep_loop

That logic was changed in June 2020 because we were told that if some sweeps failed, the entire cell would be marked as autofail. The change would allow the cell to continue to feature extraction, which was updated to extract all possible feature data, even if individual analyses fail.

I was not aware of this change, but do recall some discussion with Jim that may have motivated it, around getting sweep level features in LIMS for failed sweeps. However, marking failed sweeps as passed is definitely not the right way of doing that, and maybe just resulted from a misunderstanding! (failed sweep features would likely also require further IPFX tweaks that haven't been implemented anyway)
Maybe this can be reverted and the cell level autofail issue fixed instead?

If I recall correctly the issue, Jim wanted to not fail the cells if they had at least some extracted and therefore useful features #398 . This feature was added in #428, thus cells would no longer auto_fail if, for example, all long squares sweeps had failed QC. But I am puzzled why it was decided to auto_pass all sweeps for feature extractor. @MatthewAitken do you recall the rational for that change?

  • Discuss with the team the rational for the previous change that made all sweeps "auto_passed"
  • Consider reverting the change in the qc strategy as listed in Sergey's comment above
  • Run experiment in question on the local version of LIMS

Validate:

  • check that EPHYS_QC_V3_QUEUE_output.json EPHYS_FEATURE_EXTRACTION_V3_QUEUE_input.json and EPHYS_FEATURE_EXTRACTION_V3_QUEUE_output.json have consistent passing sweeps.

@wbwakeman / @sgratiy I believe that ephys_qc_v3_strategy was fixed, and this issue can be closed?

Closing this issue, was actually occurring in LIMS processing code and was fixed there.