Hard psyllid check at ROACH end_run
In several instances, psyllid has effectively crashed during a ROACH run but still returns valid request_status responses. As a result, we don't catch the crash until the next start_run.
At the end of a run, dragonfly's timeout calls the end_run method (daq_run_interface.py), which in turn calls the DAQ-specific _stop_data_taking method (roach_daq_run_interface.py). Because psyllid still returns a valid status, the status check there doesn't catch the failure state (https://github.com/project8/dragonfly/blob/develop/dragonfly/implementations/roach_daq_run_interface.py#L274).
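A minimal sketch of why the current check misses the crash, assuming a heavily simplified call chain. StubPsyllid, DAQRunInterfaceSketch, the crashed flag, and the {"status": "ok"} response format are all hypothetical stand-ins, not dragonfly's or psyllid's actual API; the point is only that a status-only check passes whenever psyllid keeps answering.

```python
class StubPsyllid:
    """Hypothetical stand-in for a psyllid instance (not the real psyllid API)."""

    def __init__(self):
        # Internal failure state that request_status does not expose.
        self.crashed = False

    def request_status(self):
        # Mirrors the reported behavior: the status stays valid even after a crash.
        return {"status": "ok"}


class DAQRunInterfaceSketch:
    """Hypothetical, simplified version of the end_run -> _stop_data_taking chain."""

    def __init__(self, psyllid):
        self.psyllid = psyllid

    def end_run(self):
        # end_run delegates to the DAQ-specific stop method.
        return self._stop_data_taking()

    def _stop_data_taking(self):
        # Status-only check: passes as long as psyllid answers with a valid status.
        status = self.psyllid.request_status()
        if status.get("status") != "ok":
            raise RuntimeError("psyllid reported an error status")
        return True


psyllid = StubPsyllid()
psyllid.crashed = True
daq = DAQRunInterfaceSketch(psyllid)
print(daq.end_run())  # True: the crash goes unnoticed
```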
I'm not entirely sure which psyllid queries would return an error status in this state (starting a new run and making a mask both definitely do), but it would be nice if a quick query could catch this from dragonfly's side. @laroque @nsoblath @cclaessens may have thoughts?
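One possible shape for such a "hard check", sketched under the assumption that psyllid offers (or could offer) some cheap query that, unlike request_status, fails once psyllid has crashed. The probe method, PsyllidError, and hard_check are hypothetical names for illustration only; which real psyllid query could play the role of probe is exactly the open question above.

```python
class PsyllidError(Exception):
    """Raised when a (hypothetical) psyllid query reports a failure."""


class StubPsyllid:
    """Hypothetical stand-in for psyllid; not the real psyllid API."""

    def __init__(self, crashed=False):
        self.crashed = crashed

    def request_status(self):
        # Still returns a valid status after a crash, as described in this issue.
        return {"status": "ok"}

    def probe(self):
        # Hypothetical lightweight query that fails once psyllid has crashed.
        if self.crashed:
            raise PsyllidError("psyllid failed the probe query")
        return {"status": "ok"}


def hard_check(psyllid):
    """Return True only if psyllid passes both the status query and the probe."""
    status = psyllid.request_status()
    if status.get("status") != "ok":
        raise PsyllidError(f"bad status: {status}")
    psyllid.probe()  # raises PsyllidError if psyllid is in the crashed state
    return True
```

Calling something like hard_check from _stop_data_taking would surface the crash at end_run instead of at the next start_run.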
In the long term, of course, psyllid should be better about reporting its own failure states.