project8/dragonfly

Hard psyllid check at ROACH end_run


In several instances, psyllid has effectively crashed during a ROACH run but still returned valid request_status responses. This means we don't catch the crash until we try to start the next run.

At the end of a run, dragonfly's timeout calls the end_run method (daq_run_interface.py), which in turn calls the DAQ-specific _stop_data_taking method (roach_daq_run_interface.py). Because psyllid still returns a valid status, this path doesn't catch the failure state (https://github.com/project8/dragonfly/blob/develop/dragonfly/implementations/roach_daq_run_interface.py#L274).

Not entirely sure which psyllid queries would return an error status in this state (starting a new run and making a mask both definitely do), but it would be nice if a quick query could catch this from dragonfly's side. @laroque @nsoblath @cclaessens may have thoughts?
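
To make the idea concrete, here is a rough sketch of what a hard check at end_run could look like. Everything in it is hypothetical: `hard_daq_check`, `DaqHealthError`, and the `probe` callable are made-up names, not existing dragonfly or psyllid API. The probe stands in for whichever psyllid request turns out to fail reliably when psyllid is in the crashed state, which is still the open question here.

```python
class DaqHealthError(RuntimeError):
    """Hypothetical error type: the DAQ failed a hard health probe."""


def hard_daq_check(probe, is_ok=lambda reply: True):
    """Run a probe that should only succeed if the DAQ is really alive.

    probe -- zero-argument callable (an assumption, not dragonfly API) that
             sends a request expected to fail when psyllid has crashed.
    is_ok -- optional predicate applied to the reply, for probes that answer
             with an error payload instead of raising.
    """
    try:
        reply = probe()
    except Exception as err:
        # Any failure of the probe is treated as a failed DAQ.
        raise DaqHealthError("hard DAQ probe raised: {}".format(err))
    if not is_ok(reply):
        raise DaqHealthError("hard DAQ probe returned a bad reply: {!r}".format(reply))
    return reply


def healthy_probe():
    # Stand-in for a real request to a working psyllid.
    return {"status": "ok"}


def crashed_probe():
    # Stand-in for a request to a crashed psyllid.
    raise RuntimeError("no valid reply")


if __name__ == "__main__":
    print(hard_daq_check(healthy_probe))
    try:
        hard_daq_check(crashed_probe)
    except DaqHealthError as err:
        print("caught at end of run:", err)
```

If something like this ran from _stop_data_taking, the failure would surface when the run ends rather than when the next one starts. The probe itself would have to be cheap and free of side effects, which probably rules out actually starting a run.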

In the long term, of course, psyllid should be better about reporting its own failure states.