ssl-hep/ServiceX

Hanging requests

Opened this issue · 1 comment

Sometimes a request is not terminated correctly and stays in the system for days. This reduces the resources available to us, requires manual cleanup, and creates huge problems on platforms like NRP. There, if a pod stays around for a long time without using CPU, it is counted as a violation and that user can't create new jobs.

Do we understand why they hang? If it's a bug, we should fix it, because anyone running against that request may have hung as well.