ssl-hep/ServiceX

quiet failures in put_file_complete


When scaling up to a large number of concurrent transformers (>500), put_file_complete in the sidecar's ServiceXAdapter fails quietly.
This leads to hanging requests: RMQ is empty (all messages processed), but the App has received only part of the confirmations.
Increasing the number of threads in gunicorn alleviates the problem, but a real solution would probably look like this:
add another RMQ topic where the transformer sidecars would put file-complete documents. The App would then subscribe to it, update the db, etc.
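
A minimal sketch of that proposal, to make the data flow concrete. It is not the ServiceX implementation: a standard-library `queue.Queue` stands in for the RMQ topic so the example runs without a broker, and the function names, the document fields (`request_id`, `file_path`, `status`, `total_events`), and the `STOP` sentinel are all hypothetical.

```python
import json
import queue
import threading

# Hypothetical sentinel telling the consumer to stop; a real broker consumer
# would instead run until it is cancelled.
STOP = None


def publish_file_complete(q, request_id, file_path, status, total_events):
    """Sidecar side: enqueue a file-complete document instead of an HTTP PUT.

    `q` is a stand-in for the proposed RMQ topic (assumption).
    """
    doc = {
        "request_id": request_id,
        "file_path": file_path,
        "status": status,
        "total_events": total_events,
    }
    q.put(json.dumps(doc))


def consume_file_complete(q):
    """App side: a single consumer drains the queue and tallies finished files.

    Returns a dict mapping request_id to the number of documents received.
    In the App this is where the db update would happen.
    """
    counts = {}
    while True:
        body = q.get()
        if body is STOP:
            break
        doc = json.loads(body)
        counts[doc["request_id"]] = counts.get(doc["request_id"], 0) + 1
    return counts
```

Because the sidecars only enqueue and a single consumer applies the updates, a burst of several hundred simultaneous completions is buffered in the queue rather than competing for gunicorn worker threads.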

Increasing the number of threads in gunicorn does not solve the problem, since it then leads to double counting of reported finished files.
Each put takes ~80 ms on the server side (the last file takes ~10 seconds).
I tried faster storage and that does not help.
How long a put takes seems to depend strongly on the number of rows.
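
One possible way to guard against the double counting, sketched under assumptions: record each finished file under a unique (request, file) key so that a repeated report is ignored, and derive the count from the stored rows instead of incrementing a counter. The table layout and function names below are hypothetical, and an in-memory SQLite database is used only so the example is self-contained; it is not the App's actual schema or database.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one row per (request, file)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_status ("
        " request_id TEXT NOT NULL,"
        " file_id TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " PRIMARY KEY (request_id, file_id))"
    )
    return conn


def record_file_complete(conn, request_id, file_id, status):
    """Store a file-complete report; return True only for the first report.

    INSERT OR IGNORE skips the row when the primary key already exists,
    so a duplicate report leaves the table unchanged.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO file_status VALUES (?, ?, ?)",
            (request_id, file_id, status),
        )
    return cur.rowcount == 1


def files_finished(conn, request_id):
    """Count distinct finished files for a request from the stored rows."""
    row = conn.execute(
        "SELECT COUNT(*) FROM file_status WHERE request_id = ?",
        (request_id,),
    ).fetchone()
    return row[0]
```

With this shape, reporting the same file twice changes nothing the second time, so the finished-file count cannot exceed the number of distinct files.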