CottageLabs/OpenArticleGauge

Add timeout to resolving DOIs and URLs

Opened this issue · 2 comments

In order to make the API more responsive, we should kill threads which take too long to process.

I have done a bit of investigation here with @Nimphal - this is a bit of a tangle, as can be expected.

So we have:
a/ celery timeouts
b/ requests timeouts

Counter-intuitively, b/ is NOT set by default. But there's no point in just adding a timeout argument to the requests.get call and putting our feet up - we need to understand what is causing the hangs (and also why our celery 5 min timeout is not working). So I think we should build a very small 7-line flask app which just hangs (busy-wait) and try to download a DOI from it. Then we can add the requests timeout and see if that fixes the problem, and after that find out exactly why celery allows tasks to run forever.
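For reference, a minimal sketch of what the requests timeout could look like. The function name `resolve` and the timeout values are made up for illustration, not taken from the OAG code. Note that the `timeout` argument in requests bounds the connect and each individual socket read, not the whole download, so a server that trickles bytes slowly can still keep the call alive for a long time.

```python
import requests


def resolve(url, connect_timeout=3.05, read_timeout=10.0):
    """Follow redirects for url and return the final URL.

    Returns None if connecting, or any single read from the server,
    takes longer than the given timeouts (in seconds).
    Hypothetical helper, not part of OAG.
    """
    try:
        response = requests.get(
            url,
            # (connect timeout, read timeout); without this requests waits forever
            timeout=(connect_timeout, read_timeout),
            allow_redirects=True,
        )
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout
        return None
    return response.url
```

Pointing this at a server that accepts the connection and then stays silent should return None after roughly `read_timeout` seconds instead of blocking the worker.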

Also to note, @Nimphal may have found that the way we specify the celery timeout is simply wrong: the cmdline arg we pass doesn't exist, so it gets ignored. I haven't had time to follow up on that, but it could be done after we reproduce the hang on a developer's machine (e.g. by using a flask app).
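For whoever follows this up: as far as I know, Celery 3.x also reads task time limits from the `CELERYD_TASK_TIME_LIMIT` and `CELERYD_TASK_SOFT_TIME_LIMIT` settings (both in seconds), which would sidestep the command line entirely. A sketch of a config fragment, assuming our 5 min hard limit; the 4 min soft limit is purely illustrative and not something we use today.

```python
# celeryconfig.py fragment (sketch; values are assumptions, check against our setup)

# Hard limit: the worker process running the task is killed and replaced
# once a task has run for this many seconds.
CELERYD_TASK_TIME_LIMIT = 5 * 60

# Soft limit: SoftTimeLimitExceeded is raised inside the task first, giving it
# a chance to clean up. It only makes sense if it is below the hard limit.
CELERYD_TASK_SOFT_TIME_LIMIT = 4 * 60
```

Whether these take effect with our worker pool still needs to be checked against the hanging test server.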

Wrt the busy-wait mini-app: I think I know how to do some pretty nasty stuff - infinite sleep is one thing, but if needed, I can write a small script (not with Flask) which opens a socket, accepts the incoming OAG request, and then never closes the connection but also never sends any data back. This should screw nicely with the I/O layer in Python (OAG).
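A rough sketch of such a script, standard library only. The function name `start_silent_server` is hypothetical. It accepts TCP connections, keeps them open, and never reads or writes anything, so a client without a read timeout blocks indefinitely.

```python
import socket
import threading


def start_silent_server(host="127.0.0.1", port=0):
    """Accept connections on (host, port) but never respond and never close.

    port=0 lets the OS pick a free port. Returns (listener, actual_port).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)

    held = []  # keep references so accepted sockets are never closed

    def accept_forever():
        while True:
            try:
                conn, _addr = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)  # hold the connection open, send nothing

    # Daemon thread: it does not keep the process alive on its own
    threading.Thread(target=accept_forever, daemon=True).start()
    return listener, listener.getsockname()[1]
```

To use it as a standalone test target, call `start_silent_server(port=8099)` (the port is arbitrary) and keep the main thread alive, e.g. by sleeping in a loop, then point OAG at `http://127.0.0.1:8099/`.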

celery/celery#1129: memory leaks in celery when timeout params are used. The problem was in the 3.0 branch, which we use. The fix has been backported, but I'm pretty sure our celery isn't new enough to have the patch.

Consider upgrading celery if it continues to eat memory, since we do need those timeouts too.