fabiobatalha/crossrefapi

Add a way to respect API rate limits and timeouts

Opened this issue · 3 comments

According to the API docs, responses may include the following headers, which ask clients to self-limit their request rate:

X-Rate-Limit-Limit: 50
X-Rate-Limit-Interval: 1s

It would be neat if this library supported a mode that self-limits requests to conform to these headers, or offered a way to expose the limits to the calling code.
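To make the idea concrete, here is a rough sketch of how those two headers could be parsed. The function name `parse_rate_limit` and the fallback defaults are my own assumptions, not part of crossrefapi, and I only handle the `1s` style interval shown above (other unit suffixes fall back to the default).

```python
import re


def parse_rate_limit(headers, default_limit=50, default_interval=1.0):
    """Return (max_requests, interval_seconds) from rate-limit headers.

    Hypothetical helper: `headers` is any mapping of header names to
    strings. A plain dict is case-sensitive, so the names must match exactly.
    """
    try:
        limit = int(headers.get("X-Rate-Limit-Limit"))
    except (TypeError, ValueError):
        # Header missing or not an integer: fall back to the default.
        limit = default_limit

    # Only the "<number>s" form seen in the docs is recognised here.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)s\s*",
                         headers.get("X-Rate-Limit-Interval") or "")
    interval = float(match.group(1)) if match else default_interval
    return limit, interval
```

With the example headers above this gives 50 requests per 1.0 second; if either header is absent or malformed, the defaults are used instead.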

Happy to submit a patch, if this is a welcome feature.

Also, please let me know if something like this is already implemented here; in that case I'm happy to write some documentation!

Hello @AntonLydike

There is a polite mode in the API. In fact, this API takes a synchronous approach, so it usually never makes many requests at once. One change intended to increase the API's performance is to make requests in parallel, using multiprocessing or something similar, while iterating over pages.

You can review the polite mode.

I took a look at the implementation and it seems to be broken; it should be improved for better performance.

Take a look at:

def do_http_request( # noqa: PLR0913

Basically, what I'm doing is sharing a single Works object between multiple threads. I implemented rate limiting on top of that, but I have to guess the current limits (which seem to vary daily; some days I get away with more requests per second than others).

It would be cool to have an internal method inside the API to handle this rate limiting even when used in a multi-threaded workload (no need for multiprocessing here, as Python's multithreading works fine for IO-bound workloads like this one).
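For illustration, this is roughly the kind of thing I have in mind: a minimal sketch of a thread-safe sliding-window limiter that several threads can share. The class name `SharedRateLimiter` and its methods are hypothetical and not part of crossrefapi; the defaults just mirror the example headers.

```python
import threading
import time
from collections import deque


class SharedRateLimiter:
    """Allow at most `limit` calls per `interval` seconds across threads."""

    def __init__(self, limit=50, interval=1.0):
        self._lock = threading.Lock()
        self._limit = limit
        self._interval = interval
        self._stamps = deque()  # start times of recent requests

    def update(self, limit, interval):
        """Adopt new limits, e.g. values read from response headers."""
        with self._lock:
            self._limit = max(1, int(limit))
            self._interval = float(interval)

    def acquire(self):
        """Block until another request may be sent."""
        while True:
            with self._lock:
                now = time.monotonic()
                # Forget requests that have left the current window.
                while self._stamps and now - self._stamps[0] >= self._interval:
                    self._stamps.popleft()
                if len(self._stamps) < self._limit:
                    self._stamps.append(now)
                    return
                # Window is full: wait until the oldest request expires.
                wait = self._interval - (now - self._stamps[0])
            # Sleep outside the lock so other threads are not blocked.
            time.sleep(max(wait, 0.0))
```

Each worker thread would call `acquire()` right before issuing a request, and whichever thread receives a response could pass the advertised limits to `update()`, so nobody has to guess the current values.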