argovis/argovis_api

Loss of Argovis API performance between v1 and v2


Hi !

While working on the argopy documentation (which was taking an unusually long time to build), I noticed that the latest argovis API looks much slower than the previous one...

If you compare numbers using API v1 at:
https://argopy.readthedocs.io/en/v0.1.15/performances.html#comparison-of-performances

with those using API v2 at:
https://argopy.readthedocs.io/en/v0.1.16/performances.html#comparison-of-performances

there is a very significant loss of performance, going from on the order of 20 s to on the order of 11 min to fetch some regional data.

It's hard to be more quantitative, but with recent argopy versions (>v0.1.16) using API v2 (from argovis-api.colorado.edu),
I can barely get below about 10 min for this use case.

This is a little worrying to me, because I have always recommended using argovis for large domain requests.

It looks like the server-side overhead of handling many small requests now outweighs the benefit of chunking.
I hope this is due to a change in the server configuration that could be fixed, rather than to the API design.

What do you think? Do you have any idea what's going on?
On my side, I'll experiment with chunk sizes to see at what point chunking is worth the overhead.

Best !

Guillaume

poke: @bkatiemills @quai20

Thanks for pointing this out - the new API actually looks considerably faster for the non-parallelized request (region b1). My first guess is that if you're firing a bunch of requests in parallel, you're hitting our rate limiter, which is new in v2; it was implemented because people were firing tons of parallel requests at us and taking our service down.

Is there a verbose mode we can run this in so we can see the exact API calls being made, and maybe their timing too? If this is indeed what is happening, you should be getting some responses with HTTP code 429 and some JSON describing how frequently such requests can be made; we should be able to use this to tune the parallelization to an optimal level. Let me know what you see and we can go from there.
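In case it helps while there is no verbose mode to hand, here is a minimal sketch of the kind of log I have in mind: one record per API call with its status code and wall time, retrying after a 429. All names here (`time_requests`, `fetch_with_urllib`, `summarize`) are hypothetical helpers, not part of argopy or the Argovis API, and the exponential backoff is just a placeholder assumption; the real wait time should come from whatever the 429 response body says.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_with_urllib(url, timeout=60):
    """Fetch a URL and return (status_code, body), including for HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status and body are still available.
        return err.code, err.read()


def time_requests(urls, fetch, max_retries=3, backoff=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL and log status and wall time per attempt.

    A 429 response is retried up to max_retries times, waiting
    backoff * 2**attempt seconds in between (assumed policy, not the server's).
    """
    log = []
    for url in urls:
        for attempt in range(max_retries + 1):
            start = time.perf_counter()
            status, _body = fetch(url)
            elapsed = time.perf_counter() - start
            log.append(
                {"url": url, "attempt": attempt, "status": status, "seconds": elapsed}
            )
            if status != 429:
                break
            sleep(backoff * 2 ** attempt)
    return log


def summarize(log):
    """Count how many attempts ended with each HTTP status code."""
    return Counter(record["status"] for record in log)
```

Calling `time_requests(urls, fetch_with_urllib)` with the list of chunk URLs that get generated, then `summarize(...)` on the result, should show straight away whether a large share of the attempts come back as 429.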

Actually, another thing worth noting - I am seriously considering re-paginating these responses to something much simpler; currently we limit request sizes temporospatially, which makes it complex and case-dependent to understand how fast the rate limiter will allow requests. A more traditional pagination by simple number of profiles will have a flat and easy-to-understand requests per second rate limitation. If we can confirm that the slow timings you're seeing from your parallel requests are due to rate limitation, I think that's a good argument to go ahead with this simpler pagination for simplifying parallelization.
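To illustrate why a flat limit would be easier to work with on the client side: with a known requests-per-second budget, pacing calls reduces to spacing them at a fixed interval. The sketch below is purely illustrative; `Throttle` is a hypothetical helper, the rate you pass in is a placeholder rather than an actual Argovis limit, and it is not thread-safe as written (a lock around `wait` would be needed for parallel workers).

```python
import time


class Throttle:
    """Space successive calls so they do not exceed a flat rate (calls/second)."""

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_slot = None  # earliest time the next call may start

    def wait(self):
        """Block until the next slot is free; return the delay that was applied."""
        now = self.clock()
        if self.next_slot is None or now >= self.next_slot:
            # No call pending in this interval: go immediately.
            self.next_slot = now + self.interval
            return 0.0
        delay = self.next_slot - now
        self.sleep(delay)
        self.next_slot += self.interval
        return delay
```

A client would call `throttle.wait()` before each page request; for N pages at a flat limit of R requests per second, the best achievable total time is then roughly N / R, independent of the region or time window requested.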