bluegreen-labs/ecmwfr

batch requests: Send a new request as soon as one is downloaded

eliocamp opened this issue · 8 comments

The way the queue system is implemented now, once one request has finished and been downloaded, the code checks on the next queued request instead of immediately submitting a new one. In practice the queued requests will probably finish processing at roughly the same time, so the function downloads them one after the other and only then submits the new requests. It would be more time efficient to submit a new request each time one is downloaded, so it can start being processed on the server while the code downloads the next.

I hope this is clear enough. 🤣
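
A rough sketch of the scheduling I have in mind (not the current ecmwfr internals; request_list, workers, submit(), status() and download() are hypothetical stand-ins):

workers <- 10              # hypothetical number of requests to keep in flight
pending <- request_list    # hypothetical list of requests not yet submitted
active  <- list()          # requests currently being processed by the server

while (length(pending) > 0 || length(active) > 0) {
  # top up the active pool as soon as slots free up, so the server keeps working
  while (length(active) < workers && length(pending) > 0) {
    active[[length(active) + 1]] <- submit(pending[[1]])  # hypothetical submit()
    pending[[1]] <- NULL
  }

  # download whatever has finished, freeing slots for the next iteration
  finished <- Filter(function(r) status(r) == "completed", active)  # hypothetical status()
  lapply(finished, download)                                        # hypothetical download()
  active <- Filter(function(r) status(r) != "completed", active)

  Sys.sleep(5)  # don't hammer the API while polling
}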

Might be related, when using batch requests I get this error at times:

No encoding supplied: defaulting to UTF-8.
Error in if (private$status == "completed") { : 
  argument is of length zero

This might suggest a timing offset in checking the status (with the element no longer being there).

Just writing this down for reference. I've had a hard time debugging this. Anyway, will get to this if I've more time.

Personal notes, but read along if helpful.

Trying to resolve this as it is slowing my work down. Some context: I'm running a large batch of small, site-based subset requests for hourly cloud cover data. Individual jobs are small (~100 KB max) but span a long stretch of the time axis (a decade). Batch processing stalls on the above error after a variable amount of time, which breaks the routine / the downloads.

I have a hunch that I'm getting kicked off the server or the timings are off, generating an edge case which is not captured in the current setup.

To prevent the batch routine from polling the API too frequently, I've put a Sys.sleep(3) in the while loop to slow things down (see the sketch below). I'll run with this and see how far I get in this instance.
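
For reference, the change amounts to a pause inside the status-polling loop (simplified stand-in, not the actual ecmwfr loop):

while (!all_done) {          # simplified polling loop
  all_done <- check_jobs()   # hypothetical check of the queued jobs
  Sys.sleep(3)               # back off between status polls
}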

Tracking here: https://github.com/bluegreen-labs/ecmwfr/tree/patch-batch-request-issue

Well, that didn't work, so it isn't an issue of spamming the API. I'm wondering if we miss one of the HTTP status codes in processing. Basically the error says that private$status doesn't exist (or is empty, i.e. wasn't assigned somewhere).

So, if here:

update_status = function(fail_is_error = TRUE,

nothing gets assigned to private$status for some reason, things fall apart.

The root cause is this statement, which is split off from the rest of the if / else if logic:

if (private$status != "completed" || is.null(private$status)) {

When is.null(private$status) is TRUE, values of private get set, but that branch does not skip the evaluation of the next if statement.

So when private$status is NULL, that next if() fails with the length-zero error. Running a long job now to test stability.
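
To make this concrete, a minimal stand-alone illustration (assuming private$status ends up NULL after a failed status update; names simplified):

status <- NULL                        # stand-in for private$status

try(
  if (status == "completed") {        # NULL == "completed" is logical(0) ...
    message("done")                   # ... so if() errors: "argument is of length zero"
  }
)

# Checking is.null() first short-circuits and avoids the zero-length comparison:
if (is.null(status) || status != "completed") {
  message("status unknown or not completed, keep waiting")
}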

Long requests at times stall on:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Error in the HTTP2 framing layer

Not the "argument is of length zero" error anymore, though.

The above seems to be related to new support for HTTP/2 in curl (and httr), which is enabled automatically.

r-hub/rhub#19

The HTTP version can be set manually with httr::config(http_version = 2), but in this case it should probably be
httr::set_config(httr::config(http_version = 0))
to avoid the error.
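
A sketch of how that workaround could be wrapped around a batch of requests (the http_version = 0 value is taken from the linked r-hub issue):

library(httr)

set_config(config(http_version = 0))  # fall back from HTTP/2 before submitting
# ... submit and poll the batch of ecmwfr requests here ...
reset_config()                        # restore httr's default global config afterwards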


Thanks for the amazing package!
Did you find any solution for the issue above?

This should be fixed in the latest dev release; it's not on CRAN yet as it is an edge case.