senx/warp10-platform

Syntax for adding other HTTP methods to URLFETCH

Closed this issue · 8 comments

Currently the URLFETCH fonction only support GET method. The signatures are:

url:STRING headers:MAP<STRING> URLFETCH
urls:LIST<STRING> headers:MAP<STRING> URLFETCH

with headers being optional.

To support additional methods, I suggest:

request:MAP<STRING> headers:MAP<STRING> URLFETCH
requests:LIST<MAP<STRING>> headers:MAP<STRING> URLFETCH

where request shall have the fields 'url', 'method' and optionally 'message', and headers are still being optional.

Is that signature good ?

It looks good, even if I think the keys of the map could have a different name (url, verb and body or payload).

Too bad I added the header signature without thinking about that kind of signature before because it would definitively make sense to have also a header key in the map you want to define. As it is a fresh addition (2.7.0) maybe it should be considered whether to break the header signature to allow

request:MAP<STRING> URLFETCH
requests:LIST<MAP<STRING>> URLFETCH

with requests containing a header key.

Not sure about verb, I think method is more explicit. I agree with body or payload instead of message though. I think body is better.

Putting a MAP inside a MAP makes for a bit clunky codes.
Maybe, in order not to break the header signature, we can add only the signature request:MAP<STRING> headers:MAP<STRING> URLFETCH, and not the second one, since anyway, the second signature is not that much of a gain compared to using FOREACH or LMAP, and I agree it doesn't always make sense to use the same headers for all requests, especially when they are not all GET methods.

Agreed on url/method/body keys.

I'm not convinced that maps of maps are a problem but breaking the 2.7.0 is more of a problem, I think @hbs is best to arbitrate on that. If we want to keep backward-compatibility I think both

request:MAP<STRING> headers:MAP<STRING> URLFETCH
requests:LIST<MAP<STRING>> headers:MAP<STRING> URLFETCH

are nice to have, even if, I do agree, it can be done with FOREACH or LMAP.

Also,there might be an issue for the signature request:MAP<STRING> headers:MAP<STRING> URLFETCH. Since headers is optional, URLFETCH would have to check if the first map it retrieves from the stack is a headers or request. I don't see a way to do this without limiting the names of the fields that are allowed in headers.

So I think that if we choose that signature, headers cannot be optional.

I intend to run many URLFETCH at once (it's a data platform and users can run their scripts as they please) which can quickly hog all memory so a streaming option could be really useful. Otherwise, I need to limit URLFETCH maxsize which is not ideal.

Maybe something like requests:LIST<MAP<STRING>> headers:MAP<STRING> chunk_size:LONG URLFETCH. chunk_size can be -1 by default but if set, it returns chunks of content with that size as an iterator.

Another way could be requests:LIST<MAP<STRING>> headers:MAP<STRING> start_byte:LONG end_byte:LONG URLFETCH end:BOOL content:STRING. When start_byte and end_byte are defined URLFETCH reads and drops all data before start and reads until end_byte. If the end of the stream is reached, end is set to True.

I expect it to be slower but scales much better.

Maybe an idea similar to #855 can be used with a byte delimiter (or a chunk size) and a macro to be run for each of these chunks.

The other way around, for POST, a macro could create chunks to be sent, but that may be overengineering.

Good suggestion. It is added in PR #891

hbs commented

Solved by PR #891