Rework the process execution behaviour
With API v0.0.2 there are some major changes in the process management. On `POST /jobs` there is no longer an `evaluate` parameter. The former `sync` call is now `POST /execute`, `batch` is now called with `PATCH /jobs/{job_id}/queue`, and `lazy` is done via `POST /services`.
The new 'batch' and 'lazy' calls require an already uploaded process graph and optional output parameters. This is now done by `POST /jobs`.
These changes require some renaming and new functions for the R client. I suggest that job creation (without determining whether to run in 'batch' or 'lazy') is called `createJob`. Then, to call the 'lazy' evaluation, we call the function `toService`, and `queueJob` will match the similarly named call on the backend for the 'batch' evaluation.
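For illustration, the intended flow would look roughly like this (`con` stands for an assumed backend connection object; none of this is final):

```r
job_id <- createJob(con, graph)   # POST /jobs: upload the process graph

# then either publish it as a W*S service ('lazy') ...
toService(con, job_id)            # POST /services

# ... or start the 'batch' evaluation
queueJob(con, job_id)             # PATCH /jobs/{job_id}/queue
```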
This means the following functions will be removed: `queueTask`, `orderResult`.
The naming can be changed and I'm open to suggestions.
I'm not sure why you'd want to change the function names. I'd stick with the ones we already have.
Yes, you do need to upload the process graphs now, but that's what `queueTask` was made for to begin with. Asking the user to run `createJob` and `queueJob` right afterwards doesn't make sense. Not sure what `toService` is supposed to do, though; but either way it can have `createJob` inside it so the user doesn't have to run it.
The main reason why I thought about renaming was the term collision of 'queue': in the API it now means executing in the former 'batch' mode, while the client would use it for 'lazy' evaluation.
Regarding `toService`, it will take on the job that the former `queueTask` would have done (creating a service from a job). Regarding the integrated `createJob`, that is a fair point and it should be implemented that way.
Another change is that we now have `GET /jobs/{job_id}/download` to get the results of a 'batch' job. That function would need a name as well, maybe something like `jobResult`, and it cannot be integrated into the former `orderResult` function, because then it would be a synchronous call.
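A sketch of what such a wrapper could look like, using httr (`con$base_url` and the name `jobResult` are assumptions for illustration, not existing client code):

```r
library(httr)

# Download the result of a finished 'batch' job to a local file
jobResult <- function(con, job_id, file) {
  GET(paste0(con$base_url, "/jobs/", job_id, "/download"),
      write_disk(file, overwrite = TRUE))
  invisible(file)
}
```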
Hrm, I missed that part... What exactly does the `/jobs/{job_id}/queue` endpoint do? The API says it converts the job to batch mode, but I thought that batch mode was supposed to become merely a special case of lazy (you call something like `get_download_links()` instead of a direct `download()`).

Is `toService` the `/services` endpoint, so it refers to the W*S services? For the latter, there's `downloadJob()` already.
OK, the general approach for 'batch' and 'lazy' is now that you upload your job onto the backend via `POST /jobs`. The job will remain in a status where it is just submitted. Afterwards the user can decide whether to publish it as a W*S service via `POST /services` or to start the batch processing via `PATCH /jobs/{job_id}/queue`.

If you want to download the 'batch' results, you call `GET /jobs/{job_id}/download`.
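Expressed as raw calls, a minimal httr sketch (the backend URL, body fields, and service type are assumptions based on the discussion, not a spec):

```r
library(httr)

backend <- "https://backend.example.org"   # hypothetical backend URL

# 1) Upload the job; graph and output options are part of the job definition
job <- content(POST(paste0(backend, "/jobs"),
                    body = list(process_graph = graph,
                                output = list(format = "GTiff")),
                    encode = "json"))      # status: "submitted"

# 2a) Publish it as a W*S service ('lazy') ...
POST(paste0(backend, "/services"),
     body = list(job_id = job$job_id, service_type = "WMS"),
     encode = "json")

# 2b) ... or start the batch processing
PATCH(paste0(backend, "/jobs/", job$job_id, "/queue"))

# 3) Once finished, download the batch results
GET(paste0(backend, "/jobs/", job$job_id, "/download"))
```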
Yea, though the API description also says that the W*S service can read from the result of the batch processing as well. So does `PATCH /jobs/{job_id}/queue` actually start the processing? If so, I get why they'd want to have this distinction in the API (GET is supposed to be synchronous, I guess). But at the same time, that doesn't give the backend information on where and how to store the output, does it?

Ah, I looked at the examples again, and yea, it says that processing starts when the `PATCH` is sent; the output format data is part of the job definition.
In that case, we should have one function that sends the job and starts processing (`orderResult()` was the idea: define the job + `PATCH /jobs/{job_id}/queue`), one function that gives a W*S link (this is new: define a job + `POST /services`), and our current `executeTask()` for the synchronous case. And then for expert applications (if someone wants both a W*S and a list of download links), one could use functions that are separate for the two steps (defining jobs was the point of `queueTask()`).
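A sketch of how these user-facing functions could compose the two steps (all helper names are hypothetical):

```r
# One call to 'order a result': define the job, then start batch processing
orderResult <- function(con, graph, format = "GTiff") {
  job_id <- createJob(con, graph, format)   # POST /jobs
  queueJob(con, job_id)                     # PATCH /jobs/{job_id}/queue
  job_id                                    # check later, then download
}

# One call for a W*S link: define the job, then publish it as a service
toService <- function(con, graph, type = "WMS") {
  job_id <- createJob(con, graph)           # POST /jobs
  createService(con, job_id, type)          # POST /services, returns the link
}
```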
I can see the issue of awkward naming, where `orderResult()` calls `/queue` and `queueTask()` calls `/jobs`, but the function names are user-facing, whereas the API is internal (and still subject to change). For a user, if they want to submit a job to get a list of files to download, that's an order. And if they just want to put up the job to do something with it later, that's a queue (arguably; `defineJob` or so would be fine as well, especially since it's consistent with `defineUDF`).
How the services should behave I also didn't fully understand. But in my opinion, it seems that we are allowed to do the following:
1) upload job -> create service
   ("submitted") -> ("running")
2) upload job -> start batch evaluation -> (wait until finished) -> create service
   ("submitted") -> ("queued") -> ("finished") -> ("running")
In 1) we create a service from a process graph directly, meaning we compute on the fly for the W*S (in terms of a use case: calculate the NDVI of the latest xxx collection). In 2) you calculate your data via 'batch' (`PATCH /jobs/{job_id}/queue`) and then offer that information as a service (then there should be no computing). The values in parentheses are the allowed states of the job object (see the `job_status` object in the API).
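For 2) the client has to wait until the job reaches "finished" before creating the service. A minimal polling sketch, assuming a hypothetical `jobStatus()` that returns the current `job_status` string (the "error" check assumes such a state exists):

```r
# Block until a batch job is finished (or fails), then return its ID
waitForJob <- function(con, job_id, interval = 10) {
  repeat {
    status <- jobStatus(con, job_id)
    if (status == "finished") return(invisible(job_id))
    if (status == "error") stop("job ", job_id, " failed")  # assumed state
    Sys.sleep(interval)
  }
}
```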
Hm, right, that would make sense. But in case 2, we basically have the download links for free anyway. So we could have `orderResult()` give the job ID, and then allow passing that to the new function (`toService()`; in my example we also already have `getWCSLink()`, but I guess something like `getWebServiceLink()` might be more understandable?). In case 1, we'd use `defineJob()` (or so) + `getWebServiceLink()` (or so).
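Put together, the two cases could look like this (function names as floated above, all hypothetical):

```r
# Case 2: run the batch job first, then serve the finished results
job_id <- orderResult(con, graph)           # define job + PATCH .../queue
waitForJob(con, job_id)                     # see the polling sketch above
wms <- getWebServiceLink(con, job_id, type = "WMS")

# Case 1: on-the-fly service from a merely defined job
job_id <- defineJob(con, graph)             # POST /jobs only
wms <- getWebServiceLink(con, job_id, type = "WMS")
```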
See https://github.com/Open-EO/openeo-r-client/wiki/Development-Overview-v0.2.x for an overview of endpoints, client functions and wrapper functions. I will try my best to keep it up to date with the develop branch.
Nice, that helps.
The examples should also be updated. I'll add a new issue about that.
Hm, for listing capabilities and formats, I think the wrapper functions should start with `list`, just like for collections and processes.
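That would give a consistent family of wrappers, e.g. (names hypothetical, mirroring the existing `list*` functions):

```r
listCollections(con)   # existing
listProcesses(con)     # existing
listCapabilities(con)  # endpoints the backend supports
listFormats(con)       # supported output formats
```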