gaia-pipeline/gaia

Job outputs?

prologic opened this issue · 20 comments

Going through the documentation, the examples and the Go SDK (I assume the other language SDKs are the same), it doesn't seem possible to have a job compute and return some output that can be used as input to the next job(s) in the pipeline.

The only thing a Job can do is take some input Arguments and return an error. I assume the design calls for inputs to be known up front.

What if some inputs to a Job in my pipeline depend on values produced by a previous job? This doesn't seem possible right now. Is this by design? Would it add considerable complexity to something like Gaia?

In a "normal" gRPC/Protobuf service-oriented architecture (which Gaia is loosely based on, although it behaves more like serverless/FaaS), you would expect to be able to return some "output" from your service's endpoints/functions/etc.

Thoughts?

Hi @prologic!

You are absolutely right. This is currently not possible because we haven't implemented this feature yet.
It has been on the list for a long time, but other features were prioritized over it.
I will label this as a feature request so we can track its progress.

Cheers,
Michel

Awesome! Thanks for the confirmation. I won't be able to replace all this gnarly Jenkins spaghetti I have here with a Gaia workflow and properly implemented "Jobs" without this, so I'm looking forward to having this feature implemented!

I think the challenge here would be to define a "protocol"/"format" in which Jobs can "return output" in a sane and consistent way. Based on the code I'm seeing and the architecture/design, my recommendation is to have a "Context" object (similar to the Arguments object) into which the author of a job can insert arbitrary key/value pairs. To make use of this in a workflow/pipeline, the "context" would have to be persisted along the DAG.
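
The "Context" idea can be sketched in Go roughly as follows. Everything here (`JobContext`, `Job`, `runPipeline`) is hypothetical and not part of the Gaia SDK; the sketch assumes jobs run one at a time in dependency order and share a single key/value context that is passed along the DAG.

```go
package main

import (
	"fmt"
	"sort"
)

// JobContext is a hypothetical key/value store a job can write into.
// It is NOT part of the Gaia SDK; it only illustrates the proposal.
type JobContext map[string]string

// Job is a hypothetical job definition: a name, the jobs it depends on,
// and a handler that reads from and writes to the shared context.
type Job struct {
	Name      string
	DependsOn []string
	Handler   func(ctx JobContext) error
}

// runPipeline runs every job after its dependencies (depth-first), passing
// one shared context along the DAG so later jobs see earlier jobs' values.
func runPipeline(jobs []Job) (JobContext, error) {
	byName := make(map[string]Job, len(jobs))
	for _, j := range jobs {
		byName[j.Name] = j
	}
	ctx := JobContext{}
	done := map[string]bool{}
	visiting := map[string]bool{}

	var run func(name string) error
	run = func(name string) error {
		if done[name] {
			return nil
		}
		if visiting[name] {
			return fmt.Errorf("cycle detected at job %q", name)
		}
		j, ok := byName[name]
		if !ok {
			return fmt.Errorf("unknown job %q", name)
		}
		visiting[name] = true
		for _, dep := range j.DependsOn {
			if err := run(dep); err != nil {
				return err
			}
		}
		visiting[name] = false
		if err := j.Handler(ctx); err != nil {
			return fmt.Errorf("job %q: %w", name, err)
		}
		done[name] = true
		return nil
	}

	// Sort names so the traversal order is deterministic.
	names := make([]string, 0, len(byName))
	for n := range byName {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if err := run(n); err != nil {
			return nil, err
		}
	}
	return ctx, nil
}

func main() {
	jobs := []Job{
		{Name: "deploy", DependsOn: []string{"provision"}, Handler: func(ctx JobContext) error {
			fmt.Println("deploying to", ctx["DNS"])
			return nil
		}},
		{Name: "provision", Handler: func(ctx JobContext) error {
			ctx["DNS"] = "whatever.com" // value produced for downstream jobs
			return nil
		}},
	}
	if _, err := runPipeline(jobs); err != nil {
		fmt.Println("error:", err)
	}
}
```

Because this sketch runs jobs sequentially, the shared map needs no locking. If independent jobs ran concurrently, a real implementation would likely need a mutex or a separate output map per job that is merged before each downstream job starts.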

Any updates on this?

Hi @prologic. Unfortunately, both of us have been very busy with life lately and with some other commitments. If I recall correctly, @michelvocks would first like to get the docker executor in, because that's a massive change. That will go in sometime this week, and then we can start working on something else. :)

Also, I'm hoping my schedule will get better in a few weeks or so; then I can start concentrating on Gaia a little more again. :) That would be nice, as I have a few things on my list that I would like to work on. :D Cheers for your patience.

Are there future plans for a Kubernetes/Nomad executor too at some point?

Sounds good. Let me know if I can help in any way: docs, design, testing, etc.

Are there future plans for a Kubernetes/Nomad executor too at some point?

Not that I know of.

Could you elaborate on what you mean by a Kubernetes executor? Do you mean a CRD + Operator?

Maybe it would help if you described what this docker executor is? Or pointed me to a PR or issue?

Sorry @prologic, I missed your reply! Here is the PR: #201 :) It has been merged, so we can now move on to other things. My other project is also done-ish, so I'm going to focus on Gaia some more. ;)

That's awesome! Great job!

Alright. Let's take a look at this. :)

So the way I see it, a job could have a return value, but that value would have to be very generic, since jobs could have many different types of outputs...

I propose a list of key/value pairs, something like "DNS": "whatever.com". The job that is waiting for a value knows what it needs, so it can look up a key like "DNS".
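
A minimal sketch of the consumer side, assuming each job's output is a flat string-to-string map. The `Outputs`, `mergeOutputs` and `lookup` names are made up for illustration and do not exist in Gaia: a downstream job merges the outputs of its upstream jobs and then looks up the key it expects, such as "DNS".

```go
package main

import "fmt"

// Outputs is a hypothetical flat key/value set produced by one job.
type Outputs map[string]string

// mergeOutputs combines the outputs of all upstream jobs into the input set
// of a downstream job. A key produced with two different values is treated
// as an error instead of silently letting one job win.
func mergeOutputs(upstream ...Outputs) (Outputs, error) {
	merged := Outputs{}
	for _, out := range upstream {
		for k, v := range out {
			if prev, ok := merged[k]; ok && prev != v {
				return nil, fmt.Errorf("conflicting values for key %q: %q vs %q", k, prev, v)
			}
			merged[k] = v
		}
	}
	return merged, nil
}

// lookup fetches a key the downstream job knows it needs (e.g. "DNS")
// and fails clearly when no upstream job produced it.
func lookup(in Outputs, key string) (string, error) {
	v, ok := in[key]
	if !ok {
		return "", fmt.Errorf("required output %q not produced by any upstream job", key)
	}
	return v, nil
}

func main() {
	in, err := mergeOutputs(
		Outputs{"DNS": "whatever.com"},
		Outputs{"PORT": "8080"},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	dns, err := lookup(in, "DNS")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("DNS =", dns)
}
```

Failing on conflicting keys is only one possible policy; letting the last upstream job win would also work, but it makes the result depend on merge order.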

@prologic @michelvocks What do you think?

Something like...

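syntax = "proto3";

// All values returned by a job, as a flat list of key/value pairs.
message Output {
  repeated OutputValue items = 1;
}

// A single output entry, e.g. key "DNS" with value "whatever.com".
message OutputValue {
  string key = 1;
  string value = 2;
}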
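
For reference, here are hand-written Go mirrors of the two messages above, plus a round trip between them and a plain map. These are not protoc-generated types, and the helper names `toOutput`/`fromOutput` are hypothetical. Sorting the keys keeps the encoded message deterministic for a given map.

```go
package main

import (
	"fmt"
	"sort"
)

// OutputValue and Output are hand-written Go mirrors of the proposed
// protobuf messages; they are not protoc-generated Gaia types.
type OutputValue struct {
	Key   string
	Value string
}

type Output struct {
	Items []OutputValue
}

// toOutput converts a job's key/value map into the message shape.
// Keys are sorted so the same map always produces the same message.
func toOutput(kv map[string]string) Output {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := Output{Items: make([]OutputValue, 0, len(keys))}
	for _, k := range keys {
		out.Items = append(out.Items, OutputValue{Key: k, Value: kv[k]})
	}
	return out
}

// fromOutput converts the message back into a map. A repeated field can
// carry the same key twice; here the last occurrence wins.
func fromOutput(out Output) map[string]string {
	kv := make(map[string]string, len(out.Items))
	for _, item := range out.Items {
		kv[item.Key] = item.Value
	}
	return kv
}

func main() {
	msg := toOutput(map[string]string{"DNS": "whatever.com", "PORT": "8080"})
	fmt.Println(msg.Items)
	fmt.Println(fromOutput(msg)["DNS"])
}
```

Proto3 also has a built-in `map<string, string>` field type, which is encoded on the wire as a repeated message with fields `key = 1` and `value = 2`, the same shape as `OutputValue` above. When parsing, proto3 maps also keep the last value seen for a duplicate key, which matches the `fromOutput` behavior here.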

This sounds perfect!

Yup, just a simple KV map would work very nicely here. I would not support anything beyond that.

@prologic Almost done. :) Now I need to test this, write some unit tests and get a review from Michel. :)

I'm more than happy to spin up a new Gaia instance to test your PR too :) if that helps!

Cool. :) You'll have to build it yourself, though, because I had to edit the import paths and such. If you're okay with that, it would be a big help. :)

Sure no problems! Just make sure your PR has a "Test Plan" I can follow and I'll find some time to test your stuff this week :)

Absolutely. I'll update the PR with detailed instructions. :)