indix/gocd-s3-artifacts

Support publish and fetch from the same pipeline

Closed this issue · 24 comments

Currently we only support publish and fetch from different pipelines

cc: @ashwanthkumar @Sriram-R

Are you guys planning on doing that any time soon?
If not, I'm willing to try to open a PR with the changes and we can work on top of that (I've never developed a plugin before).

What I'm thinking is to have an experience similar to GoCD's built-in artifacts:
Stage Name, Job Name and Source Directory as mandatory fields
Destination as an optional field

The decision mechanism would be to use the same path that the publish plugin generates and, since we don't have information about counters from previous stages/jobs, get the latest available on the S3 bucket, or throw an error if none is found.
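
A tiny sketch of what I mean, just to make it concrete; the `<pipeline>/<stage>/<job>/` layout and the names below are illustrative assumptions, not taken from the plugin source:

```java
import java.util.List;

public class SelfPipelineFetchSketch {
    // The fetch side would rebuild the same prefix the publish side used.
    static String sourcePrefix(String pipeline, String stage, String job) {
        return String.format("%s/%s/%s/", pipeline, stage, job);
    }

    // If listing that prefix returns nothing, the fetch task should fail loudly.
    static void failIfEmpty(List<String> keysUnderPrefix, String prefix) {
        if (keysUnderPrefix.isEmpty()) {
            throw new RuntimeException("No artifacts found under S3 prefix: " + prefix);
        }
    }
}
```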

I don't think the publish plugin should need to change at all for this to work.
WDYT?

So this is what I had in mind
I'm doing it on a fork repo, and will open a PR here when the feature is done (and accepted).

It's still missing the core part, which is finding out the latest stage counter for the provided pipeline, but the structure and main idea are done and can be discussed. Feel free to comment there!

The only problem I'm having is setting up a proper IDE to work in (I tried IntelliJ with no success), so I'm currently developing "in the dark" (no autocomplete, no way to check whether the code compiles, no refactoring capabilities, etc.). Basically, to check that everything works and that I haven't made typos or anything, I run all the tests. Do you have any tips on how to configure an IDE for this project?

I have a meta question: what's the motivation for doing same-pipeline fetch via the S3 plugin? Is it because the artifact of the pipeline is too big to fit in go-server's artifacts directory, or something else? Isn't it simpler to use the existing publish and fetch mechanism in GoCD?

On the IDE part, we use IntelliJ to develop. Of course, to see it in action you might need a GoCD server running in a separate IDE instance (see docs here) or an installed version into which you can drop the jar from the repo.

We have a really big stack running on our GoCD server: all our company's services, mobile (both iOS and Android), data-science models, etc.

So the normal publish and fetch mechanism in GoCD is starting to become a pain and unreliable (it still works well for small stuff, but the DB often fills up and has to be cleaned up, etc.).

We've been wanting to try a different approach with S3 for a while, and as we're already using your plugin for some pipelines, I thought, why not?

Do you see this going in the opposite direction from the original reason this plugin exists?

@thalescm - I run the server with this property: -DpluginLocationMonitor.sleepTimeInSecs=30. Other than that, I have scripts to build the jars and copy them over to the plugin location, and a few pipelines configured in GoCD to verify all the features. You can also post to go-cd-dev@googlegroups.com to see if they have a better setup.

So this is what I had in mind

Looks close to what I would like it to be. We can call the new one something like SelfFetchExecutor. I generally don't like name refactorings; they confuse the feature with the refactoring, and I'm not sure it's really needed here to get the feature in first.

Do you see this going in the opposite direction from the original reason this plugin exists?

The reason we developed the plugins was to help us split our workloads across many GoCD servers. We build on many different GoCD servers (owned by various teams), and fetch artifacts from other servers that run our data pipelines and such.

We are happy to support other workloads, of course.

I generally don't like name refactorings; they confuse the feature with the refactoring, and I'm not sure it's really needed here to get the feature in first.

I totally agree. I did it mainly because the pipeline material type already has a param called job, which would conflict with the new feature's job. I'll try to think of a better way so we don't need to refactor the existing ones.

having a param called job

With the way it is implemented (and AFAIK it's the best we can do), the config is already shared across the package and pipeline types. I think it's OK to reuse it for this one too, at least for the initial implementation.

finding out the latest stage counter

Should we make an API call to get the latest stage counter?

Should we make an API call to get the latest stage counter?

I was thinking of listing the bucket at s3://bucket/pipeline/stage/job/, getting the keys that match (number.number), splitting on "." (to separate the pipeline counter from the stage counter), filtering for the pipeline counter matching the current one, and selecting the highest stage counter. Throw if any of the steps fails.

I really don't know if there's an easier way
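
Roughly, a sketch of that listing approach, assuming the AWS SDK for Java v1 and a `<pipelineCounter>.<stageCounter>` directory name under `pipeline/stage/job/` (class and method names here are just illustrative, not from the plugin):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatestStageCounterFinder {
    // "pipelineCounter.stageCounter" directory names, e.g. "12.3"
    private static final Pattern COUNTER_DIR = Pattern.compile("(\\d+)\\.(\\d+)");

    // Lists the "folders" directly under the prefix and returns the highest stage
    // counter whose pipeline counter matches the one we are currently running.
    // Note: a real implementation would also handle truncated listings.
    public static Optional<Integer> latestStageCounter(AmazonS3 s3, String bucket,
                                                       String prefix, int pipelineCounter) {
        ObjectListing listing = s3.listObjects(new ListObjectsRequest()
                .withBucketName(bucket)
                .withPrefix(prefix)      // e.g. "pipeline/stage/job/"
                .withDelimiter("/"));    // group keys by the next path segment

        return listing.getCommonPrefixes().stream()
                .map(p -> p.substring(prefix.length()).replaceAll("/$", ""))
                .map(COUNTER_DIR::matcher)
                .filter(Matcher::matches)
                .filter(m -> Integer.parseInt(m.group(1)) == pipelineCounter)
                .map(m -> Integer.parseInt(m.group(2)))
                .max(Integer::compareTo);
    }
}
```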

We could use the Pipeline Stage history endpoint to know the latest successful run (both pipeline counter and stage counter) for a particular stage.
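
For context, a minimal sketch of what querying that endpoint could look like, assuming a plain GET to /go/api/stages/&lt;pipeline&gt;/&lt;stage&gt;/history with Basic auth; the class name and headers here are illustrative assumptions, not something the plugin already does:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StageHistoryClient {
    // Fetches the raw JSON stage history; picking out the latest successful
    // pipeline/stage counters would be done on top of this response.
    public static String fetchHistory(String serverUrl, String pipeline, String stage,
                                      String user, String password) throws IOException {
        URL url = new URL(serverUrl + "/go/api/stages/" + pipeline + "/" + stage + "/history");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        conn.setRequestProperty("Accept", "application/json");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```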

Cool! I didn't know about that. Seems very straightforward.

@thalescm - do you think you'll be able to make a PR taking into account the above discussion on naming, not refactoring, and getting the stage counter? We'll be happy to test it and merge it at the earliest :)

@manojlds take another look at the PR. The naming part (reverting it) is basically done. Now it's only missing the stage counter, and I'm struggling to find the best way to get it. Do you have an example of a plugin that makes an API call to Go?

I started building a Java client for GoCD a few months back. It doesn't have stage history support, but it has pipeline history support (which is used in the Janitor tool). Would you be interested in contributing to that SDK and using it here? @manojlds Thoughts?

Do note that this requires authentication with the GoCD server, so we might need to start collecting a username and password to query it via the API.

Hmm. When and how should we collect the username and password? It seems alright to contribute and add functionality to get the latest successful stage counter, but I don't know whether that would mess up this plugin's usage.

For example, in my company we use the GitHub plugin to authenticate users, so probably no one knows their username and password. Can't we get information about the agent running the stage in order to use its credentials to access the API? (I don't know if that's possible; asking out of curiosity to open up possibilities.)

Thinking about it in the context of authentication to hit the API, it now feels like looking at S3 itself for the latest could be the right way forward.

@manojlds I've opened the PR with the changes. I'll start testing it on Go to see whether the UI part is OK, which I haven't done yet.

Update:
Already tested on GoCD.
UI and plugin are working as expected. If you wish, I can share the cruise XML I used to test it.

@manojlds any news here?

@thalescm - haven't got around to looking at it yet. Will let you know soon.

Fixed in #59