[Meta] Ensure Github workflow runs on docker image used by Production Distribution Build
peterzhuamazon opened this issue · 18 comments
[Meta] Ensure Github workflow pulls the docker image used by the Infra Jenkins build.
This is to ensure every plugin repo uses the exact docker images we use in the Jenkins build to check their PRs and run tests before merging code, so that issues can be detected earlier and the environment is identical across teams.
This also makes sure integration tests run on the same environment as the infra build, which helps catch issues early.
This also requires the user of the run to be changed to root, then switched back to 1000,
so both OS and OSD can start running without issues on linux: https://docs.github.com/en/actions/creating-actions/dockerfile-support-for-github-actions#user
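For illustration, a minimal sketch of what this looks like in a plugin workflow; the image tag and the gradle task here are just examples, since the real tag should be resolved from the build repo as described later in this issue:

```yaml
# Minimal sketch only: the image tag and gradle task are illustrative;
# real workflows resolve the tag through the build repo.
name: build-on-ci-image
on: [pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: opensearchstaging/ci-runner:ci-runner-rockylinux8-opensearch-build-v1
      # Start the container as root so the runner can prepare the workspace...
      options: --user root
    steps:
      - uses: actions/checkout@v4
      - name: Build as UID 1000
        run: |
          # ...then hand the workspace to UID 1000 and run the build as that
          # user, since OS and OSD will not start as root.
          chown -R 1000:1000 "$(pwd)"
          su "$(id -un 1000)" -c "whoami && ./gradlew assemble"
```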
- opensearch-project/k-NN#1042 (Trial run)
- opensearch-project/alerting#1250
- opensearch-project/cross-cluster-replication#1249
- opensearch-project/anomaly-detection-dashboards-plugin#620
- opensearch-project/ml-commons-dashboards#280
- opensearch-project/ml-commons#1538
- opensearch-project/job-scheduler#534
- opensearch-project/alerting-dashboards-plugin#789
- opensearch-project/anomaly-detection#1092
- opensearch-project/asynchronous-search#507
- opensearch-project/common-utils#557
- opensearch-project/reporting#924
- opensearch-project/dashboards-visualizations#287
- opensearch-project/index-management#1025
- opensearch-project/index-management-dashboards-plugin#907
- opensearch-project/notifications#809
- opensearch-project/dashboards-notifications#135
- opensearch-project/sql#2404
- opensearch-project/custom-codecs#76
- opensearch-project/geospatial#578
- opensearch-project/observability#1763
- opensearch-project/dashboards-observability#1223
- opensearch-project/performance-analyzer#591
- opensearch-project/dashboards-reporting#243
- opensearch-project/dashboards-search-relevance#345
- opensearch-project/neural-search#483
- opensearch-project/security-analytics#710
- opensearch-project/security-analytics-dashboards-plugin#777
As of now, GitHub only supports Linux docker containers; on other runners this fails with:
`Error: Container operations are only supported on Linux runners`
We have a script to help easily retrieve the latest versions of the docker images Infra uses in the Jenkins build (docker/ci/get-ci-images.sh, linked below).
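Purely as a hedged illustration of the idea (not the actual implementation of that script), listing the published ci-runner tags and picking the newest could look like the sketch below; the repository name and the tag filter are assumptions:

```sh
# Illustration only; see docker/ci/get-ci-images.sh in opensearch-build for
# the real logic. The repository name and tag filter are assumptions.
REPO="opensearchstaging/ci-runner"
curl -s "https://hub.docker.com/v2/repositories/${REPO}/tags?page_size=100" \
  | jq -r '.results[].name' \
  | grep 'opensearch-build' \
  | sort -V \
  | tail -n 1
```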
20231006: We have sent out the sub-issues to all the existing plugin repos, and updated the onboarding doc with this step.
20231011: Behavior has changed, as I am currently working with the Alerting team on improving the sourcing of the script and the common workflows:
Additional Issues:
Exceptions:
- PA / RCA have issues due to their inter-dependencies on each other, trouble checking out on the infra build image, and the need for docker/docker-compose to complete the build tests
- PA Commons / i18n do not have existing CI yamls, or are not part of the bundled artifacts
- Dashboards Notifications had issues starting Cypress tests on the rockylinux8 images; resolved with another setup that only runs yarn
- The Security repo is too complicated to onboard, as they have wrappers around wrappers, so we need pointers from their team on this
Thanks.
One thing I noticed when attempting this in the past was that the user the GitHub runner starts every run with does not have permissions to certain pre-installed dependencies.
The attempt to install its own version then causes errors like conflicts. Do we now build the 1001 user into the Docker image?
@kavilla in the example above, if you expand the entire workflow file, I have a method that starts the container as root, then switches back to the 1000 user, with ownership of all directories changed to 1000 beforehand, to avoid that issue.
Thanks.
@peterzhuamazon @rishabh6788 @prudhvigodithi I think the ci images used in all repos should be the ones we use to build the distribution of that version, right? Wondering if just getting the highest one there is will actually help here; that might not be true all the time.
https://github.com/opensearch-project/opensearch-build/blob/main/docker/ci/get-ci-images.sh#L72
For example: if 3.x is the version of the branch, it should go and grab 3.0.0/opensearch-3.0.0.yml and take the ci image from there. This will reduce the edge cases and make it precise that the errors they face are exactly the same ones we face using the distribution build. The same applies to integration and bwc tests.
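A hedged sketch of that manifest-based lookup, assuming the input manifests carry a ci.image.name field and using yq, which is not necessarily what the build tooling itself would use:

```sh
# Sketch of the manifest-based lookup proposed above; assumes a ci.image.name
# field in the input manifests and a yq binary on the runner.
git clone --depth 1 https://github.com/opensearch-project/opensearch-build.git
yq '.ci.image.name' opensearch-build/manifests/3.0.0/opensearch-3.0.0.yml
yq '.ci.image.name' opensearch-build/manifests/3.0.0/opensearch-dashboards-3.0.0.yml
```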
We have a new approach established here.
Please see this sample workflow file from Alerting for example:
https://github.com/opensearch-project/alerting/blob/main/.github/workflows/multi-node-test-workflow.yml
Thanks.
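Condensed, the pattern in that workflow is a reusable-workflow call into opensearch-build followed by running the job inside the returned image; the input and output names below are assumptions, so check get-ci-image-tag.yml for the exact ones:

```yaml
# Sketch of the pattern only; input/output names are assumptions, see
# .github/workflows/get-ci-image-tag.yml in opensearch-build for the real ones.
name: test-on-ci-image
on: [pull_request]

jobs:
  Get-CI-Image-Tag:
    uses: opensearch-project/opensearch-build/.github/workflows/get-ci-image-tag.yml@main
    with:
      product: opensearch

  test:
    needs: Get-CI-Image-Tag
    runs-on: ubuntu-latest
    container:
      # Run inside the same ci-runner image the distribution build uses,
      # starting as root and switching to UID 1000 as shown earlier.
      image: ${{ needs.Get-CI-Image-Tag.outputs.ci-image-version-linux }}
      options: --user root
    steps:
      - uses: actions/checkout@v4
      - run: |
          chown -R 1000:1000 "$(pwd)"
          su "$(id -un 1000)" -c "./gradlew test"
```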
That is ok, as now all the repos will source from our get-ci-image-tag.yml, which in turn utilizes get-ci-images.sh. We are the single source of truth here, and we can tweak it based on the situation.
The other repos will just source from us, so it will not diverge over time.
Thanks.
The source of truth is what will cause issues, I believe. For example, if for 3.x we have a completely revamped image and 2.x uses the old image, or even if we change the image between releases, the manifest is the only source of truth. I think parsing a yaml (the manifest in this case) is more legit. Based on products we have different manifests too. The same applies to the integration tests yaml file.
Like I said, it is a workflow yml file; we can code it however we think is suitable, either for the later branching strategy or otherwise. We can wrap this as an abstraction while others source from it accordingly.
The other repos can use @branch to decide which one to pull from, which would fit us well if we branch ourselves.
Again, the point is to have everyone source from us; then we can make changes accordingly, as we are the owners of the images after all.
Why not do it right from the start rather than waiting until later when we enable branching? Also, branching has nothing to do with what I am trying to say: a particular manifest file (or files) needs to be parsed. The branch does not matter, only the manifest files do. Branching is more for our code base than for any external repos. Let me know if I need to elaborate more.
I am not sure what that means, as I am already leaving room for the upcoming branching, so it will not be hardcoded going forward.
Let me give an example:
For main, focusing on 3.0.0 for all or most plugins, the CI image for building can be grabbed from 3.0.0/opensearch-3.0.0.yml and 3.0.0/opensearch-dashboards-3.0.0.yml.
This has nothing to do with branching. Manifests are the source of truth for building and testing. So instead of grabbing the topmost ci images from ECR/DockerHub, grab them from the manifest, as that is what will actually be used.
Similarly, for the 2.x branch, all the components can grab from the related 2.x.x/*-2.x.x.yml manifests.
Adding @bbarani @prudhvigodithi @dblock to see if this is making sense.
Like I said before, if you want to make such changes, you can, as we are the single source of truth.
The current implementation sources from ECR, and we can definitely change it later.
But that does not stop the process of making changes to the other plugin repos to source from us,
as that has nothing to do with how we retrieve the image information.
Grabbing the right version using their build tools (gradle or yarn) in order to get the right manifest is something that would need changes to all the repos' workflows again. Hence, I am suggesting we finalize an approach before moving forward.
I didn't use their tools to grab the manifest; I am not sure what you mean by that.
Their github workflow will run a workflow file provided by our repo to retrieve the docker image, and this has nothing to do with their gradle or yarn build tools.
Plugin github actions -> (can send version) -> build repo workflow -> build repo script -> return image -> run build.
I think yours is fine, but like I said, those are the internal details of how our script would be implemented; at the high level you do not need to expose such detailed changes to the plugin repos, you can just hide them inside the build repo.
My approach allows for minimal changes in the plugin repos, and that is the point I am making here.
I would argue your approach is good, but over time I think it is going to make the code diverge across repos in how they retrieve the image.
Remember, the detailed implementation of how to retrieve the image, either from ECR or from the manifest, can easily be changed later within the build repo. My implementation now lays a foundation so the repositories can make a call as simple as a few lines and do not need to worry about the details.
Thanks.
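To make the flow above concrete, here is a hedged sketch of the build-repo side as a reusable workflow; the input/output names and the script invocation are assumptions rather than the actual contents of get-ci-image-tag.yml:

```yaml
# Callee-side sketch only; names and the script invocation are assumptions.
on:
  workflow_call:
    inputs:
      product:
        type: string
        default: opensearch
    outputs:
      ci-image-version-linux:
        value: ${{ jobs.Get-CI-Image-Tag.outputs.ci-image-version-linux }}

jobs:
  Get-CI-Image-Tag:
    runs-on: ubuntu-latest
    outputs:
      ci-image-version-linux: ${{ steps.get-tag.outputs.tag }}
    steps:
      - uses: actions/checkout@v4
        with:
          repository: opensearch-project/opensearch-build
      - name: Resolve the ci-runner image tag
        id: get-tag
        run: |
          # Hypothetical invocation; the internals (an ECR lookup today, or
          # manifest parsing later) stay hidden behind this workflow, which is
          # the abstraction point being made in this thread.
          tag=$(bash docker/ci/get-ci-images.sh)
          echo "tag=${tag}" >> "$GITHUB_OUTPUT"
```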
Hey @peterzhuamazon, thanks for the solution; this should remove last-minute release build/test issues, as the testing is done on the same CI image that is used in the distribution build.
I tried to implement this for job-scheduler; the following are the concerns I have.
- The get-ci-images.sh script always retrieves the latest (version) tag. What if we did not update the version in the input manifest to that version? Or what if we want to build and test on an older version in the input manifest? Since the plugin workflow always gets the latest version tag, would it be a problem if we want to build and test with an older CI version image?
- The get-ci-image-tag.yml is hardcoded to fetch from main. What if the build repo is branched? Should we then re-update the plugin workflows to change the branch?
@gaiksaya's idea is more generic. Even for the version increment workflows we get the version from the plugin repo and increment it; in the same way we can get the version from the plugin repo, use the respective version manifest from the build repo, and use the CI image from that input manifest.
Hi @prudhvigodithi, as I mentioned before, that script can be changed internally while maintaining the external parameters; we can tweak the script's behavior at any time.
As for the yaml, I already have an input field for people to change the branch:
https://github.com/opensearch-project/opensearch-build/blob/main/.github/workflows/get-ci-image-tag.yml#L13-L16
We can talk more on this later.
Thanks.
We will close this issue once all the open PRs are closed,
with a few exceptions that have their reasoning listed in their separate tickets.
Thanks.