tinkerbell/hook

Kernel publish job is having an issue

Closed this issue · 9 comments

Expected Behaviour

The kernel publish job reports success but fails to upload the multi-arch image to quay. Need to troubleshoot further: https://github.com/tinkerbell/hook/actions/runs/558935852

I believe the issue is related to this buildx issue: docker/buildx#177

A possible workaround is to use a local registry for the build portion, as we do for the kernel CI job, and then add a step to move the image to quay.io. This would also let us add validation of the image before publishing, if we want that in the future.

After reviewing the kernel CI job, it looks like my previous suggestion may not work...

#50 merging manifest list 147.75.101.109:5000/hook-kernel:5.10.11-78c072732359e42a1083c5801516fb2a2ca4c30c
#50 sha256:da8d1bc099b9e76087b416fcaccb5599e3b62e4e0a762b236aa9e14feb9cdc8f
#50 ERROR: httpReadSeeker: failed open: failed to do request: Get https://147.75.101.109:5000/v2/hook-kernel/manifests/sha256:ec951534d60a024cd7d3130a13a532d6dad0186c8a0317789871ce516dc95099: http: server gave HTTP response to HTTPS client

We might need to choose one of the following approaches instead:

  • Build/Tag the arch-specific images separately and combine them using docker manifest or using docker buildx imagetools
  • Export the manifest list image to a different format rather than pushing directly to a registry
    • The exported manifest list image would then need to be imported into the registry in some form. It might be possible to do this in a single step using docker buildx imagetools; otherwise it will be necessary to push the individual manifests for each platform and then create the manifest list.
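
For reference, here is a rough sketch of the first approach, assuming the arch-specific images have already been built and pushed under their own tags. It only assembles the `docker buildx imagetools create` command line that merges them into one manifest list. The helper names, the `quay.io/example/...` repository, and the `-amd64`/`-arm64` tag suffixes are all made up for illustration and are not what the hook jobs actually use.

```python
import shlex
import subprocess


def imagetools_create_cmd(target, arch_images):
    """Build the argv for `docker buildx imagetools create`.

    target: tag for the combined manifest list (hypothetical naming).
    arch_images: per-architecture image tags that are already pushed.
    """
    if not arch_images:
        raise ValueError("need at least one arch-specific image")
    return ["docker", "buildx", "imagetools", "create", "-t", target, *arch_images]


def publish_manifest_list(target, arch_images, dry_run=True):
    """Print the command (dry run) or execute it; returns the argv either way."""
    cmd = imagetools_create_cmd(target, arch_images)
    if dry_run:
        # Show what would run without touching docker or any registry.
        print(shlex.join(cmd))
        return cmd
    # Assumes docker with the buildx plugin is installed and logged in.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical tags, for illustration only.
    target = "quay.io/example/hook-kernel:5.10.11"
    arch_images = [target + "-amd64", target + "-arm64"]
    publish_manifest_list(target, arch_images, dry_run=True)
```

The `docker manifest create` / `docker manifest push` pair should be a drop-in alternative for the same step. Either way the merge happens against images that already exist in the target registry, which is the point of building and tagging the architectures separately.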

Digging a bit further, the buildx output types oci and tar do not work with multi-node builds. We might be able to work with an output type of local; I'm still testing that approach.

Client-side manifest merging issue for the local registry is here: docker/buildx#354

Once we are managing CI runners with https://github.com/tinkerbell/infrastructure, we can update the kernel jobs to build arch-specific builds on the appropriate runners and then build the manifest image after the arch-specific images are complete.

@detiber - was this ever solved?

@detiber Just pinging you again about whether it got solved.

mmlb commented

https://github.com/tinkerbell/hook/actions/runs/1813512375 will build and push a new kernel; if it's successful we should close this.

mmlb commented

Looking at quay, it looks like the builds are being pushed just fine, so I'm marking this as fixed and closing.