`host-ctr` cli crashes when pulling public ECR image
taraspos opened this issue · 11 comments
The `host-ctr` CLI crashes with a panic when trying to pull any public ECR image, while private ones work fine.
Image | Can pull? |
---|---|
328549459982.dkr.ecr.us-east-1.amazonaws.com/bottlerocket-control:v0.7.12 | ✅ |
public.ecr.aws/bottlerocket/bottlerocket-control:v0.7.12 | ❌ |
Image I'm using:
bash-5.1# cat /etc/os-release
NAME=Bottlerocket
ID=bottlerocket
VERSION="1.19.4 (aws-k8s-1.28)"
PRETTY_NAME="Bottlerocket OS 1.19.4 (aws-k8s-1.28)"
VARIANT_ID=aws-k8s-1.28
VERSION_ID=1.19.4
BUILD_ID=4f0a078e
HOME_URL="https://github.com/bottlerocket-os/bottlerocket"
SUPPORT_URL="https://github.com/bottlerocket-os/bottlerocket/discussions"
BUG_REPORT_URL="https://github.com/bottlerocket-os/bottlerocket/issues"
DOCUMENTATION_URL="https://bottlerocket.dev"
What I expected to happen:
The public ECR image is pulled successfully.
What actually happened:
Running `host-ctr run --source public.ecr.aws/bottlerocket/bottlerocket-control:v0.7.12 --container-id test` results in:
time="2024-06-17T12:25:22Z" level=info msg="Image does not exist, proceeding to pull image from source." ref="public.ecr.aws/bottlerocket/bottlerocket-control:v0.7.12"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x557adfa3a83d]
goroutine 1 [running]:
main.withDynamicResolver({0x557ae03822d8?, 0xc0006b1200}, {0x7ffce586eebc, 0x38}, 0x0)
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/cmd/host-ctr/main.go:1150 +0x19d
main.pullImage({0x557ae03822d8, 0xc0006b1200}, {0x7ffce586eebc, 0x38}, 0x38?, {0x0?, 0xc00071d308?}, 0xc00064b1d0)
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/cmd/host-ctr/main.go:1046 +0x39e
main.fetchImage({0x557ae03822d8, 0xc0006b1200}, {0x7ffce586eebc, 0x38}, 0x557adfa46468?, {0x0, 0x0}, 0x0, 0x0?)
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/cmd/host-ctr/main.go:1013 +0x3e7
main.runCtr({0x557adfa8c224, 0x24}, {0x557adfa46468, 0x7}, {0x7ffce586ef04, 0x4}, {0x7ffce586eebc, 0x38}, 0x0, {0x0, ...}, ...)
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/cmd/host-ctr/main.go:299 +0x467
main.App.func1(0xc0004c4000?)
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/cmd/host-ctr/main.go:144 +0x93
github.com/urfave/cli/v2.(*Command).Run(0xc0004c4000, 0xc0004b8d40, {0xc0004bc320, 0x5, 0x5})
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/vendor/github.com/urfave/cli/v2/command.go:279 +0x9dd
github.com/urfave/cli/v2.(*Command).Run(0xc0004c51e0, 0xc0004b8500, {0xc0000401e0, 0x6, 0x6})
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/vendor/github.com/urfave/cli/v2/command.go:272 +0xc2e
github.com/urfave/cli/v2.(*App).RunContext(0xc000156e00, {0x557ae0382268?, 0x557ae10db440}, {0xc0000401e0, 0x6, 0x6})
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/vendor/github.com/urfave/cli/v2/app.go:337 +0x5db
github.com/urfave/cli/v2.(*App).Run(...)
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/vendor/github.com/urfave/cli/v2/app.go:311
main.main()
/home/builder/rpmbuild/BUILD/bottlerocket-host-ctr-0.0/cmd/host-ctr/main.go:60 +0x3f
bottlerocket/sources/host-ctr/cmd/host-ctr/main.go
Lines 1147 to 1152 in 64049ba
How to reproduce the problem:
- Connect to a Bottlerocket node
- Run `enter-admin-container`
- Run `sudo sheltie`
- Run `host-ctr run --source public.ecr.aws/bottlerocket/bottlerocket-control:v0.7.12 --container-id test`
Thanks for the report (and thanks for the very clear reproduction instructions, in particular).
Initial triage says:
- Yes, this reproduces as advertised on our latest release. Not a big surprise, since this code hasn't changed recently, but worth noting.
- We may not have encountered this earlier because the default URL for this container (at least on my aws-eks variant node) points to a private repository rather than public.ecr.aws.
- Given the code that is failing here, there's a clear expectation that this should work, and at the very least, not segfault.
The segfault occurs because the caller passes a nil `registryConfig` pointer to `withDynamicResolver`, which then dereferences it. The solution seems simple enough (i.e., don't dereference the nil pointer). Thanks again for the report.
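To illustrate the failure mode and the kind of guard described above, here is a minimal sketch in Go. It is not the actual `host-ctr` code: the `registryConfig` struct, its `Mirrors` field, and the helper functions are hypothetical stand-ins chosen only to show how reading a field through a nil pointer panics and how a nil check avoids it.

```go
package main

import "fmt"

// registryConfig is a hypothetical stand-in for the configuration loaded from
// the file passed via --registry-config; the real type has different fields.
type registryConfig struct {
	Mirrors map[string][]string // registry host -> mirror endpoints (assumed)
}

// mirrorsUnsafe mimics the failing pattern: it reads a field through cfg
// without checking for nil, so a nil cfg causes a runtime panic.
func mirrorsUnsafe(cfg *registryConfig, host string) []string {
	return cfg.Mirrors[host]
}

// mirrorsSafe treats a nil cfg as "no registry configuration supplied".
func mirrorsSafe(cfg *registryConfig, host string) []string {
	if cfg == nil {
		return nil
	}
	return cfg.Mirrors[host]
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	const host = "public.ecr.aws"
	fmt.Println("unsafe panics on nil config:", panics(func() { mirrorsUnsafe(nil, host) }))
	fmt.Println("mirrors found with nil config (safe):", len(mirrorsSafe(nil, host)))
}
```

Treating a nil configuration as "nothing configured" is one reasonable choice for a guard like this; returning an explicit error to the caller would be another.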
A little more context: the `host-ctr` executable is invoked by systemd services (see the `boot-containers@` and `host-containers@` services in `package/os`). Those service files supply the `--registry-config` option, so `host-ctr` does not segfault there. If you wish to use `host-ctr` outside of those services, you can work around this problem by adding `--registry-config /dev/null` to your own invocation of `host-ctr`.
I have verified that `settings.host-containers.control.source` can be a public ECR URI. For production, you can set this via user data on your worker instances.
Awesome! Glad to hear this got you unblocked. I'll resolve this issue then.
I'm not sure that resolving the issue is the right approach. Even though the panic in the CLI can be worked around, it still has to be fixed in the long term.
I'll reopen this, then, to track fixing the original panic.
I have a fix progressing through the pipeline. I'll keep this issue updated.