[BUG] Manager pod failing to start for arm64 install
carronmedia opened this issue · 19 comments
Describe the bug
I'm installing this package via Helm (and also directly) onto a cluster of Raspberry Pi 4s, which use the arm64 architecture, but the manager pod is failing to start with a CrashLoopBackOff error. This normally indicates that the package being installed is built for the wrong architecture (i.e. amd64).
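For reference, here's roughly how the mismatch can be checked (a sketch; the node name is a placeholder and the pod name is taken from the description below):

# Confirm the node really reports arm64
$ kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.architecture}'

# Look for an "exec format error" from the crashed manager container
$ kubectl -n tor-controller logs tor-controller-6977fc959f-hvb48 -c manager --previous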
To Reproduce
Install the package via Helm.
Expected behavior
The pods should start successfully and I should be able to view the .onion address for the service.
Additional information
As per the conversation on #3, I have uninstalled, updated the repo and reinstalled the package, but the issue still persists.
Here is the failing pod description:
Name: tor-controller-6977fc959f-hvb48
Namespace: tor-controller
Priority: 0
Node: ---
Start Time: Tue, 01 Mar 2022 15:06:39 +0000
Labels: app.kubernetes.io/instance=tor-controller
app.kubernetes.io/name=tor-controller
pod-template-hash=6977fc959f
Annotations: <none>
Status: Running
IP: 10.42.0.15
IPs:
IP: 10.42.0.15
Controlled By: ReplicaSet/tor-controller-6977fc959f
Containers:
manager:
Container ID: containerd://c63144efa6f93831c4217b145f9a8669ff3b691f8af16a972dd81bfa4f47d0ee
Image: quay.io/bugfest/tor-controller:0.5.0
Image ID: quay.io/bugfest/tor-controller@sha256:0f142060bba60d422c6c536de766ace73a0a00535fcffaba354260e54e59c1e6
Port: <none>
Host Port: <none>
Command:
/manager
Args:
--config=controller_manager_config.yaml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 01 Mar 2022 15:10:06 +0000
Finished: Tue, 01 Mar 2022 15:10:06 +0000
Ready: False
Restart Count: 5
Liveness: http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/controller_manager_config.yaml from manager-config (rw,path="controller_manager_config.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5gzzm (ro)
kube-rbac-proxy:
Container ID: containerd://5eab9e63e587140e040ef3b804ac9bea7f1bdbf8c4d4cb89f09cde93e0811ccb
Image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
Image ID: gcr.io/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c
Port: 8443/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:8443
--upstream=http://127.0.0.1:8080/
--logtostderr=true
--v=10
State: Running
Started: Tue, 01 Mar 2022 15:06:48 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5gzzm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
manager-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: tor-controller-manager-config
Optional: false
kube-api-access-5gzzm:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m58s default-scheduler Successfully assigned tor-controller/tor-controller-6977fc959f-hvb48 to ---
Warning FailedMount 4m58s kubelet MountVolume.SetUp failed for volume "manager-config" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulled 4m53s kubelet Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0" in 748.656901ms
Normal Pulled 4m52s kubelet Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
Normal Created 4m51s kubelet Created container kube-rbac-proxy
Normal Started 4m50s kubelet Started container kube-rbac-proxy
Normal Pulled 4m48s kubelet Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0" in 2.019106168s
Normal Pulled 4m25s kubelet Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0" in 700.418473ms
Normal Created 4m25s (x3 over 4m52s) kubelet Created container manager
Normal Started 4m25s (x3 over 4m52s) kubelet Started container manager
Warning BackOff 4m7s (x8 over 4m45s) kubelet Back-off restarting failed container
Normal Pulling 3m54s (x4 over 4m54s) kubelet Pulling image "quay.io/bugfest/tor-controller:0.5.0"
System (please complete the following information):
- Platform: Raspberry Pi 4 Kubernetes cluster - arm64
- Version: Latest
Hi @carronmedia, let's see why it's failing. First of all, the logs. Can you attach them here please (/tmp/tor-controller-issue-11.txt)?
$ kubectl -n tor-controller logs -l app.kubernetes.io/name=tor-controller -c manager | tee /tmp/tor-controller-issue-11.txt
Thank you for looking! Here you go...
Ok, so wrong architecture xD. Gonna check the 0.5.0 image and re-compile :S
@carronmedia, in the meantime you might try out one of the older versions; hopefully they have the correct container arch:
$ helm search repo tor-controller --versions
NAME CHART VERSION APP VERSION DESCRIPTION
bugfest/tor-controller 0.1.3 0.5.0 Tor hidden services controller for kubernetes
bugfest/tor-controller 0.1.2 0.4.0 TOR hidden services controller for kubernetes
bugfest/tor-controller 0.1.1 0.3.2 TOR hidden services controller for kubernetes
bugfest/tor-controller 0.1.0 0.3.1 TOR hidden services controller for kubernetes
$ helm upgrade ... --version 0.1.1
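Before reinstalling, a quick way to see whether a given tag even publishes an arm64 entry in its manifest (a sketch; skopeo, docker, or podman manifest inspection all work):

# List the architectures advertised by a candidate image tag
$ skopeo inspect --raw docker://quay.io/bugfest/tor-controller:0.3.2 | grep -o '"architecture": *"[^"]*"'
# or, with docker:
$ docker manifest inspect quay.io/bugfest/tor-controller:0.3.2 | grep architecture

Note that the manifest can still mislabel the payload, so running the image with --help (as shown further down in this thread) is the definitive check.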
Hi @bugfest, I've just installed v0.1.1 and unfortunately I'm getting the same issue:
Name: tor-controller-5bc7b4cb7c-57r84
Namespace: tor-controller
Priority: 0
Node: ---
Start Time: Tue, 01 Mar 2022 19:20:02 +0000
Labels: app.kubernetes.io/instance=tor-controller
app.kubernetes.io/name=tor-controller
pod-template-hash=5bc7b4cb7c
Annotations: <none>
Status: Running
IP: 10.42.0.66
IPs:
IP: 10.42.0.66
Controlled By: ReplicaSet/tor-controller-5bc7b4cb7c
Containers:
manager:
Container ID: containerd://3c6935bda1914e299a3cba468abca06df90215aa098a5bf5c9e1395f168e0f98
Image: quay.io/bugfest/tor-controller:0.3.2
Image ID: quay.io/bugfest/tor-controller@sha256:e6ca8c1cb589f780a8f93a8203bb67a1781b7a697bc471f7a50180c97620ec46
Port: <none>
Host Port: <none>
Command:
/manager
Args:
--config=controller_manager_config.yaml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 01 Mar 2022 19:27:50 +0000
Finished: Tue, 01 Mar 2022 19:27:50 +0000
Ready: False
Restart Count: 3
Liveness: http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/controller_manager_config.yaml from manager-config (rw,path="controller_manager_config.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4v54 (ro)
kube-rbac-proxy:
Container ID: containerd://4433e4098a0d4e909cef53f2337149203cf8fd5c88b38c032f42648087b917be
Image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
Image ID: gcr.io/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c
Port: 8443/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:8443
--upstream=http://127.0.0.1:8080/
--logtostderr=true
--v=10
State: Running
Started: Tue, 01 Mar 2022 19:26:58 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4v54 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
manager-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: tor-controller-manager-config
Optional: false
kube-api-access-j4v54:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m14s default-scheduler Successfully assigned tor-controller/tor-controller-5bc7b4cb7c-57r84 to ---
Normal Pulling 8m10s kubelet Pulling image "quay.io/bugfest/tor-controller:0.3.2"
Normal Pulled 86s kubelet Successfully pulled image "quay.io/bugfest/tor-controller:0.3.2" in 6m43.446730978s
Normal Pulled 80s kubelet Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
Normal Created 79s kubelet Created container kube-rbac-proxy
Normal Started 79s kubelet Started container kube-rbac-proxy
Normal Pulled 28s (x3 over 78s) kubelet Container image "quay.io/bugfest/tor-controller:0.3.2" already present on machine
Normal Created 28s (x4 over 80s) kubelet Created container manager
Normal Started 27s (x4 over 80s) kubelet Started container manager
Warning BackOff 15s (x10 over 75s) kubelet Back-off restarting failed container
Hi @carronmedia, I don't own an ARM64 device myself, but I've managed to reproduce the issue with a virtualized one using QEMU. Somehow the containers tagged as arm64 are shipping x86-64 binaries; going to revisit the CI actions/logs to see if I can spot the issue.
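For context, the usual cause of an arm64-tagged image shipping x86-64 binaries is a multi-arch build that compiles the Go binary for the build host instead of the target platform. A hedged sketch of the docker buildx pattern that avoids this (not necessarily this repo's actual CI setup; paths and filenames are illustrative):

# Build and push one manifest covering both architectures
$ docker buildx create --use
$ docker buildx build --platform linux/amd64,linux/arm64 \
    -t quay.io/bugfest/tor-controller:0.5.0 --push .

# ...which only works if the Dockerfile cross-compiles for the target, e.g.:
#   ARG TARGETOS
#   ARG TARGETARCH
#   RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /manager main.go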
OK, thank you very much; please let me know if I can help out by testing anything. Cheers
Hi @carronmedia, just pushed a new set of images tagged as 0.5.0-bug11 that (really) support arm64. I checked that they don't generate the exec format error anymore. Can you double-check in your env?
helm upgrade --install \
--set image.tag=0.5.0-bug11 \
--set manager.image.tag=0.5.0-bug11 \
--set onionbalance.image.tag=0.5.0-bug11 \
tor-controller bugfest/tor-controller
latest (current: 0aec224885f57aa4f36d0e2aa2524c24b237c6f5545bb431bd29f016a868adea):
localhost:~# uname -m
aarch64
localhost:~# podman run --rm -ti quay.io/bugfest/tor-controller:latest --help
Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
WARN[0006] Failed to add conmon to cgroupfs sandbox cgroup: open /sys/fs/cgroup/cpuset/libpod_parent/conmon/cpuset.cpus: open /sys/fs/cgroup/cpuset/libpod_parent/conmon/cpuset.cpus: no such file or directory
{"msg":"exec container process `/manager`: Exec format error","level":"error","time":"2022-03-02T20:11:02.000360830Z"}
0.5.0-bug11
localhost:~# uname -m
aarch64
localhost:~# podman run --rm -ti quay.io/bugfest/tor-controller:0.5.0-bug11 --help
Trying to pull quay.io/bugfest/tor-controller:0.5.0-bug11...
Getting image source signatures
…
…
…
Writing manifest to image destination
Storing signatures
Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
Usage of /manager:
-config string
The controller will load its initial configuration from this file. Omit this flag to use the default configuration values. Command-line flags override configuration from this file.
-kubeconfig string
Paths to a kubeconfig. Only required if out-of-cluster.
-no-leader-elect
Disable leader election for controller manager.
-zap-devel
Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error) (default true)
-zap-encoder value
Zap log encoding (one of 'json' or 'console')
-zap-log-level value
Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
-zap-stacktrace-level value
Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
-zap-time-encoding value
Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.
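On the cluster side, a couple of quick checks after the upgrade (a sketch; the deployment name is inferred from the earlier pod description, and the namespace should be adjusted to wherever the release is installed):

# The pulled image should now report arm64 on an aarch64 host
$ podman image inspect quay.io/bugfest/tor-controller:0.5.0-bug11 --format '{{.Architecture}}'

# Watch the rollout and make sure the manager container stops restarting
$ kubectl -n tor-controller rollout status deploy/tor-controller
$ kubectl -n tor-controller get pods -l app.kubernetes.io/name=tor-controller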
Looks like you fixed it! Thank you very much for sorting this out!
(One thing I did notice is that it was installed in the default namespace; I don't know if that's relevant to you?)
Here is the running pod:
Name: tor-controller-f47cb9c88-vcklt
Namespace: default
Priority: 0
Node: ---
Start Time: Wed, 02 Mar 2022 20:13:58 +0000
Labels: app.kubernetes.io/instance=tor-controller
app.kubernetes.io/name=tor-controller
pod-template-hash=f47cb9c88
Annotations: <none>
Status: Running
IP: 10.42.0.67
IPs:
IP: 10.42.0.67
Controlled By: ReplicaSet/tor-controller-f47cb9c88
Containers:
manager:
Container ID: containerd://142fdca715801cbc7ee08f6f600e375dac83afc197d8dd56da75c7061688077f
Image: quay.io/bugfest/tor-controller:0.5.0-bug11
Image ID: quay.io/bugfest/tor-controller@sha256:ba7396190bcdda32fe5829bd327fb514dae8123b5a050e01b3bf81069c8e81f1
Port: <none>
Host Port: <none>
Command:
/manager
Args:
--config=controller_manager_config.yaml
State: Running
Started: Wed, 02 Mar 2022 20:18:08 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/controller_manager_config.yaml from manager-config (rw,path="controller_manager_config.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pqdld (ro)
kube-rbac-proxy:
Container ID: containerd://950b328daf0ae090c53d3c309aecb10b5a72c0ee5833edb8b401a00590d7fb75
Image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
Image ID: gcr.io/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c
Port: 8443/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:8443
--upstream=http://127.0.0.1:8080/
--logtostderr=true
--v=10
State: Running
Started: Wed, 02 Mar 2022 20:18:18 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pqdld (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
manager-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: tor-controller-manager-config
Optional: false
kube-api-access-pqdld:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m50s default-scheduler Successfully assigned default/tor-controller-f47cb9c88-vcklt to ---
Warning FailedMount 5m49s kubelet MountVolume.SetUp failed for volume "manager-config" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulling 5m44s kubelet Pulling image "quay.io/bugfest/tor-controller:0.5.0-bug11"
Normal Pulled 104s kubelet Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0-bug11" in 4m0.821158374s
Normal Created 102s kubelet Created container manager
Normal Started 101s kubelet Started container manager
Normal Pulled 101s kubelet Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
Normal Created 96s kubelet Created container kube-rbac-proxy
Normal Started 90s kubelet Started container kube-rbac-proxy
Warning Unhealthy 90s kubelet Readiness probe failed: Get "http://10.42.0.67:8081/readyz": dial tcp 10.42.0.67:8081: connect: connection refused
However, it does say that the readiness probe failed, and when I run kubectl get onion, there is no HOSTNAME for my example service.
Great :D
Re the namespace: it's ok. The instructions I posted in my last comment didn't include the --namespace option.
Re the onion hostname: it might take a few seconds to get updated; the tor container needs to bootstrap. Can you check again in a minute or two? If it's still empty, attach the logs from the tor-daemon-manager pod containers.
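Something along these lines should grab them (the pod name is a placeholder; find the actual one with kubectl get pods):

$ kubectl get pods
$ kubectl logs <tor-daemon-manager-pod> --all-containers --prefix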
Thanks for using my project btw ^^
Thank you very much for the release and the support! It's very much appreciated!
Here is the log from the manager container:
Error from server (BadRequest): previous terminated container "manager" in pod "tor-controller-f47cb9c88-vcklt" not found
Can you make sure you don't have two tor-controllers installed?
Yeah, it looks like I have just the one tor-controller. When I run kubectl get all --all-namespaces, I can see just the one tor-controller service and pod.
Ok. I just pushed the new images. Let's start fresh. Please uninstall your current chart and install again. Then, check if the manager container is restarting.
$ kubectl -n tor-controller logs -l app.kubernetes.io/name=tor-controller -c manager --prefix --since 3h
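For the fresh install, something like this should do it (release and namespace names are assumed to match the earlier install):

$ helm uninstall tor-controller -n tor-controller
$ helm repo update
$ helm install tor-controller bugfest/tor-controller -n tor-controller --create-namespace
$ kubectl -n tor-controller get pods -w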
Hi @carronmedia. I've managed to install tor-controller in my virtual arm64 environment. I see the onion HOSTNAME being populated correctly. The only issue I see is that the echoserver image does not support arm64, so it's in a crash loop (opened issue #12 to get it fixed). Going to find an alternative or rebuild it on my own.
localhost:~# uname -m
aarch64
localhost:~# kubectl get onion
NAME HOSTNAME AGE
example-onion-service 5impbmzytv5wwdb6wwtkv6c7um4sk6vogruyjpsm3cbhsgjopy67wqyd.onion 21m
Hi @bugfest, thank you so much for all of your help; it's now up and running! I had to uninstall and reinstall a couple of times, but the second time did it :)
NAMESPACE NAME HOSTNAME AGE
default onion-service x25momap6jfcf22xkjnzf63vmgu3zukfcjlufxjnnpj65fbhnaje57ad.onion 42m
Thanks again for this project and all of your help over the last couple of days!
Thanks @carronmedia, you're welcome!
re: #11 (comment), echoserver image fixed in #12