bugfest/tor-controller

[BUG] Manager pod failing to start for arm64 install

carronmedia opened this issue · 19 comments

Describe the bug
I'm installing this package via Helm (and also directly) onto a cluster of Raspberry Pi 4's that use the arm64 architecture, but the manager pod is failing to start with a CrashLoopBackOff error. This normally indicates that the image being installed was built for the wrong architecture (i.e. amd64).
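If it helps, one way to check which architectures the published tag actually ships (a sketch, assuming podman is available and the tag is published as a multi-arch manifest list) is to compare the manifest against the node's architecture:

$ podman manifest inspect quay.io/bugfest/tor-controller:0.5.0
$ uname -m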

To Reproduce
Install the package via Helm.

Expected behavior
The pods should start successfully and I should be able to view the .onion address for the service.

Additional information

As per the conversation on #3, I have uninstalled, updated the repo and reinstalled the package, but the issue still persists.

Here is the failing pod description:

Name:         tor-controller-6977fc959f-hvb48
Namespace:    tor-controller
Priority:     0
Node:        ---
Start Time:   Tue, 01 Mar 2022 15:06:39 +0000
Labels:       app.kubernetes.io/instance=tor-controller
              app.kubernetes.io/name=tor-controller
              pod-template-hash=6977fc959f
Annotations:  <none>
Status:       Running
IP:           10.42.0.15
IPs:
  IP:           10.42.0.15
Controlled By:  ReplicaSet/tor-controller-6977fc959f
Containers:
  manager:
    Container ID:  containerd://c63144efa6f93831c4217b145f9a8669ff3b691f8af16a972dd81bfa4f47d0ee
    Image:         quay.io/bugfest/tor-controller:0.5.0
    Image ID:      quay.io/bugfest/tor-controller@sha256:0f142060bba60d422c6c536de766ace73a0a00535fcffaba354260e54e59c1e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --config=controller_manager_config.yaml
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 01 Mar 2022 15:10:06 +0000
      Finished:     Tue, 01 Mar 2022 15:10:06 +0000
    Ready:          False
    Restart Count:  5
    Liveness:       http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /controller_manager_config.yaml from manager-config (rw,path="controller_manager_config.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5gzzm (ro)
  kube-rbac-proxy:
    Container ID:  containerd://5eab9e63e587140e040ef3b804ac9bea7f1bdbf8c4d4cb89f09cde93e0811ccb
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Image ID:      gcr.io/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Tue, 01 Mar 2022 15:06:48 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5gzzm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  manager-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tor-controller-manager-config
    Optional:  false
  kube-api-access-5gzzm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    4m58s                  default-scheduler  Successfully assigned tor-controller/tor-controller-6977fc959f-hvb48 to ---
  Warning  FailedMount  4m58s                  kubelet            MountVolume.SetUp failed for volume "manager-config" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulled       4m53s                  kubelet            Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0" in 748.656901ms
  Normal   Pulled       4m52s                  kubelet            Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
  Normal   Created      4m51s                  kubelet            Created container kube-rbac-proxy
  Normal   Started      4m50s                  kubelet            Started container kube-rbac-proxy
  Normal   Pulled       4m48s                  kubelet            Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0" in 2.019106168s
  Normal   Pulled       4m25s                  kubelet            Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0" in 700.418473ms
  Normal   Created      4m25s (x3 over 4m52s)  kubelet            Created container manager
  Normal   Started      4m25s (x3 over 4m52s)  kubelet            Started container manager
  Warning  BackOff      4m7s (x8 over 4m45s)   kubelet            Back-off restarting failed container
  Normal   Pulling      3m54s (x4 over 4m54s)  kubelet            Pulling image "quay.io/bugfest/tor-controller:0.5.0"

System (please complete the following information):

  • Platform: Raspberry Pi 4 Kubernetes cluster - arm64
  • Version: Latest

Hi @carronmedia, let's see why it's failing. So, first of all, the logs: can you attach them here (/tmp/tor-controller-issue-11.txt), please?

$ kubectl -n tor-controller logs -l app.kubernetes.io/name=tor-controller -c manager | tee /tmp/tor-controller-issue-11.txt

Thank you for looking! Here you go...

tor-controller-issue-11.txt

Ok, so it's the wrong architecture xD. Gonna check the 0.5.0 image and re-compile :S

@carronmedia, in the meantime you might try one of the older versions; hopefully they ship the correct container arch:

$ helm search repo tor-controller --versions
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
bugfest/tor-controller  0.1.3           0.5.0           Tor hidden services controller for kubernetes
bugfest/tor-controller  0.1.2           0.4.0           TOR hidden services controller for kubernetes
bugfest/tor-controller  0.1.1           0.3.2           TOR hidden services controller for kubernetes
bugfest/tor-controller  0.1.0           0.3.1           TOR hidden services controller for kubernetes

$ helm upgrade ... --version 0.1.1 

Hi @bugfest, I've just installed v0.1.1 and unfortunately I'm getting the same issue:

Name:         tor-controller-5bc7b4cb7c-57r84
Namespace:    tor-controller
Priority:     0
Node:         ---
Start Time:   Tue, 01 Mar 2022 19:20:02 +0000
Labels:       app.kubernetes.io/instance=tor-controller
              app.kubernetes.io/name=tor-controller
              pod-template-hash=5bc7b4cb7c
Annotations:  <none>
Status:       Running
IP:           10.42.0.66
IPs:
  IP:           10.42.0.66
Controlled By:  ReplicaSet/tor-controller-5bc7b4cb7c
Containers:
  manager:
    Container ID:  containerd://3c6935bda1914e299a3cba468abca06df90215aa098a5bf5c9e1395f168e0f98
    Image:         quay.io/bugfest/tor-controller:0.3.2
    Image ID:      quay.io/bugfest/tor-controller@sha256:e6ca8c1cb589f780a8f93a8203bb67a1781b7a697bc471f7a50180c97620ec46
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --config=controller_manager_config.yaml
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 01 Mar 2022 19:27:50 +0000
      Finished:     Tue, 01 Mar 2022 19:27:50 +0000
    Ready:          False
    Restart Count:  3
    Liveness:       http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /controller_manager_config.yaml from manager-config (rw,path="controller_manager_config.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4v54 (ro)
  kube-rbac-proxy:
    Container ID:  containerd://4433e4098a0d4e909cef53f2337149203cf8fd5c88b38c032f42648087b917be
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Image ID:      gcr.io/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Tue, 01 Mar 2022 19:26:58 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4v54 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  manager-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tor-controller-manager-config
    Optional:  false
  kube-api-access-j4v54:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  8m14s               default-scheduler  Successfully assigned tor-controller/tor-controller-5bc7b4cb7c-57r84 to ---
  Normal   Pulling    8m10s               kubelet            Pulling image "quay.io/bugfest/tor-controller:0.3.2"
  Normal   Pulled     86s                 kubelet            Successfully pulled image "quay.io/bugfest/tor-controller:0.3.2" in 6m43.446730978s
  Normal   Pulled     80s                 kubelet            Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
  Normal   Created    79s                 kubelet            Created container kube-rbac-proxy
  Normal   Started    79s                 kubelet            Started container kube-rbac-proxy
  Normal   Pulled     28s (x3 over 78s)   kubelet            Container image "quay.io/bugfest/tor-controller:0.3.2" already present on machine
  Normal   Created    28s (x4 over 80s)   kubelet            Created container manager
  Normal   Started    27s (x4 over 80s)   kubelet            Started container manager
  Warning  BackOff    15s (x10 over 75s)  kubelet            Back-off restarting failed container

Hi @carronmedia, I don't own an ARM64 device myself, but I've managed to reproduce the issue using a virtualized one with QEMU. Somehow the containers tagged as arm64 are shipping x86-64 binaries; going to revisit the CI actions/logs to see if I can spot the issue.
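Without real ARM hardware, another way to check what the arm64-tagged image actually ships is to extract the binary and inspect it on the host (a sketch; assumes a recent podman with --arch support and the file utility):

$ podman create --arch arm64 --name tc-arm64 quay.io/bugfest/tor-controller:0.5.0
$ podman cp tc-arm64:/manager /tmp/manager
$ file /tmp/manager    # should report "ARM aarch64", not "x86-64"
$ podman rm tc-arm64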

OK, thank you very much; please let me know if I can help out by testing anything. Cheers

Hi @carronmedia, I've just pushed a new set of images tagged 0.5.0-bug11 that (really) support arm64. I checked that they no longer generate the exec format error. Can you double-check in your env?

helm upgrade --install \
        --set image.tag=0.5.0-bug11 \
        --set manager.image.tag=0.5.0-bug11 \
        --set onionbalance.image.tag=0.5.0-bug11 \
        tor-controller bugfest/tor-controller
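After the upgrade, the rollout can be watched with something like this (note the command above doesn't set --namespace, so adjust it to wherever the release lands):

$ kubectl get pods -w -l app.kubernetes.io/name=tor-controller
$ kubectl describe pod -l app.kubernetes.io/name=tor-controller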

latest (current: 0aec224885f57aa4f36d0e2aa2524c24b237c6f5545bb431bd29f016a868adea):

localhost:~# uname -m
aarch64

localhost:~# podman run --rm -ti quay.io/bugfest/tor-controller:latest --help
Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
WARN[0006] Failed to add conmon to cgroupfs sandbox cgroup: open /sys/fs/cgroup/cpuset/libpod_parent/conmon/cpuset.cpus: open /sys/fs/cgroup/cpuset/libpod_parent/conmon/cpuset.cpus: no such file or directory
{"msg":"exec container process `/manager`: Exec format error","level":"error","time":"2022-03-02T20:11:02.000360830Z"}

0.5.0-bug11

localhost:~# uname -m
aarch64

localhost:~# podman run --rm -ti quay.io/bugfest/tor-controller:0.5.0-bug11 --help
Trying to pull quay.io/bugfest/tor-controller:0.5.0-bug11...
Getting image source signatures
…
…
…
Writing manifest to image destination
Storing signatures
Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
Usage of /manager:
  -config string
        The controller will load its initial configuration from this file. Omit this flag to use the default configuration values. Command-line flags override configuration from this file.
  -kubeconfig string
        Paths to a kubeconfig. Only required if out-of-cluster.
  -no-leader-elect
        Disable leader election for controller manager.
  -zap-devel
        Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error) (default true)
  -zap-encoder value
        Zap log encoding (one of 'json' or 'console')
  -zap-log-level value
        Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
  -zap-stacktrace-level value
        Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
  -zap-time-encoding value
        Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.

Looks like you fixed it! Thank you very much for sorting this out!

(One thing I did notice is that it was installed in the default namespace; I don't know if that is relevant to you?)

Here is the running pod:

Name:         tor-controller-f47cb9c88-vcklt
Namespace:    default
Priority:     0
Node:         ---
Start Time:   Wed, 02 Mar 2022 20:13:58 +0000
Labels:       app.kubernetes.io/instance=tor-controller
              app.kubernetes.io/name=tor-controller
              pod-template-hash=f47cb9c88
Annotations:  <none>
Status:       Running
IP:           10.42.0.67
IPs:
  IP:           10.42.0.67
Controlled By:  ReplicaSet/tor-controller-f47cb9c88
Containers:
  manager:
    Container ID:  containerd://142fdca715801cbc7ee08f6f600e375dac83afc197d8dd56da75c7061688077f
    Image:         quay.io/bugfest/tor-controller:0.5.0-bug11
    Image ID:      quay.io/bugfest/tor-controller@sha256:ba7396190bcdda32fe5829bd327fb514dae8123b5a050e01b3bf81069c8e81f1
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --config=controller_manager_config.yaml
    State:          Running
      Started:      Wed, 02 Mar 2022 20:18:08 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /controller_manager_config.yaml from manager-config (rw,path="controller_manager_config.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pqdld (ro)
  kube-rbac-proxy:
    Container ID:  containerd://950b328daf0ae090c53d3c309aecb10b5a72c0ee5833edb8b401a00590d7fb75
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Image ID:      gcr.io/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Wed, 02 Mar 2022 20:18:18 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pqdld (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  manager-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tor-controller-manager-config
    Optional:  false
  kube-api-access-pqdld:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age    From               Message
  ----     ------       ----   ----               -------
  Normal   Scheduled    5m50s  default-scheduler  Successfully assigned default/tor-controller-f47cb9c88-vcklt to ---
  Warning  FailedMount  5m49s  kubelet            MountVolume.SetUp failed for volume "manager-config" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulling      5m44s  kubelet            Pulling image "quay.io/bugfest/tor-controller:0.5.0-bug11"
  Normal   Pulled       104s   kubelet            Successfully pulled image "quay.io/bugfest/tor-controller:0.5.0-bug11" in 4m0.821158374s
  Normal   Created      102s   kubelet            Created container manager
  Normal   Started      101s   kubelet            Started container manager
  Normal   Pulled       101s   kubelet            Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
  Normal   Created      96s    kubelet            Created container kube-rbac-proxy
  Normal   Started      90s    kubelet            Started container kube-rbac-proxy
  Warning  Unhealthy    90s    kubelet            Readiness probe failed: Get "http://10.42.0.67:8081/readyz": dial tcp 10.42.0.67:8081: connect: connection refused

However, it does say that the readiness probe failed, and when I run kubectl get onion there is no HOSTNAME for my example service.
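For reference, the onion resource and its events can be inspected while waiting (the service name below is just an example; use your own):

$ kubectl get onion
$ kubectl describe onion example-onion-service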

Great :D
Re the namespace: it's ok. The instructions I posted in my last comment didn't include the --namespace option.

Re the onion hostname: it might take a few seconds to get updated; the tor container needs to bootstrap. Can you check again in a minute or two? If it's still empty, attach the logs from the tor-daemon-manager pod containers.
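If it comes to that, something along these lines should grab them (the pod name will differ in your cluster; --all-containers fetches logs from every container in the pod):

$ kubectl get pods --all-namespaces | grep tor
$ kubectl logs <tor-daemon-manager-pod> --all-containers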

Thanks for using my project btw ^^

Thank you very much for the release and the support! It's very much appreciated!

Here is the log from the manager container:

Error from server (BadRequest): previous terminated container "manager" in pod "tor-controller-f47cb9c88-vcklt" not found
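(That error usually appears when kubectl logs is run with --previous/-p but the container has no earlier terminated instance, e.g. when its restart count is 0; dropping the flag should show the current logs, for example:)

$ kubectl -n default logs tor-controller-f47cb9c88-vcklt -c manager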

Can you make sure you don't have two tor-controllers installed?
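One quick way to check for duplicate releases across namespaces:

$ helm list --all-namespaces | grep tor-controller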

Yeah, it looks like I have just the one tor-controller. When I run kubectl get all --all-namespaces, I can see only the one tor-controller service and pod.

Ok, I just pushed the new images. Let's start fresh: please uninstall your current chart and install it again, then check whether the manager container is restarting:

$ kubectl -n tor-controller logs -l app.kubernetes.io/name=tor-controller -c manager --prefix --since 3h
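For reference, the uninstall/reinstall could look roughly like this (release and repo names taken from earlier in the thread; add --namespace/--create-namespace and any image.tag overrides from the earlier comment if they still apply):

$ helm uninstall tor-controller
$ helm repo update
$ helm upgrade --install tor-controller bugfest/tor-controller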

Hi @carronmedia, I've managed to install tor-controller on my virtual arm64 machine. I see the onion Hostname being populated correctly. The only issue I see is that the echoserver image does not support arm64, so it's in a crash loop (opened issue #12 to get it fixed). Going to find an alternative or re-build it on my own.

localhost:~# uname -m
aarch64

localhost:~# kubectl get onion
NAME                    HOSTNAME                                                         AGE
example-onion-service   5impbmzytv5wwdb6wwtkv6c7um4sk6vogruyjpsm3cbhsgjopy67wqyd.onion   21m

Hi @bugfest, thank you so much for all of your help, it's now up and running! I had to uninstall and reinstall a couple of times, but the second time did it :)

NAMESPACE   NAME            HOSTNAME                                                         AGE
default     onion-service   x25momap6jfcf22xkjnzf63vmgu3zukfcjlufxjnnpj65fbhnaje57ad.onion   42m

Thanks again for this project and all of your help over the last couple of days!

Thanks @carronmedia, you're welcome!

re: #11 (comment), echoserver image fixed in #12