bentoml/Yatai

Image builder failed due to MountVolume.SetUp failed for volume "yatai-regcred" and "kube-api-access"

tamle511 opened this issue · 4 comments

Hello,
I'm trying to deploy a model to our K8S cluster. I've followed the official installation guide and set up yatai, yatai-image-builder and yatai-deployment successfully.
So far I've created a model and pushed it to Yatai with bentoml. But when I try to create a deployment (using the Yatai UI), I get stuck at the image builder step because the builder pod fails to start.

Logs from Yatai:

[2023-01-16 16:41:08] [BentoDeployment] [test-onnx] [Reconciling] Starting to reconcile BentoDeployment
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [CheckingImage] Checking image exists: x.x.x.x:5000/yatai-bentos:yatai.test-onnx.0.0.1
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [CheckingImage] Image not exists: x.x.x.x:5000/yatai-bentos:yatai.test-onnx.0.0.1
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Making sure docker config secret yatai-regcred in namespace yatai
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Docker config secret yatai-regcred in namespace yatai is ready
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Generating image builder pod: yatai-bento-image-builder-test-onnx--0-0-1
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Getting bento test-onnx:0.0.1 from yatai service
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Got bento test-onnx:0.0.1 from yatai service
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Getting secret yatai-api-token in namespace yatai
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Secret yatai-api-token is found in namespace yatai, so updating it
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Secret yatai-api-token is updated in namespace yatai
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Getting model test-onnx:mj2hs6ut7w6udjex from yatai service
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] (combined from similar events): Created image builder pod: yatai-bento-image-builder-test-onnx--0-0-1
[2023-01-16 16:41:15] [BentoRequest] [test-onnx--0-0-1] [ReconcileError] Failed to reconcile BentoRequest: image builder pod yatai-bento-image-builder-test-onnx--0-0-1 status is Failed

Pod status:

xxx@xxx:~/bentoml/yatai/helm$ kubectl get po -n yatai
NAME                                             READY   STATUS       RESTARTS   AGE
yatai-bento-image-builder-test-onnx--0-0-1   0/1     Init:Error   0          90s

Describe pod:

Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    21s                default-scheduler  Successfully assigned yatai/yatai-bento-image-builder-test-onnx--0-0-1 to node-01
  Normal   Pulled       20s                kubelet            Container image "quay.io/bentoml/bento-downloader:0.0.1" already present on machine
  Normal   Created      20s                kubelet            Created container bento-downloader
  Normal   Started      20s                kubelet            Started container bento-downloader
  Warning  FailedMount  19s (x2 over 20s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-xdddb" : object "yatai"/"kube-root-ca.crt" not registered
  Warning  FailedMount  19s (x2 over 20s)  kubelet            MountVolume.SetUp failed for volume "yatai-regcred" : object "yatai"/"yatai-regcred" not registered

I've confirmed that kube-root-ca.crt (the object behind the kube-api-access-xdddb volume) and yatai-regcred do exist, so I am not sure why it says the objects were not registered.

xxx@xxx:~$ kubectl get secret -n yatai
NAME                  TYPE                                  DATA   AGE
default-token-tmczp   kubernetes.io/service-account-token   3      47h
yatai-api-token       Opaque                                1      102m
yatai-regcred         kubernetes.io/dockerconfigjson        1      142m
xxx@xxx:~$ kubectl get cm -n yatai
NAME               DATA   AGE
kube-root-ca.crt   1      47h
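
A status of Init:Error means an init container exited with a non-zero code, which is a separate signal from the FailedMount warnings. As a minimal sketch (the function name `init_exit_codes` is hypothetical; the pod, namespace, and container names in the usage lines are taken from the output above), the exit code of each init container can be read from the pod status:

```shell
# Print "<init container name><TAB><exit code>" for every init container of a pod.
# Assumes the container has already terminated; a container that is still
# running or waiting prints an empty exit code.
init_exit_codes() {
  namespace="$1"
  pod="$2"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state.terminated.exitCode}{"\n"}{end}'
}

# Usage (needs cluster access):
#   init_exit_codes yatai yatai-bento-image-builder-test-onnx--0-0-1
#   kubectl logs -n yatai yatai-bento-image-builder-test-onnx--0-0-1 -c bento-downloader
```

The logs of the failing init container usually say more about the failure than the pod events do.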

Kubernetes version: 1.22.8.

Could somebody please help? Thank you!

Thanks for the report! I found a related issue in the official Kubernetes repo, and your Kubernetes version falls within the range of versions affected by it:

kubernetes/kubernetes#105204

Thanks @yetone. I'm not sure yet whether our k8s version is the cause, since I do not have the authority to upgrade our cluster; I may try that later as a last resort.

Anyway after further debugging I've found the following error in the bento-downloader container:

xxx@xxx:~/bentoml/yatai/helm$ kubectl logs -f -n yatai yatai-bento-image-builder-test-onnx--0-0-1 bento-downloader 
Downloading bento test-onnx:0.0.1 tar file from http://yatai.yatai-system.svc.cluster.local/api/v1/bento_repositories/test-onnx/bentos/0.0.1/download to /tmp/downloaded.tar...
curl: (22) The requested URL returned error: 500
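
curl only returns exit code 22 when it is run with `--fail`, and in that mode it discards the response body, which may contain the server's own error message. A minimal sketch for repeating the request by hand without `--fail`, so that both the status code and the body are available. The function names `fetch_status` and `describe_status` and the hint texts are hypothetical; the header format in the usage line is copied from the test pod later in this thread.

```shell
# Fetch a URL without --fail: save the body to a file and print only the
# HTTP status code on stdout. The third argument (a full header line) is optional.
fetch_status() {
  url="$1"
  body_file="$2"
  token_header="${3:-}"
  if [ -n "$token_header" ]; then
    curl -sS -o "$body_file" -w '%{http_code}' -H "$token_header" "$url"
  else
    curl -sS -o "$body_file" -w '%{http_code}' "$url"
  fi
}

# Map a status code to a rough hint about where to look next (illustrative only).
describe_status() {
  case "$1" in
    2??) echo "ok" ;;
    401|403) echo "auth: check the API token" ;;
    404) echo "not found: check the repository name and version" ;;
    5??) echo "server error: inspect the saved response body" ;;
    *) echo "unexpected status: $1" ;;
  esac
}

# Usage from a pod inside the cluster:
#   code="$(fetch_status "http://yatai.yatai-system.svc.cluster.local/api/v1/bento_repositories/test-onnx/bentos/0.0.1/download" /tmp/body.txt "X-YATAI-API-TOKEN: yatai-image-builder:default:$YATAI_API_TOKEN")"
#   describe_status "$code"
```

On an error response the saved body file is small and readable; on success it holds the bento tar file.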

However, the logs from the yatai pod didn't show anything related to the error. There were some warnings, but I assume they come from periodic checks. I also tried calling the API manually from another pod, but that still didn't produce any log messages.

xxx@xxx:~$ kubectl logs -f -n yatai-system yatai-6c564d66f5-q44pt yatai --since 5m
INFO[236524] listing unsynced deployments                  cron="sync env"
INFO[236524] updating unsynced deployments syncing_at      cron="sync env"
INFO[236524] updated unsynced deployments syncing_at       cron="sync env"
INFO[236524] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236524] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:23:35.090872       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:23:35.090920       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:23:43.873038       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:23:43.873078       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
ERRO[236558] ws read failed: "websocket: close 1005 (no status)" 
ERRO[236558] ws read failed: "websocket: close 1005 (no status)" 
ERRO[236558] ws read failed: "websocket: close 1005 (no status)" 
ERRO[236576] ws read failed: "websocket: close 1005 (no status)" 
W0117 04:24:22.328051       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:24:22.328089       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:24:22.833320       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:24:22.833370       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
ERRO[236589] ws read failed: "websocket: close 1005 (no status)" 
INFO[236614] listing unsynced deployments                  cron="sync env"
INFO[236614] updating unsynced deployments syncing_at      cron="sync env"
INFO[236614] updated unsynced deployments syncing_at       cron="sync env"
INFO[236614] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236614] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:25:02.089642       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:25:02.089689       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:25:14.048729       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:25:14.048766       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:25:49.155607       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:25:49.155654       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:26:01.329572       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:26:01.329626       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
INFO[236704] listing unsynced deployments                  cron="sync env"
INFO[236704] updating unsynced deployments syncing_at      cron="sync env"
INFO[236704] updated unsynced deployments syncing_at       cron="sync env"
INFO[236704] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236704] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:26:38.024209       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:26:38.024250       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:26:38.326888       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:26:38.326928       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:27:11.523173       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:27:11.523229       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:27:35.613764       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:27:35.613817       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:27:42.165695       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:27:42.165739       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
INFO[236794] listing unsynced deployments                  cron="sync env"
INFO[236794] updating unsynced deployments syncing_at      cron="sync env"
INFO[236794] updated unsynced deployments syncing_at       cron="sync env"
INFO[236794] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236794] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:28:07.787353       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:28:07.787393       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
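
The log above is dominated by one repeated RBAC warning. A small sketch for collapsing such output into distinct messages with counts, so that a rare line is not lost among the repeats, plus a direct check of the permission the warning complains about. The function names are hypothetical, and the patterns assume only the two line prefixes seen above (klog-style `W0117 04:23:35.090872 1 reflector.go:424]` and `INFO[236524]`).

```shell
# Read log lines on stdin, drop timestamps and source locations, and print each
# distinct message with the number of times it occurred, most frequent first.
summarize_logs() {
  sed -E \
    -e 's/^([IWEF])[0-9]{4} [0-9:.]+ +[0-9]+ [^]]+\] /\1 /' \
    -e 's/^(INFO|WARN|ERRO)\[[0-9]+\] /\1 /' \
  | sort | uniq -c | sort -rn
}

# Ask the API server whether a service account may list pods in a namespace.
# Arguments: <namespace> <sa-namespace>:<sa-name>. Prints "yes" or "no".
can_list_pods() {
  kubectl auth can-i list pods \
    --namespace "$1" \
    --as "system:serviceaccount:$2"
}

# Usage (needs cluster access):
#   kubectl logs -n yatai-system yatai-6c564d66f5-q44pt yatai --since 5m 2>&1 | summarize_logs
#   can_list_pods yatai-builders yatai-system:yatai
```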

Can you check the output of this command?

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: yatai
spec:
  containers:
  - command:
    - sh
    - -c
    - 'curl -H "X-YATAI-API-TOKEN: yatai-image-builder:default:\$(YATAI_API_TOKEN)" "http://yatai.yatai-system.svc.cluster.local/api/v1/bento_repositories/test-onnx/bentos/0.0.1/download"'
    envFrom:
    - secretRef:
        name: yatai-api-token
    image: curlimages/curl
    name: bento-downloader
EOF

sleep 5

kubectl -n yatai logs -f test

Thank you. It turns out the MinIO endpoint was incorrect, so the bento could not be downloaded. I fixed the endpoint, re-pushed the bento, and it works now. I'm not sure why the first push succeeded even though the endpoint was wrong. Anyway, thank you for your support!
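
Since the root cause was the object storage endpoint, a reachability check run from a pod inside the cluster can catch this kind of misconfiguration early. A minimal sketch, assuming a MinIO server, which exposes a liveness endpoint at `/minio/health/live`. The function names, the example host, and the fallback to plain `http` when no scheme is given are assumptions for illustration, not Yatai settings.

```shell
# Build the MinIO liveness URL from an endpoint such as "host:9000" or
# "https://host:9000/". Assumes http when the endpoint has no scheme.
minio_live_url() {
  endpoint="${1%/}"            # drop one trailing slash, if any
  case "$endpoint" in
    http://*|https://*) ;;
    *) endpoint="http://$endpoint" ;;
  esac
  printf '%s/minio/health/live\n' "$endpoint"
}

# Print the HTTP status code of the liveness endpoint (200 when the server is up).
check_minio() {
  curl -sS -o /dev/null -w '%{http_code}\n' --max-time 5 "$(minio_live_url "$1")"
}

# Usage from a pod inside the cluster (hypothetical host):
#   check_minio minio.example.svc.cluster.local:9000
```

This only confirms that the endpoint is reachable from where the check runs; it does not validate credentials or the bucket name.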