`kubectl kots install <app--channel>` fails with default app and unstable channel
camilamacedo86 opened this issue · 4 comments
Description
I am unable to run kubectl kots install <app--channel> following the getting started guide: https://docs.replicated.com/vendor/tutorial-installing-with-existing-cluster
Note that the app/release was created with the default files only and I am unable to start the admin console:
Enter the namespace to deploy to: dev4devs-unstable
• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
Enter a new password to be used for the Admin Console: ••••••••••
• Waiting for Admin Console to be ready ⠦Error: Failed to deploy: failed to deploy admin console: failed to wait for web: timeout waiting for deployment to become ready. Use the --wait-duration flag to increase timeout.
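For reference, the command I am running follows the tutorial; retrying with a longer timeout, as the error message suggests, would look roughly like this (the app slug is a placeholder):
$ kubectl kots install <app-slug>/unstable \
    --namespace dev4devs-unstable \
    --wait-duration 10m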
Environment
- go 1.19.2
- kind v0.16.0
- k8s 1.25
Hello @camilamacedo86! This seems to indicate that some of the admin console resources failed to become ready during the installation. Performing a kubectl describe
on any of those resources may point to a root cause (feel free to share those outputs here).
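For example (assuming the default admin console resource names, with <namespace> being the namespace you deployed to):
$ kubectl describe deployment kotsadm -n <namespace>
$ kubectl describe pods -l app=kotsadm -n <namespace>
$ kubectl describe statefulset kotsadm-minio kotsadm-postgres -n <namespace>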
Also, we have a Support Bundle utility that will collect this kind of information from the cluster automatically. Here's a link to our docs for steps on how to generate one: https://docs.replicated.com/enterprise/troubleshooting-an-app#generating-a-bundle-using-the-cli. If you do generate one, you can provide the generated .tar.gz
archive here as well.
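Per those docs, generating one with the default KOTS support bundle spec should be roughly:
$ kubectl support-bundle https://kots.io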
Hi @cbodonnell,
Thank you a lot for your time and attention 🙏.
Note that I am creating a new app and a new release without any changes, just to test it out.
The error is Warning FailedScheduling 3m9s default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
and I have only 1 node, see:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
kind-control-plane Ready control-plane 11m v1.25.2
My guess is that it is failing because of:
Lines 230 to 241 in b43fa9a
Note that my node has the label node-role.kubernetes.io/control-plane=, and if I spin up kind with 3 nodes instead I face: Warning FailedScheduling 71s default-scheduler 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
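A rough way to cross-check this (what the kotsadm deployment's affinity asks for versus what the kind node actually has) would be something like:
$ kubectl get nodes --show-labels
$ kubectl get deployment kotsadm -n test-python \
    -o jsonpath='{.spec.template.spec.affinity}'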
TL;DR: Here are the outputs for your reference:
$ kubectl describe node/kind-control-plane
Name: kind-control-plane
Roles: control-plane
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/os=linux
kubernetes.io/arch=arm64
kubernetes.io/hostname=kind-control-plane
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 26 Oct 2022 18:38:40 +0100
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: kind-control-plane
AcquireTime: <unset>
RenewTime: Wed, 26 Oct 2022 18:54:30 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 26 Oct 2022 18:50:26 +0100 Wed, 26 Oct 2022 18:38:37 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 26 Oct 2022 18:50:26 +0100 Wed, 26 Oct 2022 18:38:37 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 26 Oct 2022 18:50:26 +0100 Wed, 26 Oct 2022 18:38:37 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 26 Oct 2022 18:50:26 +0100 Wed, 26 Oct 2022 18:39:03 +0100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 172.18.0.2
Hostname: kind-control-plane
Capacity:
cpu: 5
ephemeral-storage: 263899620Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 12249368Ki
pods: 110
Allocatable:
cpu: 5
ephemeral-storage: 263899620Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 12249368Ki
pods: 110
System Info:
Machine ID: 578f01a3d92a440cb41ce30c8209912c
System UUID: 578f01a3d92a440cb41ce30c8209912c
Boot ID: 42edd258-91f0-4309-adac-b07363ee04fc
Kernel Version: 5.15.49-linuxkit
OS Image: Ubuntu 22.04.1 LTS
Operating System: linux
Architecture: arm64
Container Runtime Version: containerd://1.6.8
Kubelet Version: v1.25.2
Kube-Proxy Version: v1.25.2
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
ProviderID: kind://docker/kind/kind-control-plane
Non-terminated Pods: (11 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-565d847f94-dknfs 100m (2%) 0 (0%) 70Mi (0%) 170Mi (1%) 15m
kube-system coredns-565d847f94-k87xs 100m (2%) 0 (0%) 70Mi (0%) 170Mi (1%) 15m
kube-system etcd-kind-control-plane 100m (2%) 0 (0%) 100Mi (0%) 0 (0%) 15m
kube-system kindnet-d5r2m 100m (2%) 100m (2%) 50Mi (0%) 50Mi (0%) 15m
kube-system kube-apiserver-kind-control-plane 250m (5%) 0 (0%) 0 (0%) 0 (0%) 15m
kube-system kube-controller-manager-kind-control-plane 200m (4%) 0 (0%) 0 (0%) 0 (0%) 15m
kube-system kube-proxy-jxv2h 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15m
kube-system kube-scheduler-kind-control-plane 100m (2%) 0 (0%) 0 (0%) 0 (0%) 15m
local-path-storage local-path-provisioner-684f458cdd-9qv5n 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15m
test-python kotsadm-minio-0 50m (1%) 100m (2%) 100Mi (0%) 512Mi (4%) 15m
test-python kotsadm-postgres-0 100m (2%) 200m (4%) 100Mi (0%) 200Mi (1%) 15m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1100m (22%) 400m (8%)
memory 490Mi (4%) 1102Mi (9%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
hugepages-32Mi 0 (0%) 0 (0%)
hugepages-64Ki 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 15m kube-proxy
Normal NodeHasSufficientMemory 16m (x5 over 16m) kubelet Node kind-control-plane status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 16m (x5 over 16m) kubelet Node kind-control-plane status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 16m (x4 over 16m) kubelet Node kind-control-plane status is now: NodeHasSufficientPID
Normal Starting 15m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 15m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 15m kubelet Node kind-control-plane status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 15m kubelet Node kind-control-plane status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 15m kubelet Node kind-control-plane status is now: NodeHasSufficientPID
Normal RegisteredNode 15m node-controller Node kind-control-plane event: Registered Node kind-control-plane in Controller
Normal NodeReady 15m kubelet Node kind-control-plane status is now: NodeReady
Here are all the outputs:
$ kubectl get all -n test-python
NAME READY STATUS RESTARTS AGE
pod/kotsadm-d74669fc9-rpj7r 0/1 Pending 0 2m20s
pod/kotsadm-minio-0 1/1 Running 0 2m46s
pod/kotsadm-postgres-0 1/1 Running 0 2m46s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kotsadm ClusterIP 10.96.227.234 <none> 3000/TCP 2m20s
service/kotsadm-minio ClusterIP 10.96.63.70 <none> 9000/TCP 2m45s
service/kotsadm-postgres ClusterIP 10.96.180.57 <none> 5432/TCP 2m45s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kotsadm 0/1 1 0 2m20s
NAME DESIRED CURRENT READY AGE
replicaset.apps/kotsadm-d74669fc9 1 1 0 2m20s
NAME READY AGE
statefulset.apps/kotsadm-minio 1/1 2m46s
statefulset.apps/kotsadm-postgres 1/1 2m46s
$ kubectl describe pod/kotsadm-d74669fc9-rpj7r -n test-python
Name: kotsadm-d74669fc9-rpj7r
Namespace: test-python
Priority: 0
Service Account: kotsadm
Node: <none>
Labels: app=kotsadm
kots.io/backup=velero
kots.io/kotsadm=true
pod-template-hash=d74669fc9
Annotations: backup.velero.io/backup-volumes: backup
pre.hook.backup.velero.io/command: ["/backup.sh"]
pre.hook.backup.velero.io/timeout: 10m
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/kotsadm-d74669fc9
Init Containers:
schemahero-plan:
Image: kotsadm/kotsadm-migrations:v1.88.0
Port: <none>
Host Port: <none>
Args:
plan
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 50m
memory: 50Mi
Environment:
SCHEMAHERO_DRIVER: postgres
SCHEMAHERO_SPEC_FILE: /tables
SCHEMAHERO_OUT: /migrations/plan.yaml
SCHEMAHERO_URI: <set to the key 'uri' in secret 'kotsadm-postgres'> Optional: false
Mounts:
/migrations from migrations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vvrx9 (ro)
schemahero-apply:
Image: kotsadm/kotsadm-migrations:v1.88.0
Port: <none>
Host Port: <none>
Args:
apply
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 50m
memory: 50Mi
Environment:
SCHEMAHERO_DRIVER: postgres
SCHEMAHERO_DDL: /migrations/plan.yaml
SCHEMAHERO_URI: <set to the key 'uri' in secret 'kotsadm-postgres'> Optional: false
Mounts:
/migrations from migrations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vvrx9 (ro)
restore-db:
Image: kotsadm/kotsadm:v1.88.0
Port: <none>
Host Port: <none>
Command:
/restore-db.sh
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 100m
memory: 100Mi
Environment:
POSTGRES_PASSWORD: <set to the key 'password' in secret 'kotsadm-postgres'> Optional: false
Mounts:
/backup from backup (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vvrx9 (ro)
restore-s3:
Image: kotsadm/kotsadm:v1.88.0
Port: <none>
Host Port: <none>
Command:
/restore-s3.sh
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 100m
memory: 100Mi
Environment:
S3_ENDPOINT: http://kotsadm-minio:9000
S3_BUCKET_NAME: kotsadm
S3_ACCESS_KEY_ID: <set to the key 'accesskey' in secret 'kotsadm-minio'> Optional: false
S3_SECRET_ACCESS_KEY: <set to the key 'secretkey' in secret 'kotsadm-minio'> Optional: false
S3_BUCKET_ENDPOINT: true
Mounts:
/backup from backup (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vvrx9 (ro)
Containers:
kotsadm:
Image: kotsadm/kotsadm:v1.88.0
Port: 3000/TCP
Host Port: 0/TCP
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 100m
memory: 100Mi
Readiness: http-get http://:3000/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
SHARED_PASSWORD_BCRYPT: <set to the key 'passwordBcrypt' in secret 'kotsadm-password'> Optional: false
AUTO_CREATE_CLUSTER_TOKEN: <set to the key 'kotsadm-cluster-token' in secret 'kotsadm-cluster-token'> Optional: false
SESSION_KEY: <set to the key 'key' in secret 'kotsadm-session'> Optional: false
POSTGRES_PASSWORD: <set to the key 'password' in secret 'kotsadm-postgres'> Optional: false
POSTGRES_URI: <set to the key 'uri' in secret 'kotsadm-postgres'> Optional: false
POD_NAMESPACE: test-python (v1:metadata.namespace)
POD_OWNER_KIND: deployment
API_ENCRYPTION_KEY: <set to the key 'encryptionKey' in secret 'kotsadm-encryption'> Optional: false
API_ENDPOINT: http://kotsadm.test-python.svc.cluster.local:3000
API_ADVERTISE_ENDPOINT: http://localhost:8800
S3_ENDPOINT: http://kotsadm-minio:9000
S3_BUCKET_NAME: kotsadm
S3_ACCESS_KEY_ID: <set to the key 'accesskey' in secret 'kotsadm-minio'> Optional: false
S3_SECRET_ACCESS_KEY: <set to the key 'secretkey' in secret 'kotsadm-minio'> Optional: false
S3_BUCKET_ENDPOINT: true
HTTP_PROXY:
HTTPS_PROXY:
NO_PROXY: kotsadm-postgres,kotsadm-minio,kotsadm-api-node
KOTS_INSTALL_ID: xxxxxxxxxx
Mounts:
/backup from backup (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vvrx9 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
migrations:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
backup:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-vvrx9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m9s default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
See as well the troubleshooting report (but it does not bring any new/helpful info):
$ cat /Users/camilamacedo/support-bundle-results.txt
Check PASS
Title: Required Kubernetes Version
Message: Your cluster meets the recommended and required versions of Kubernetes
------------
Check PASS
Title: Container Runtime
Message: A supported container runtime is present on all nodes
------------
Check FAIL
Title: Pod test-python/kotsadm-c668dc485-f7vdv status
Message: Status: Pending
------------
Check FAIL
Title: test-python/kotsadm Deployment Status
Message: The deployment test-python/kotsadm has 0/1 replicas
------------
Check FAIL
Title: test-python/kotsadm-c668dc485 ReplicaSet Status
Message: The replicaset test-python/kotsadm-c668dc485 is not ready
------------
Check PASS
Title: Node status check
Message: All nodes are online.
------------
Hi @camilamacedo86, I believe the following could be the issue (from the kubectl describe node output):
System Info:
Machine ID: 578f01a3d92a440cb41ce30c8209912c
System UUID: 578f01a3d92a440cb41ce30c8209912c
Boot ID: 42edd258-91f0-4309-adac-b07363ee04fc
Kernel Version: 5.15.49-linuxkit
OS Image: Ubuntu 22.04.1 LTS
Operating System: linux
Architecture: arm64
The kotsadm deployment has a node affinity applied so that it will only be scheduled on the linux OS and will not be scheduled on the arm64 architecture (see this spot in the code). If you would like, you can edit the deployment and remove this, but I cannot guarantee that things will work since we don't support this combination at this time.
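If you want to try that experiment, a rough sketch would be the following (this assumes the affinity is set on the pod template spec, which is the usual place; adjust the namespace to the one you installed into):
$ kubectl edit deployment kotsadm -n test-python
# or non-interactively:
$ kubectl patch deployment kotsadm -n test-python --type json \
    -p '[{"op": "remove", "path": "/spec/template/spec/affinity"}]'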
The support bundle command should have also generated a .tar.gz archive. This will contain lots of information about the cluster and its resources that will assist with troubleshooting.
I hope this is helpful!
Hi @cbodonnell,
Thank you for your help in understanding the issue. I tried to track the issues/RFEs/suggestions in a better way. Please feel free to check and let me know what you think and/or how I can help.
About arm64 support
It seems that it will not work just by removing those scheduling criteria; see #896. It shows that for arm64 to be supported, the images used must also be built for arm64.
Therefore, I opened a new issue for it: (RFE) #3360
Also, I am proposing we add this info to the README for now so that others can avoid hitting the same problem; see: https://github.com/replicatedhq/kots/pull/3362/files
The node affinity criteria also seem like they would not work on kind by default
I believe that, even after solving that, I might hit another issue as well. When I changed kind to use 3 nodes, I faced the warning: FailedScheduling 71s default-scheduler 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Then I checked that kind nodes do not have the label node-role.kubernetes.io/master, which I understand would also make it fail. So, to see if we could change that as well and allow KOTS to work on providers like kind, I created the issue: #3361
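For anyone who wants to experiment on kind in the meantime, a rough sketch of possible workarounds (assuming the default kind node names, and assuming my reading above about the node-role.kubernetes.io/master label is correct; note that neither of these addresses the arm64 affinity):
$ kubectl taint node kind-control-plane node-role.kubernetes.io/control-plane:NoSchedule-
$ kubectl label node kind-worker node-role.kubernetes.io/master=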
About the support bundle .tar.gz archive
I could find it, thank you. But in this case it does not seem to help much either. I mean, knowing that arm64 is not supported explains the reason for the problem, but the bundle does not include a check like "validate cluster platform". I raised this as: replicatedhq/troubleshoot#805
Again, thank you a lot for your time and attention.
Closing this one since it seems we were able to raise proper issues for each scenario in the corresponding project so they can be better addressed.