bentoml/Yatai

Failed to deploy Yatai to EKS

bobmayuze opened this issue · 5 comments

I followed the documentation and encountered several issues.

  1. Failed to deploy PostgreSQL
$ k events pods/postgresql-ha-postgresql-0 -n yatai-system
LAST SEEN              TYPE      REASON                 OBJECT                                                  MESSAGE
35m (x208 over 69m)    Warning   Unhealthy              Pod/yatai-6899664d9c-l2f6l                              Readiness probe failed: Get "http://172.31.89.159:7777/": dial tcp 172.31.89.159:7777: connect: connection refused
10m (x295 over 69m)    Warning   Unhealthy              Pod/yatai-6899664d9c-l2f6l                              Liveness probe failed: Get "http://172.31.89.159:7777/": dial tcp 172.31.89.159:7777: connect: connection refused
9m8s (x11 over 110m)   Warning   FailedScheduling       Pod/postgresql-ha-postgresql-1                          running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
9m8s (x11 over 110m)   Warning   FailedScheduling       Pod/postgresql-ha-postgresql-2                          running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
5m3s (x184 over 60m)   Warning   BackOff                Pod/yatai-6899664d9c-l2f6l                              Back-off restarting failed container
3m51s (x9 over 84m)    Warning   FailedScheduling       Pod/postgresql-ha-postgresql-0                          running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
26s (x482 over 120m)   Normal    ExternalProvisioning   PersistentVolumeClaim/data-postgresql-ha-postgresql-0   waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
26s (x482 over 120m)   Normal    ExternalProvisioning   PersistentVolumeClaim/data-postgresql-ha-postgresql-1   waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
26s (x483 over 120m)   Normal    ExternalProvisioning   PersistentVolumeClaim/data-postgresql-ha-postgresql-2   waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator

Then I switched to AWS RDS to keep the process going, yet the final stage of bringing up Yatai still failed.
2. Failed to bring up Yatai

$ k events pods/yatai-6899664d9c-l2f6l -n yatai-system
LAST SEEN               TYPE      REASON                 OBJECT                                                  MESSAGE
36m (x208 over 71m)     Warning   Unhealthy              Pod/yatai-6899664d9c-l2f6l                              Readiness probe failed: Get "http://172.31.89.159:7777/": dial tcp 172.31.89.159:7777: connect: connection refused
11m (x295 over 71m)     Warning   Unhealthy              Pod/yatai-6899664d9c-l2f6l                              Liveness probe failed: Get "http://172.31.89.159:7777/": dial tcp 172.31.89.159:7777: connect: connection refused
5m22s (x9 over 86m)     Warning   FailedScheduling       Pod/postgresql-ha-postgresql-0                          running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
117s (x482 over 121m)   Normal    ExternalProvisioning   PersistentVolumeClaim/data-postgresql-ha-postgresql-0   waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
117s (x482 over 121m)   Normal    ExternalProvisioning   PersistentVolumeClaim/data-postgresql-ha-postgresql-1   waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
117s (x483 over 121m)   Normal    ExternalProvisioning   PersistentVolumeClaim/data-postgresql-ha-postgresql-2   waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
86s (x196 over 62m)     Warning   BackOff                Pod/yatai-6899664d9c-l2f6l                              Back-off restarting failed container
28s (x12 over 111m)     Warning   FailedScheduling       Pod/postgresql-ha-postgresql-1                          running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
28s (x12 over 111m)     Warning   FailedScheduling       Pod/postgresql-ha-postgresql-2                          running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition

Any help on how I can keep this going? Thanks

First, you should check the PVC status:

kubectl -n yatai-system get pvc

If the PVCs are Pending or failed, you should describe the PVC to get the reason:

kubectl -n yatai-system describe pvc $pvcName

Maybe you don't have any StorageClass in your cluster, or you don't have a working StorageClass provisioner; see the check below.

refs: https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/
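As a quick check (standard kubectl commands; the CSI driver label in the second command is an assumption that depends on how the driver was installed):

# List storage classes; the default one is marked "(default)"
kubectl get storageclass

# On EKS, verify the EBS CSI driver pods are actually running
# (the label may differ depending on the install method)
kubectl -n kube-system get pods -l app.kubernetes.io/name=aws-ebs-csi-driver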

I tried to describe the PVCs and found this:

kubectl -n yatai-system get pvc
NAME                              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-postgresql-ha-postgresql-0   Pending                                      gp2            13h
data-postgresql-ha-postgresql-1   Pending                                      gp2            13h
data-postgresql-ha-postgresql-2   Pending                                      gp2            13h

And to get a more detailed description, I ran k -n yatai-system describe pvc data-postgresql-ha-postgresql-0 and got this:

Name:          data-postgresql-ha-postgresql-0
Namespace:     yatai-system
StorageClass:  gp2
Status:        Pending
Volume:
Labels:        app.kubernetes.io/component=postgresql
               app.kubernetes.io/instance=postgresql-ha
               app.kubernetes.io/name=postgresql-ha
Annotations:   volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
               volume.kubernetes.io/selected-node: ip-172-31-21-121.ec2.internal
               volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       postgresql-ha-postgresql-0
Events:
  Type    Reason                Age                     From                         Message
  ----    ------                ----                    ----                         -------
  Normal  ExternalProvisioning  3m25s (x3203 over 13h)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator

So I described the storage class gp2 and got this:

$ k describe storageclass gp2
Name:            gp2
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           kubernetes.io/aws-ebs
Parameters:            fsType=ext4,type=gp2
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>

Any guidance on creating the storage class here? This part is not mentioned in the installation doc.

Your PVC is annotated with volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com, but that provisioner is not running in your cluster. You should follow the official AWS documentation to set up the EBS CSI driver on EKS and enable the OIDC IAM provider in your existing EKS cluster:

https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html
https://stackoverflow.com/a/68725742
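For reference, the eksctl flow from that AWS page looks roughly like this (a sketch, not the exact commands for your cluster: my-cluster, us-east-1, and 111122223333 are placeholders for your cluster name, region, and account ID):

# 1. Associate an OIDC identity provider with the existing cluster
eksctl utils associate-iam-oidc-provider --cluster my-cluster --region us-east-1 --approve

# 2. Create an IAM role for the EBS CSI controller's service account
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster my-cluster \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve

# 3. Install the EBS CSI driver as an EKS managed add-on using that role
eksctl create addon \
  --name aws-ebs-csi-driver \
  --cluster my-cluster \
  --service-account-role-arn arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole \
  --force

On EKS 1.23 and later, in-tree kubernetes.io/aws-ebs volumes are migrated to this driver, so the existing default gp2 class should start provisioning once the driver is installed.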

It worked, but I had to re-install Yatai after the EBS CSI driver was configured.

@bobmayuze It is not necessary to reinstall; you only need to recreate the PVCs so they are recognized by the volume provisioner.
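Concretely, something like this should work (a sketch using the PVC and pod names from this thread):

# Delete the Pending PVCs; they were created before the provisioner existed
kubectl -n yatai-system delete pvc \
  data-postgresql-ha-postgresql-0 \
  data-postgresql-ha-postgresql-1 \
  data-postgresql-ha-postgresql-2

# Delete the StatefulSet pods so their volumeClaimTemplates recreate the PVCs;
# the PVC deletion only completes once the pods are gone (pvc-protection finalizer)
kubectl -n yatai-system delete pod \
  postgresql-ha-postgresql-0 \
  postgresql-ha-postgresql-1 \
  postgresql-ha-postgresql-2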