Amazon S3 CSI driver mount issue on EKS cluster 1.28
/kind bug
NOTE: If this is a filesystem related bug, please take a look at the Mountpoint repo to submit a bug report
What happened?
I have deployed https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml, but the pod is not coming up due to a mount access-denied issue.
```
$ k get pvc s3-claim
NAME       STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
s3-claim   Bound    s3-pv    1Gi        RWX                           3m56s

$ k get pv s3-pv
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
s3-pv   1Gi        RWX            Retain           Bound    default/s3-claim                           4m22s
```
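For reference, the PV side of that example binds the bucket through the CSI driver. A sketch of roughly what was deployed (the driver name `s3.csi.aws.com` is the standard one; the volume handle and mount options are assumptions, the bucket name is taken from this report):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1Gi # ignored by the S3 CSI driver, but required by the API
  accessModes:
    - ReadWriteMany
  storageClassName: "" # static provisioning; no dynamic provisioner involved
  mountOptions:
    - region us-east-2
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume # any cluster-unique ID
    volumeAttributes:
      bucketName: palani-test-bucket
```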
What you expected to happen?
```
s3-csi-node-4plmw   3/3   Running   0   24h
s3-csi-node-64m7g   3/3   Running   0   24h
s3-csi-node-br9kn   3/3   Running   0   21h
s3-csi-node-dldq8   3/3   Running   0   24h
s3-csi-node-h9hls   3/3   Running   0   21h
```
The IAM role's trust policy (Terraform):

```hcl
assume_role_policy = jsonencode({
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Allow",
      "Principal" : {
        "Federated" : "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer_url}"
      },
      "Action" : "sts:AssumeRoleWithWebIdentity",
      "Condition" : {
        "StringEquals" : {
          "${local.eks_oidc_issuer_url}:aud" : "sts.amazonaws.com",
          "${local.eks_oidc_issuer_url}:sub" : "system:serviceaccount:kube-system:s3-csi-*"
        }
      }
    }
  ]
})
```
```hcl
inline_policy = [{
  name   = "s3-csi-mount-inline-policy"
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Sid" : "MountpointFullBucketAccess",
        "Effect" : "Allow",
        "Action" : [
          "s3:ListBucket"
        ],
        "Resource" : [
          "arn:aws:s3:::palani-test-bucket"
        ]
      },
      {
        "Sid" : "MountpointFullObjectAccess",
        "Effect" : "Allow",
        "Action" : [
          "s3:GetObject",
          "s3:ListObject",
          "s3:PutObject",
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:ReEncrypt*",
          "kms:GenerateDataKey*",
          "kms:DescribeKey",
          "kms:CreateGrant",
          "kms:ListGrants",
          "kms:RevokeGrant"
        ],
        "Resource" : [
          # "arn:aws:s3:::palani-test-bucket/",
          "arn:aws:s3:::palani-test-bucket/*"
        ]
      }
    ]
  })
}]
```
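For comparison, the baseline permissions Mountpoint documents for full read/write access are only the following; the inline policy above already covers them (the KMS actions are extra, needed only for SSE-KMS buckets, and `s3:ListObject` is not a real S3 action so it never matches anything). That points the investigation at the trust policy rather than the permissions policy. A sketch using the reporter's bucket name:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MountpointFullBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::palani-test-bucket"]
    },
    {
      "Sid": "MountpointFullObjectAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject"
      ],
      "Resource": ["arn:aws:s3:::palani-test-bucket/*"]
    }
  ]
}
```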
How to reproduce it (as minimally and precisely as possible)?
```
Warning  FailedScheduling  38s               default-scheduler  0/7 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
Normal   Nominated         37s               karpenter          Pod should schedule on: machine/default-ng8p4, node/ip-10-2-3-4.us-east-2.compute.internal
Normal   Scheduled         26s               default-scheduler  Successfully assigned default/s3-app to ip-10-1-2-3.us-east-2.compute.internal
Warning  FailedMount       9s (x6 over 26s)  kubelet            MountVolume.SetUp failed for volume "s3-pv" : rpc error: code = Internal desc = Could not mount "palani-test-bucket" at "/var/lib/kubelet/pods/73ed87c0-1450-4716-9a1a-619dc8edc42e/volumes/kubernetes.io~csi/s3-pv/mount": Mount failed: Failed to start service output: Error: Failed to create S3 client Caused by: 0: initial ListObjectsV2 failed for bucket palani-test-bucket in region us-east-2 1: Client error 2: Forbidden: Access Denied Error: Failed to create mount process
```
Anything else we need to know?:
Environment
- Kubernetes version (use `kubectl version`): 1.28 (EKS)
- Driver version: `helm.sh/chart=aws-mountpoint-s3-csi-driver-1.5.1`
This seems to be a problem where credentials aren't properly set up. Can you try the following:
- Ensure OIDC is enabled on the cluster. This command should produce output:
  ```
  aws iam list-open-id-connect-providers | grep $(aws eks describe-cluster --name $MY_CLUSTER --query "cluster.identity.oidc.issuer" --output text | sed 's/.*\///')
  ```
- Check your service account is annotated properly with `kubectl describe sa s3-csi-driver-sa -n YOUR_NAMESPACE`. It should have an annotation like this (if either check fails, see the remediation sketch after this list):
  ```
  Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/s3-csi-driver-role
  ```
- Ensure the proper trust relationship is in that role. It should look something like this:
{ "Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME",
"oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
}
}
}
]
}
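If either of the first two checks fails, here is a remediation sketch (assuming the driver's service account lives in `kube-system` and the role is named `s3-csi-driver-role`; adjust names to your setup, and note the pod label selector may differ by chart version):

```sh
# Associate an OIDC provider with the cluster if the grep above printed nothing
eksctl utils associate-iam-oidc-provider --cluster "$MY_CLUSTER" --approve

# Add the IRSA annotation to the driver's service account
kubectl annotate serviceaccount s3-csi-driver-sa -n kube-system \
  eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT_ID:role/s3-csi-driver-role --overwrite

# IRSA credentials are injected at pod creation, so restart the driver pods
kubectl rollout restart daemonset s3-csi-node -n kube-system

# Verify the web-identity environment variables were injected into the pods
kubectl describe pod -n kube-system -l app=s3-csi-node | grep AWS_
```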
These steps are from this knowledge base, which has some more details: https://repost.aws/knowledge-center/eks-troubleshoot-oidc-and-irsa.
Also the documentation for IRSA might be helpful: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
@tppalani I solved the same issue like this:
#164 (comment)
- don't forget to bind the IAM role to the S3 CSI driver add-on
> @tppalani I solved the same issue like this: #164 (comment)
Yes, it does look like the same issue!
It looks like the step to replace `StringEquals` with `StringLike` was missed. It should look like this:
```json
{
  "StringLike": {
    "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:kube-system:s3-csi-*",
    "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
  }
}
```
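In the reporter's Terraform from earlier in the thread, this corresponds to changing only the condition operator (a sketch, keeping the same locals):

```hcl
"Condition" : {
  "StringLike" : {
    "${local.eks_oidc_issuer_url}:aud" : "sts.amazonaws.com",
    "${local.eks_oidc_issuer_url}:sub" : "system:serviceaccount:kube-system:s3-csi-*"
  }
}
```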
I'll follow up with the folks owning the S3 User Guide to see if we can make that clearer for readers. (internal ref: d168967d-e615-4727-85fd-56028903ccd7)
@tppalani, does changing the `StringEquals` condition to `StringLike` solve your issue?
Let us know if you have any further issues and we can provide some more help here.
Hey @dannycjones!
Today I worked with another customer who came with the same issue: this sample app - https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml - isn't working for them and throws the same error discussed on this thread:

```
0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
```

To fix this, I checked everything, i.e., the S3 driver role and the OIDC provider mapping with the service account; however, to my surprise, the issue was resolved by also installing the EFS CSI driver add-on, so that the scheduler knows we have a CSI driver component to use for the StorageClass rather than the default "gp2" EBS-based one.
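For context on the unbound-PVC symptom: with static provisioning, the claim has to opt out of dynamic provisioning, otherwise the default StorageClass (e.g. gp2) intercepts it and the claim waits on a provisioner instead of binding to the pre-created PV. A sketch of the claim side, modeled on the linked example (the explicit `volumeName` is an assumption):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "" # empty string disables dynamic provisioning so the claim binds to the static PV
  resources:
    requests:
      storage: 1Gi # ignored by the S3 CSI driver, but required by the API
  volumeName: s3-pv # bind explicitly to the pre-created PV
```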
Ask:
After this, my application came up and I can see a file created on the S3 bucket as well. I kindly request you to review the S3 CSI driver and clarify what difference lies between it and the EFS CSI driver (i.e., why including the EFS CSI driver solved this issue).
Your query:

> Does changing the `StringEquals` condition to `StringLike` solve your issue?

I don't think this makes any difference; for me, `StringLike` worked just as smoothly, as mentioned in the doc.
Happy to follow up internally to help customers here!
Closing this issue. @tppalani, please reopen if the suggestion above did not work for you.
> Today I worked with another customer who came with the same issue [...] to my surprise, the issue was resolved by also installing the EFS CSI driver add-on [...]
What steps did you take to fix this?
Thanks for the follow-up. I just added the AWS EFS CSI driver add-on additionally, nothing more than that, and then deployed the application as normal.
Adding the EFS CSI driver shouldn't have an impact here; in the original issue, Mountpoint was failing to mount due to an IAM permissions problem.
If you hit a case where Mountpoint is unusable until the EFS CSI driver is installed, please do open a new issue and we can investigate; there is some other issue there.
I did some testing based on a customer requirement to use the S3 and EFS CSI drivers in parallel in the same cluster. The consideration below from the documentation seems invalid:

> "To get PersistentVolume mounted while using the Mountpoint for Amazon S3 CSI driver, we require that the Amazon EFS CSI driver won't be provisioned."
EKS version: 1.29
S3 CSI driver: v1.7.0
EFS CSI driver: v2.0.6

- Only the S3 CSI driver installed in the cluster: the pod runs fine.
- Both the S3 and EFS CSI drivers installed: pods using S3 and EFS both run fine.
- Both drivers installed, with the S3 and EFS pods on the same worker node: both run fine.

I don't see any relation between the S3 and EFS CSI drivers.
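A quick way to confirm that both drivers are registered cluster-wide and on a given node (a sketch; the plugin names `s3.csi.aws.com` and `efs.csi.aws.com` are the standard ones, and the node name is the example from this thread):

```sh
# Cluster-wide: list the registered CSI drivers
kubectl get csidrivers

# Per-node: confirm both plugins have registered with the kubelet
kubectl get csinode ip-10-1-2-3.us-east-2.compute.internal \
  -o jsonpath='{.spec.drivers[*].name}'
```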