Amazon S3 CSI driver mount issue on EKS cluster 1.28
/kind bug
NOTE: If this is a filesystem related bug, please take a look at the Mountpoint repo to submit a bug report
What happened?
I have deployed https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml, but the pod is not coming up due to a mount access-denied issue.
```
$ k get pvc s3-claim
NAME       STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
s3-claim   Bound    s3-pv    1Gi        RWX                           3m56s

$ k get pv s3-pv
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
s3-pv   1Gi        RWX            Retain           Bound    default/s3-claim                           4m22s
```
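For reference, the PV side of that example binds the bucket through the CSI driver. A sketch of roughly what was deployed (the driver name `s3.csi.aws.com` is the standard one; the volume handle and mount options are assumptions, the bucket name is taken from this report):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1Gi # ignored by the S3 CSI driver, but required by the API
  accessModes:
    - ReadWriteMany
  storageClassName: "" # static provisioning; no dynamic provisioner involved
  mountOptions:
    - region us-east-2
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume # any cluster-unique ID
    volumeAttributes:
      bucketName: palani-test-bucket
```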
What you expected to happen?
```
s3-csi-node-4plmw   3/3   Running   0   24h
s3-csi-node-64m7g   3/3   Running   0   24h
s3-csi-node-br9kn   3/3   Running   0   21h
s3-csi-node-dldq8   3/3   Running   0   24h
s3-csi-node-h9hls   3/3   Running   0   21h
```
The IAM role's trust policy (Terraform):

```hcl
assume_role_policy = jsonencode({
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Allow",
      "Principal" : {
        "Federated" : "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer_url}"
      },
      "Action" : "sts:AssumeRoleWithWebIdentity",
      "Condition" : {
        "StringEquals" : {
          "${local.eks_oidc_issuer_url}:aud" : "sts.amazonaws.com",
          "${local.eks_oidc_issuer_url}:sub" : "system:serviceaccount:kube-system:s3-csi-*"
        }
      }
    }
  ]
})
```
```hcl
inline_policy = [{
  name   = "s3-csi-mount-inline-policy"
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Sid" : "MountpointFullBucketAccess",
        "Effect" : "Allow",
        "Action" : [
          "s3:ListBucket"
        ],
        "Resource" : [
          "arn:aws:s3:::palani-test-bucket"
        ]
      },
      {
        "Sid" : "MountpointFullObjectAccess",
        "Effect" : "Allow",
        "Action" : [
          "s3:GetObject",
          "s3:ListObject",
          "s3:PutObject",
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:ReEncrypt*",
          "kms:GenerateDataKey*",
          "kms:DescribeKey",
          "kms:CreateGrant",
          "kms:ListGrants",
          "kms:RevokeGrant"
        ],
        "Resource" : [
          # "arn:aws:s3:::palani-test-bucket/",
          "arn:aws:s3:::palani-test-bucket/*"
        ]
      }
    ]
  })
}]
```
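For comparison, the baseline permissions Mountpoint documents for full read/write access are only the following; the inline policy above already covers them (the KMS actions are extra, needed only for SSE-KMS buckets, and `s3:ListObject` is not a real S3 action so it never matches anything). That points the investigation at the trust policy rather than the permissions policy. A sketch using the reporter's bucket name:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MountpointFullBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::palani-test-bucket"]
    },
    {
      "Sid": "MountpointFullObjectAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject"
      ],
      "Resource": ["arn:aws:s3:::palani-test-bucket/*"]
    }
  ]
}
```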
How to reproduce it (as minimally and precisely as possible)?
```
Warning  FailedScheduling  38s               default-scheduler  0/7 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
Normal   Nominated         37s               karpenter          Pod should schedule on: machine/default-ng8p4, node/ip-10-2-3-4.us-east-2.compute.internal
Normal   Scheduled         26s               default-scheduler  Successfully assigned default/s3-app to ip-10-1-2-3.us-east-2.compute.internal
Warning  FailedMount       9s (x6 over 26s)  kubelet            MountVolume.SetUp failed for volume "s3-pv" : rpc error: code = Internal desc = Could not mount "palani-test-bucket" at "/var/lib/kubelet/pods/73ed87c0-1450-4716-9a1a-619dc8edc42e/volumes/kubernetes.io~csi/s3-pv/mount": Mount failed: Failed to start service output: Error: Failed to create S3 client Caused by: 0: initial ListObjectsV2 failed for bucket palani-test-bucket in region us-east-2 1: Client error 2: Forbidden: Access Denied Error: Failed to create mount process
```
Anything else we need to know?:
Environment
- Kubernetes version (use `kubectl version`): 1.28 (EKS)
- Driver version: `helm.sh/chart=aws-mountpoint-s3-csi-driver-1.5.1`
This seems to be a problem where credentials aren't properly set up. Can you try the following:
- Ensure OIDC is enabled on the cluster. This command should produce output:
  ```
  aws iam list-open-id-connect-providers | grep $(aws eks describe-cluster --name $MY_CLUSTER --query "cluster.identity.oidc.issuer" --output text | sed 's/.*\///')
  ```
- Check your service account is annotated properly with `kubectl describe sa s3-csi-driver-sa -n YOUR_NAMESPACE`. It should have an annotation like this (if either check fails, see the remediation sketch after this list):
  ```
  Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/s3-csi-driver-role
  ```
- Ensure the proper trust relationship is in that role. It should look something like this:
{ "Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME",
"oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
}
}
}
]
}
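If either of the first two checks fails, here is a remediation sketch (assuming the driver's service account lives in `kube-system` and the role is named `s3-csi-driver-role`; adjust names to your setup, and note the pod label selector may differ by chart version):

```sh
# Associate an OIDC provider with the cluster if the grep above printed nothing
eksctl utils associate-iam-oidc-provider --cluster "$MY_CLUSTER" --approve

# Add the IRSA annotation to the driver's service account
kubectl annotate serviceaccount s3-csi-driver-sa -n kube-system \
  eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT_ID:role/s3-csi-driver-role --overwrite

# IRSA credentials are injected at pod creation, so restart the driver pods
kubectl rollout restart daemonset s3-csi-node -n kube-system

# Verify the web-identity environment variables were injected into the pods
kubectl describe pod -n kube-system -l app=s3-csi-node | grep AWS_
```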
These steps are from this knowledge base, which has some more details: https://repost.aws/knowledge-center/eks-troubleshoot-oidc-and-irsa.
Also the documentation for IRSA might be helpful: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
@tppalani I solved the same issue like this:
#164 (comment)
- don't forget to bind the IAM role to the S3 CSI driver add-on
> @tppalani I solved the same issue like this: #164 (comment)
Yes, it does look like the same issue!
It looks like the step to replace `StringEquals` with `StringLike` was missed. It should look like this:
```json
{
  "StringLike": {
    "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:kube-system:s3-csi-*",
    "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
  }
}
```
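In the reporter's Terraform from earlier in the thread, this corresponds to changing only the condition operator (a sketch, keeping the same locals):

```hcl
"Condition" : {
  "StringLike" : {
    "${local.eks_oidc_issuer_url}:aud" : "sts.amazonaws.com",
    "${local.eks_oidc_issuer_url}:sub" : "system:serviceaccount:kube-system:s3-csi-*"
  }
}
```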
I'll follow up with the folks owning the S3 User Guide to see if we can make that clearer for readers. (internal ref: d168967d-e615-4727-85fd-56028903ccd7)
@tppalani, does changing the `StringEquals` condition to `StringLike` solve your issue?
Let us know if you have any further issues and we can provide some more help here.
Hey @dannycjones!
Today I worked with another customer who came with the same issue: this sample app - https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml - isn't working for them and throws the same error discussed on this thread:

```
0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
```

To fix this, I checked everything, i.e., the S3 driver role and the OIDC provider mapping with the service account; however, to my surprise, the issue was resolved by also installing the EFS CSI driver add-on, so that the scheduler knows we have a CSI driver component to use for the StorageClass rather than the default "gp2" EBS-based one.
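For context on the unbound-PVC symptom: with static provisioning, the claim has to opt out of dynamic provisioning, otherwise the default StorageClass (e.g. gp2) intercepts it and the claim waits on a provisioner instead of binding to the pre-created PV. A sketch of the claim side, modeled on the linked example (the explicit `volumeName` is an assumption):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "" # empty string disables dynamic provisioning so the claim binds to the static PV
  resources:
    requests:
      storage: 1Gi # ignored by the S3 CSI driver, but required by the API
  volumeName: s3-pv # bind explicitly to the pre-created PV
```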
Ask:
After this, my application came up and I can see a file created on the S3 bucket as well. I kindly request you to review the S3 CSI driver and clarify what difference lies between it and the EFS CSI driver (i.e., why including the EFS CSI driver solved this issue).
Your query:

> Does changing the `StringEquals` condition to `StringLike` solve your issue?

I don't think this makes any difference; for me, `StringLike` worked just as smoothly, as mentioned in the doc.
Happy to follow up internally to help customers here!
Closing this issue. @tppalani, please reopen if the suggestion above did not work for you.
> Today I worked with another customer who came with the same issue [...] to my surprise, the issue was resolved by also installing the EFS CSI driver add-on [...]
What steps did you take to fix this?
Thanks for the follow-up. I just added the AWS EFS CSI driver add-on additionally, nothing more than that, and then deployed the application as normal.
Adding the EFS CSI driver shouldn't have an impact here; in the original issue, Mountpoint was failing to mount due to an IAM permissions problem.
If you hit a case where Mountpoint is unusable until the EFS CSI driver is installed, please do open a new issue and we can investigate; there is some other issue there.
I did some testing based on a customer requirement to use the S3 and EFS CSI drivers in parallel in the same cluster. The consideration below from the documentation seems invalid:

> "To get PersistentVolume mounted while using the Mountpoint for Amazon S3 CSI driver, we require that the Amazon EFS CSI driver won't be provisioned."
EKS version: 1.29
S3 CSI driver: v1.7.0
EFS CSI driver: v2.0.6

- Only the S3 CSI driver installed in the cluster: the pod runs fine.
- Both the S3 and EFS CSI drivers installed: pods using S3 and EFS both run fine.
- Both drivers installed, with the S3 and EFS pods on the same worker node: both run fine.

I don't see any relation between the S3 and EFS CSI drivers.
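A quick way to confirm that both drivers are registered cluster-wide and on a given node (a sketch; the plugin names `s3.csi.aws.com` and `efs.csi.aws.com` are the standard ones, and the node name is the example from this thread):

```sh
# Cluster-wide: list the registered CSI drivers
kubectl get csidrivers

# Per-node: confirm both plugins have registered with the kubelet
kubectl get csinode ip-10-1-2-3.us-east-2.compute.internal \
  -o jsonpath='{.spec.drivers[*].name}'
```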