planetscale/vitess-operator

AWS IAM Role for Service Account support

klagroix opened this issue · 2 comments

We're trying to use vitess-operator with an S3 backup spec defined in our VitessCluster manifest. Example as follows:

spec:  
  backup:
    engine: xtrabackup
    locations:
      - s3:
          bucket: my-s3-bucket
          region: us-east-1

From the logs, it appears that vttablet attempts to read the backup S3 bucket to see if it needs to restore from the latest backup. As such, we need to provide AWS credentials so the pod can read the S3 bucket.

Typically, we use IAM Roles for Service Accounts (IRSA) which allows annotating a Service Account with an IAM Role arn: https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html

  • Using IRSA, when a pod is created with this annotated service account, AWS_* environment variables are added to the pod so it can authenticate and assume the IAM role.

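For reference, a minimal IRSA setup looks roughly like this (the service account name, namespace, account ID, and role name below are placeholders, not values from our cluster):

```yaml
# ServiceAccount annotated with an IAM role ARN (placeholder values)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vitess-tablet
  namespace: dev
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/vitess-backup-role
```

With this in place, the EKS webhook injects variables such as AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE, AWS_REGION, AWS_DEFAULT_REGION, and AWS_STS_REGIONAL_ENDPOINTS into pods that use the service account; these are exactly the variables that appear in the operator diff below.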
The issue here is that vitess-operator sees these AWS_* environment variables on the pod and attempts to $patch: delete them.

Example log taken from vitess-operator:

time="2023-02-14T22:12:16Z" level=info msg="Updating object in place" diff="metadata:\n  annotations:\n    planetscale.com/observed-shard-generation: \"1\"\n    rollout.planetscale.com/scheduled: |\n      spec:\n        containers:\n        - $setElementOrder/env:\n          - name: VTROOT\n          - name: VTDATAROOT\n          - name: VT_MYSQL_ROOT\n          - name: MYSQL_FLAVOR\n          - name: EXTRA_MY_CNF\n          - name: POD_IP\n          env:\n          - $patch: delete\n            name: AWS_DEFAULT_REGION\n          - $patch: delete\n            name: AWS_REGION\n          - $patch: delete\n            name: AWS_ROLE_ARN\n          - $patch: delete\n            name: AWS_STS_REGIONAL_ENDPOINTS\n          - $patch: delete\n            name: AWS_WEB_IDENTITY_TOKEN_FILE\n          name: vttablet\n        - $setElementOrder/env:\n          - name: VTROOT\n          - name: VTDATAROOT\n          - name: VT_MYSQL_ROOT\n          - name: MYSQL_FLAVOR\n          - name: EXTRA_MY_CNF\n          - name: POD_IP\n          env:\n          - $patch: delete\n            name: AWS_DEFAULT_REGION\n          - $patch: delete\n            name: AWS_REGION\n          - $patch: delete\n            name: AWS_ROLE_ARN\n          - $patch: delete\n            name: AWS_STS_REGIONAL_ENDPOINTS\n          - $patch: delete\n            name: AWS_WEB_IDENTITY_TOKEN_FILE\n          name: mysqld\n        - $setElementOrder/env:\n          - name: DATA_SOURCE_NAME\n          env:\n          - $patch: delete\n            name: AWS_DEFAULT_REGION\n          - $patch: delete\n            name: AWS_REGION\n          - $patch: delete\n            name: AWS_ROLE_ARN\n          - $patch: delete\n            name: AWS_STS_REGIONAL_ENDPOINTS\n          - $patch: delete\n            name: AWS_WEB_IDENTITY_TOKEN_FILE\n          name: mysqld-exporter\n        initContainers:\n        - env: null\n          name: init-vt-root\n        - env: null\n          name: init-mysql-socket\n" gvk="/v1, 
Kind=Pod" key=dev/dev-vitess-vttablet-useast1a-OMITTED

Is there any way to force vitess-operator to ignore the AWS_* environment variable discrepancies?

For context, these tests were performed on an AWS-hosted EKS cluster using vitess-operator v2.9.0-rc1

How are you adding the environment variables? Are you adding them to the pod directly?
Did you try adding these environment variables to the tabletPools config as extraEnv? You can specify extra environment variables there, and they are added to all the mysqld and vttablet pods.
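For reference, the extraEnv suggestion would look roughly like this in a VitessCluster manifest (the keyspace name, cell, replica count, and variable below are placeholders; check the VitessCluster CRD for the exact field placement):

```yaml
spec:
  keyspaces:
    - name: main                   # placeholder keyspace name
      partitionings:
        - equal:
            parts: 1
            shardTemplate:
              tabletPools:
                - cell: useast1a   # placeholder cell
                  type: replica
                  replicas: 2
                  extraEnv:        # extra env vars for the vttablet/mysqld pods
                    - name: MY_EXTRA_VAR
                      value: some-value
```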

Hello, when using AWS EKS and IAM Roles for Service Accounts (IRSA), service accounts are annotated with an IAM role ARN.

EKS automatically injects AWS_* environment variables into pods that are using the service account.

As these AWS_* values are dynamic, I cannot add these as an extraEnv.

I did try adding fake/placeholder values for these environment variables in extraEnv; however, vitess-operator still saw a difference and attempted to re-create the pods.

It would be nice if we could tell vitess-operator to ignore specific environment variables from the diff.