jtblin/kube2iam

GitLab Runner Pod Unable To Assume Role With Annotation

AlexWang-16 opened this issue · 1 comments

The Problem

I'm trying to assign a role to a GitLab Runner deployment in EKS. When I look at spec.template.metadata.annotations, I can clearly see the IAM role written.

However, when I run my CI/CD pipeline on the runner, which executes aws sts get-caller-identity, I get the following error message:

Unable to locate credentials. You can configure credentials by running "aws configure".

Looking at Kube2IAM logs, I see the following warning and error messages:

time="2022-04-03T17:24:31Z" level=warning msg="Using fallback role for IP 172.x.x.x"
time="2022-04-03T17:24:31Z" level=info msg="GET /latest/meta-data/iam/security-credentials/ (200) took 0.026894 ms" req.method=GET req.path=/latest/meta-data/iam/security-credentials/ req.remote=172.x.x.x res.duration=0.026894 res.status=200
time="2022-04-03T17:24:31Z" level=warning msg="Using fallback role for IP 172.x.x.x"
time="2022-04-03T17:24:31Z" level=error msg="Error assuming role AccessDenied: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/ireland-eks-nprxxxxxxxxx/i-xxxxxxxxxx is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::xxxxxxxxxxxx:role/fallback-role\n\tstatus code: 403, request id: c3d3b597-b00f-xxxx-xxxx-xxxxxxxxxx" ns.name=gitlab-runner pod.iam.role="arn:aws:iam::xxxxxxxxxxxx:role/fallback-role" req.method=GET req.path=/latest/meta-data/iam/security-credentials/fallback-role req.remote=172.x.x.x
time="2022-04-03T17:24:31Z" level=info msg="GET /latest/meta-data/iam/security-credentials/fallback-role (500) took 81.617168 ms" req.method=GET req.path=/latest/meta-data/iam/security-credentials/fallback-role req.remote=172.x.x.x res.duration=81.617168 res.status=500

It looks like Kube2IAM was not able to detect the IAM role specified in the Pod annotations.
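
One way to narrow this down is to pull the affected IPs out of the kube2iam log and match them against Pod IPs, which shows whether the fallback is hit by the runner Pod or by some other Pod. This is only a minimal sketch, assuming the log format shown above; the function name fallback_ips and the file name kube2iam.log are made up for illustration.

```shell
#!/bin/sh
# Print the unique IPs for which kube2iam logged "Using fallback role".
# Reads kube2iam log lines on stdin; assumes the log format shown above.
fallback_ips() {
  sed -n 's/.*msg="Using fallback role for IP \([^"]*\)".*/\1/p' | sort -u
}

# Example (hypothetical file name):
#   fallback_ips < kube2iam.log
# Each IP can then be compared with the IP column of:
#   kubectl get pods --all-namespaces -o wide
```

If the IPs belong to Pods other than the one carrying the annotation, the annotation is probably on the wrong Pod rather than being ignored by kube2iam.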

Here's what the GitLab Runner's Pod looks like:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/configmap: 3918d3199a949a3fe5d21c34097bbb8a1e8625ecd16f7c2e7099daadc064e399
    checksum/secrets: adbc4213787fab36c06c9c6dedc9550bf9edd93880030cf888ac4bb33b477a1d
    iam.amazonaws.com/role: gitlab-runner-global
    kubectl.kubernetes.io/restartedAt: "2022-04-03T15:33:46-04:00"
    kubernetes.io/psp: eks.privileged
    prometheus.io/port: "9252"
    prometheus.io/scrape: "true"
  labels:
    app: dev-gitlab-runner
    chart: gitlab-runner-0.39.0
    heritage: Helm
    pod-template-hash: 645d46bddd
    release: dev
  name: dev-gitlab-runner-xxxxxxxxxx-xxxxx
  namespace: gitlab-runner
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: dev-gitlab-runner-xxxxxxxxxx
spec:
  containers:
  - command:
    - /usr/bin/dumb-init
    - --
    - /bin/bash
    - /configmaps/entrypoint
    env:
    - name: CI_SERVER_URL
      value: https://code.example.com/
    - name: CLONE_URL
    - name: RUNNER_EXECUTOR
      value: kubernetes
    - name: REGISTER_LOCKED
      value: "true"
    - name: RUNNER_TAG_LIST
      value: dev
    - name: RUNNER_OUTPUT_LIMIT
      value: "4096"
    - name: KUBERNETES_IMAGE
      value: ubuntu:16.04
    - name: KUBERNETES_PRIVILEGED
      value: "true"
    - name: KUBERNETES_NAMESPACE
      value: gitlab-runner
    - name: KUBERNETES_POLL_TIMEOUT
      value: "180"
    - name: KUBERNETES_SERVICE_ACCOUNT
      value: gitlab-runner
    - name: KUBERNETES_HELPER_IMAGE
      value: gitlab/gitlab-runner-helper:x86_64-v13.3.1
    - name: KUBERNETES_PULL_POLICY
      value: if-not-present
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /entrypoint
          - unregister
          - --all-runners
    livenessProbe:
      exec:
        command:
        - /bin/bash
        - /configmaps/check-live
      failureThreshold: 3
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: dev-gitlab-runner
    ports:
    - containerPort: 9252
      name: metrics
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - /usr/bin/pgrep
        - gitlab.*runner
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /secrets
      name: runner-secrets
    - mountPath: /home/gitlab-runner/.gitlab-runner
      name: etc-gitlab-runner
    - mountPath: /configmaps
      name: configmaps
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: gitlab-runner-token-7pgtp
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: regcred
  initContainers:
  - command:
    - sh
    - /configmaps/configure
    env:
    - name: CI_SERVER_URL
      value: https://code.example.com/
    - name: CLONE_URL
    - name: RUNNER_EXECUTOR
      value: kubernetes
    - name: REGISTER_LOCKED
      value: "true"
    - name: RUNNER_TAG_LIST
      value: dev
    - name: RUNNER_OUTPUT_LIMIT
      value: "4096"
    - name: KUBERNETES_IMAGE
      value: ubuntu:16.04
    - name: KUBERNETES_PRIVILEGED
      value: "true"
    - name: KUBERNETES_NAMESPACE
      value: gitlab-runner
    - name: KUBERNETES_POLL_TIMEOUT
      value: "180"
    - name: KUBERNETES_SERVICE_ACCOUNT
      value: gitlab-runner
    - name: KUBERNETES_HELPER_IMAGE
      value: gitlab/gitlab-runner-helper:x86_64-v13.3.1
    - name: KUBERNETES_PULL_POLICY
      value: if-not-present
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imagePullPolicy: IfNotPresent
    name: configure
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /secrets
      name: runner-secrets
    - mountPath: /configmaps
      name: configmaps
      readOnly: true
    - mountPath: /init-secrets
      name: init-runner-secrets
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: gitlab-runner-token-7pgtp
      readOnly: true
  nodeName: x.ca-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 65533
    runAsUser: 100
  serviceAccount: gitlab-runner
  serviceAccountName: gitlab-runner
  terminationGracePeriodSeconds: 3600
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir:
      medium: Memory
    name: runner-secrets
  - emptyDir:
      medium: Memory
    name: etc-gitlab-runner
  - name: init-runner-secrets
    projected:
      defaultMode: 420
      sources:
      - secret:
          items:
          - key: runner-registration-token
            path: runner-registration-token
          - key: runner-token
            path: runner-token
          name: dev-gitlab-runner
  - configMap:
      defaultMode: 420
      name: dev-gitlab-runner
    name: configmaps
  - name: gitlab-runner-token-7pgtp
    secret:
      defaultMode: 420
      secretName: gitlab-runner-token-7pgtp
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:26Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:43Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:43Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:25Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://e0e81705780fd3367c7b9a3e57d117526c6451e5fa76b774ed9549eff2b971a3
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imageID: docker-pullable://gitlab/gitlab-runner@sha256:04f07d11d98689aa0908d756fba11755bd95a5726aa65b4d35673db787c73a8e
    lastState: {}
    name: gitlab-runner
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-04-03T22:43:26Z"
  hostIP: 172.x.x.x
  initContainerStatuses:
  - containerID: docker://b6a50288fa7901d9525d6a989a49bb4711815191512e55b1f48916925071b097
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imageID: docker-pullable://gitlab/gitlab-runner@sha256:04f07d11d98689aa0908d756fba11755bd95a5726aa65b4d35673db787c73a8e

What I've tried

  • Reinstalled kube2iam with the tag set to kube2iam-2.6.0 to make sure the latest release is in use
  • Made sure values.yml for the GitLab Runner Helm installation contains
podAnnotations:
  iam.amazonaws.com/role: gitlab-runner-global
  • Made sure the GitLab Runner pods that are active in the cluster actually have iam.amazonaws.com/role: gitlab-runner-global specified under metadata.annotations.
  • Changed the iam.amazonaws.com/role value to the full ARN format instead of just the role name
I'm out of ideas. I would appreciate any suggestions to resolve this issue.

A colleague at work pointed me to the solution to this problem. It is specific to GitLab Runner in Kubernetes. Hopefully it will help someone else who is stuck in the same situation.

Solution

When deploying the runner with Helm, add the iam.amazonaws.com/role annotation as a podAnnotations entry under runners in the values.yml file, not under the top-level podAnnotations.

It should look something like this:

runners:
    podAnnotations:
        iam.amazonaws.com/role: my-iam-role

Setting iam.amazonaws.com/role under the top-level podAnnotations in values.yml does not work, because that annotation lands on the GitLab Runner pod. The runner pod only checks in with the GitLab server for new jobs; it is not the Pod that actually executes the CI/CD pipeline. That Pod is created by the executor. With podAnnotations placed under runners as described above, the Pods created by the executor carry the annotation, so they can obtain the required IAM role.

The steps are as follows:

  1. Edit the values.yml file to fit your requirements and add the podAnnotations section under runners as specified above.
  2. helm repo add gitlab https://charts.gitlab.io
  3. helm install --namespace <NAMESPACE> gitlab-runner -f <CONFIG_VALUES_FILE> gitlab/gitlab-runner

If you are updating an existing installation:

  1. Edit the values.yml file to fit your requirements and add the podAnnotations section under runners as specified above.
  2. helm repo update
  3. helm upgrade --namespace <NAMESPACE> -f <CONFIG_VALUES_FILE> <RELEASE-NAME> gitlab/gitlab-runner
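
After installing or upgrading, it may be worth confirming that the annotation reached the job Pods and not only the runner Pod. The following is a rough sketch, assuming the job Pods run in the gitlab-runner namespace as in the Pod spec earlier in this issue. The helper name missing_role is hypothetical. It expects a header line followed by NAME ROLE pairs, as kubectl prints with custom-columns, where an absent annotation should show up as <none>.

```shell
#!/bin/sh
# Read a header line followed by "NAME ROLE" lines on stdin and print the
# names of Pods that have no iam.amazonaws.com/role annotation.
# Assumption: kubectl prints <none> for a column whose field is absent.
missing_role() {
  awk 'NR > 1 && ($2 == "" || $2 == "<none>") { print $1 }'
}

# Example usage while a pipeline job is running (dots in the annotation key
# are escaped with backslashes):
#   kubectl get pods -n gitlab-runner \
#     -o custom-columns='NAME:.metadata.name,ROLE:.metadata.annotations.iam\.amazonaws\.com/role' \
#     | missing_role
```

An empty result suggests every Pod in the namespace, including the job Pods created by the executor, carries the annotation.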