GitLab Runner Pod Unable To Assume Role With Annotation
AlexWang-16 opened this issue · 1 comment
The Problem
I'm trying to assign an IAM role to a GitLab Runner deployment in EKS. When I look at spec.template.metadata.annotations, I can clearly see the IAM role written.
However, when I execute my CI/CD pipeline using the runner, which runs aws sts get-caller-identity, I get the following error message:
Unable to locate credentials. You can configure credentials by running "aws configure".
Looking at Kube2IAM logs, I see the following warning and error messages:
time="2022-04-03T17:24:31Z" level=warning msg="Using fallback role for IP 172.x.x.x"
time="2022-04-03T17:24:31Z" level=info msg="GET /latest/meta-data/iam/security-credentials/ (200) took 0.026894 ms" req.method=GET req.path=/latest/meta-data/iam/security-credentials/ req.remote=172.x.x.x res.duration=0.026894 res.status=200
time="2022-04-03T17:24:31Z" level=warning msg="Using fallback role for IP 172.x.x.x"
time="2022-04-03T17:24:31Z" level=error msg="Error assuming role AccessDenied: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/ireland-eks-nprxxxxxxxxx/i-xxxxxxxxxx is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::xxxxxxxxxxxx:role/fallback-role\n\tstatus code: 403, request id: c3d3b597-b00f-xxxx-xxxx-xxxxxxxxxx" ns.name=gitlab-runner pod.iam.role="arn:aws:iam::xxxxxxxxxxxx:role/fallback-role" req.method=GET req.path=/latest/meta-data/iam/security-credentials/fallback-role req.remote=172.x.x.x
time="2022-04-03T17:24:31Z" level=info msg="GET /latest/meta-data/iam/security-credentials/fallback-role (500) took 81.617168 ms" req.method=GET req.path=/latest/meta-data/iam/security-credentials/fallback-role req.remote=172.x.x.x res.duration=81.617168 res.status=500
It looks like Kube2IAM was not able to detect the IAM role specified in the Pod annotations.
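The "fallback role" messages are a second clue: when kube2iam cannot match a pod annotation, it falls back to its --default-role, and it always performs the actual sts:AssumeRole call using the node's instance role. So any target role (fallback or otherwise) must trust the node role. A hedged sketch of what such a trust policy generally looks like (the account ID and role name below are placeholders, not values from this issue):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/eks-node-instance-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

The AccessDenied in the log above is consistent with the fallback role lacking such a trust relationship, but the underlying problem here turned out to be that the annotation was on the wrong pod (see the solution below).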
Here's what the GitLab Runner's Pod (created by the Deployment) looks like:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/configmap: 3918d3199a949a3fe5d21c34097bbb8a1e8625ecd16f7c2e7099daadc064e399
    checksum/secrets: adbc4213787fab36c06c9c6dedc9550bf9edd93880030cf888ac4bb33b477a1d
    iam.amazonaws.com/role: gitlab-runner-global
    kubectl.kubernetes.io/restartedAt: "2022-04-03T15:33:46-04:00"
    kubernetes.io/psp: eks.privileged
    prometheus.io/port: "9252"
    prometheus.io/scrape: "true"
  labels:
    app: dev-gitlab-runner
    chart: gitlab-runner-0.39.0
    heritage: Helm
    pod-template-hash: 645d46bddd
    release: dev
  name: dev-gitlab-runner-xxxxxxxxxx-xxxxx
  namespace: gitlab-runner
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: dev-gitlab-runner-xxxxxxxxxx
spec:
  containers:
  - command:
    - /usr/bin/dumb-init
    - --
    - /bin/bash
    - /configmaps/entrypoint
    env:
    - name: CI_SERVER_URL
      value: https://code.example.com/
    - name: CLONE_URL
    - name: RUNNER_EXECUTOR
      value: kubernetes
    - name: REGISTER_LOCKED
      value: "true"
    - name: RUNNER_TAG_LIST
      value: dev
    - name: RUNNER_OUTPUT_LIMIT
      value: "4096"
    - name: KUBERNETES_IMAGE
      value: ubuntu:16.04
    - name: KUBERNETES_PRIVILEGED
      value: "true"
    - name: KUBERNETES_NAMESPACE
      value: gitlab-runner
    - name: KUBERNETES_POLL_TIMEOUT
      value: "180"
    - name: KUBERNETES_SERVICE_ACCOUNT
      value: gitlab-runner
    - name: KUBERNETES_HELPER_IMAGE
      value: gitlab/gitlab-runner-helper:x86_64-v13.3.1
    - name: KUBERNETES_PULL_POLICY
      value: if-not-present
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /entrypoint
          - unregister
          - --all-runners
    livenessProbe:
      exec:
        command:
        - /bin/bash
        - /configmaps/check-live
      failureThreshold: 3
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: dev-gitlab-runner
    ports:
    - containerPort: 9252
      name: metrics
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - /usr/bin/pgrep
        - gitlab.*runner
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /secrets
      name: runner-secrets
    - mountPath: /home/gitlab-runner/.gitlab-runner
      name: etc-gitlab-runner
    - mountPath: /configmaps
      name: configmaps
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: gitlab-runner-token-7pgtp
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: regcred
  initContainers:
  - command:
    - sh
    - /configmaps/configure
    env:
    - name: CI_SERVER_URL
      value: https://code.example.com/
    - name: CLONE_URL
    - name: RUNNER_EXECUTOR
      value: kubernetes
    - name: REGISTER_LOCKED
      value: "true"
    - name: RUNNER_TAG_LIST
      value: dev
    - name: RUNNER_OUTPUT_LIMIT
      value: "4096"
    - name: KUBERNETES_IMAGE
      value: ubuntu:16.04
    - name: KUBERNETES_PRIVILEGED
      value: "true"
    - name: KUBERNETES_NAMESPACE
      value: gitlab-runner
    - name: KUBERNETES_POLL_TIMEOUT
      value: "180"
    - name: KUBERNETES_SERVICE_ACCOUNT
      value: gitlab-runner
    - name: KUBERNETES_HELPER_IMAGE
      value: gitlab/gitlab-runner-helper:x86_64-v13.3.1
    - name: KUBERNETES_PULL_POLICY
      value: if-not-present
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imagePullPolicy: IfNotPresent
    name: configure
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /secrets
      name: runner-secrets
    - mountPath: /configmaps
      name: configmaps
      readOnly: true
    - mountPath: /init-secrets
      name: init-runner-secrets
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: gitlab-runner-token-7pgtp
      readOnly: true
  nodeName: x.ca-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 65533
    runAsUser: 100
  serviceAccount: gitlab-runner
  serviceAccountName: gitlab-runner
  terminationGracePeriodSeconds: 3600
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir:
      medium: Memory
    name: runner-secrets
  - emptyDir:
      medium: Memory
    name: etc-gitlab-runner
  - name: init-runner-secrets
    projected:
      defaultMode: 420
      sources:
      - secret:
          items:
          - key: runner-registration-token
            path: runner-registration-token
          - key: runner-token
            path: runner-token
          name: dev-gitlab-runner
  - configMap:
      defaultMode: 420
      name: dev-gitlab-runner
    name: configmaps
  - name: gitlab-runner-token-7pgtp
    secret:
      defaultMode: 420
      secretName: gitlab-runner-token-7pgtp
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:26Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:43Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:43Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-04-03T22:43:25Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://e0e81705780fd3367c7b9a3e57d117526c6451e5fa76b774ed9549eff2b971a3
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imageID: docker-pullable://gitlab/gitlab-runner@sha256:04f07d11d98689aa0908d756fba11755bd95a5726aa65b4d35673db787c73a8e
    lastState: {}
    name: gitlab-runner
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-04-03T22:43:26Z"
  hostIP: 172.x.x.x
  initContainerStatuses:
  - containerID: docker://b6a50288fa7901d9525d6a989a49bb4711815191512e55b1f48916925071b097
    image: gitlab/gitlab-runner:alpine-v14.4.0
    imageID: docker-pullable://gitlab/gitlab-runner@sha256:04f07d11d98689aa0908d756fba11755bd95a5726aa65b4d35673db787c73a8e
What I've tried
- Reinstalled kube2iam with the tag pinned to kube2iam-2.6.0 to ensure the latest release is being used.
- Ensured values.yml for the GitLab Runner Helm installation contains:
  podAnnotations:
    iam.amazonaws.com/role: gitlab-runner-global
- Ensured the GitLab Runner pods that are active in the cluster actually have iam.amazonaws.com/role: gitlab-runner-global specified under metadata.annotations.
- Changed the iam.amazonaws.com/role value to the full ARN format instead of just the role name.
I'm out of ideas. I would appreciate any suggestions to resolve this issue.
A colleague at work provided me with guidance on the solution to this problem. The solution is specific to GitLab Runner in Kubernetes; hopefully it will help someone else stuck in the same situation.
Solution
When deploying the runner using Helm, you need to add a podAnnotations property containing iam.amazonaws.com/role as a sub-property under runners in the values.yml file.
It should look something like this:
runners:
  podAnnotations:
    iam.amazonaws.com/role: my-iam-role
Setting iam.amazonaws.com/role under the top-level podAnnotations in values.yml is incorrect because the GitLab Runner pod only checks in with the GitLab server for new jobs to execute; it is not the pod that actually runs the CI/CD pipeline. That is done by the executor pods. By nesting podAnnotations under runners as shown above, the executor pods will carry the annotation needed to obtain the required IAM role.
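One way to confirm the annotation landed in the right place is to inspect a job pod spawned by the Kubernetes executor while a pipeline is running. Its metadata should contain the annotation, roughly like this sketch (the pod name and role below are illustrative, not from this issue):

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Job pod created by the Kubernetes executor for a single CI job;
  # the name format is illustrative
  name: runner-abc123-project-42-concurrent-0xyz
  namespace: gitlab-runner
  annotations:
    # This is the annotation kube2iam must see for the job to assume the role
    iam.amazonaws.com/role: my-iam-role
```

If the annotation appears only on the long-lived runner pod and not on the job pods, the values.yml nesting is wrong.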
For a new installation, the steps are as follows:
- Edit the values.yml file to fit your requirements and add the podAnnotations section as specified above.
- Install the chart:
helm repo add gitlab https://charts.gitlab.io
helm install --namespace <NAMESPACE> gitlab-runner -f <CONFIG_VALUES_FILE> gitlab/gitlab-runner
If you are updating an existing installation:
- Edit the values.yml file to fit your requirements and add the podAnnotations section as specified above.
- Upgrade the release:
helm repo update
helm upgrade --namespace <NAMESPACE> -f <CONFIG_VALUES_FILE> <RELEASE-NAME> gitlab/gitlab-runner
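Putting it together, a minimal values.yml sketch for this setup might look like the following (the URL, token placeholder, and role name are illustrative; key names follow the gitlab-runner chart layout used in this issue, chart version 0.39.x):

```yaml
# GitLab instance the runner registers against
gitlabUrl: https://code.example.com/
runnerRegistrationToken: "<REGISTRATION_TOKEN>"

runners:
  tags: dev
  # Annotations placed here are applied to the executor job pods,
  # which is where kube2iam needs to see the role annotation.
  podAnnotations:
    iam.amazonaws.com/role: gitlab-runner-global
```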