Provide IAM credentials to containers running inside a kubernetes cluster based on annotations.
Traditionally in AWS, service-level isolation is done using IAM roles. IAM roles are attached to instances through instance profiles and are consumed transparently by the aws-sdk, which calls the EC2 metadata API to obtain temporary credentials that are then used to call AWS services.
The problem is that in a multi-tenant, container-based world, multiple containers share the same underlying nodes. Providing node-level access to AWS resources via IAM roles would therefore require a single IAM role that is the union of all the roles the containers need, which is not acceptable from a security perspective.
The solution is to redirect traffic bound for the EC2 metadata API from containers to a kube2iam container running on each node. kube2iam calls the AWS STS API to retrieve temporary credentials for the role specified on the pod and returns them to the caller; all other metadata calls are proxied through to the real EC2 metadata API. The kube2iam container must run with host networking enabled so that it can reach the EC2 metadata API itself.
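From a pod's point of view nothing changes: the aws-sdk still talks to 169.254.169.254. As a rough sketch of the intercepted flow (the paths and JSON fields are the standard EC2 metadata credentials format; the role name comes from the pod annotation described below):

# List the role kube2iam resolved for this pod
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Fetch temporary credentials for that role
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
# -> JSON containing AccessKeyId, SecretAccessKey, Token and Expiration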
It is necessary to create an IAM role which can assume other roles and assign it to each kubernetes worker.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "sts:AssumeRole"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
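For example, this policy could be attached inline to the worker role with the AWS CLI. A minimal sketch, assuming placeholder role and policy names and the policy saved as assume-role-policy.json:

aws iam put-role-policy \
  --role-name kubernetes-worker-role \
  --policy-name kube2iam-assume-roles \
  --policy-document file://assume-role-policy.json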
The roles that will be assumed must have a Trust Relationship that allows them to be assumed by the kubernetes worker role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/kubernetes-worker-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
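A role carrying this trust policy can be created with the AWS CLI, for instance (a sketch with placeholder names, assuming the policy is saved as trust-policy.json):

aws iam create-role \
  --role-name my-app-role \
  --assume-role-policy-document file://trust-policy.json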
Run the kube2iam container as a daemonset (so that it runs on each worker) with hostNetwork: true. The kube2iam daemon and the iptables rule (see below) need to be in place before any other pods that require access to AWS resources are started.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube2iam
  labels:
    app: kube2iam
spec:
  selector:
    matchLabels:
      name: kube2iam
  template:
    metadata:
      labels:
        name: kube2iam
    spec:
      hostNetwork: true
      containers:
        - image: jtblin/kube2iam:latest
          name: kube2iam
          args:
            - "--base-role-arn=arn:aws:iam::123456789012:role/"
            - "--node=$(NODE_NAME)"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          ports:
            - containerPort: 8181
              hostPort: 8181
              name: http
To prevent containers from directly accessing the EC2 metadata API and gaining unwanted access to AWS resources, the traffic to 169.254.169.254 must be proxied for docker containers.
iptables \
--append PREROUTING \
--protocol tcp \
--destination 169.254.169.254 \
--dport 80 \
--in-interface docker0 \
--jump DNAT \
--table nat \
--to-destination `curl 169.254.169.254/latest/meta-data/local-ipv4`:8181
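To confirm that the rule took effect, you can inspect the nat PREROUTING chain on the node; this is just a sanity check, not something kube2iam requires:

sudo iptables --table nat --list PREROUTING --numeric | grep 169.254.169.254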
This rule can be added automatically by setting --iptables=true, setting the HOST_IP environment variable, and running the container in a privileged security context.
Warning: It is possible that other pods are started on an instance before kube2iam has started. Using --iptables=true (instead of applying the rule before starting the kubelet) could give those pods the opportunity to access the real EC2 metadata API, assume the role of the EC2 instance and thereby gain all the permissions the instance role has (including assuming potential other roles). Use with care if you don't trust the users of your kubernetes cluster or if you are running pods (that could be exploited) that have permissions to create other pods (e.g. controllers / operators).
Note that the interface passed to --in-interface above (or with the --host-interface cli flag) may be different than docker0 depending on which virtual network you use, e.g.

- for Calico, use cali+ (the interface name is something like cali1234567890)
- for kops (on kubenet), use cbr0
- for CNI, use cni0
- for EKS/amazon-vpc-cni-k8s, even with Calico installed, use eni+ (each pod gets an interface like eni4c0e15dfb05)
- for weave, use weave
- for flannel, use cni0
- for kube-router, use kube-bridge
- for OpenShift, use tun0
- for Cilium, use lxc+
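If you are unsure which interface your CNI creates, listing the network links on a worker node usually makes it obvious. A minimal sketch (interface names vary by CNI version):

# Print the name of every link on the node and look for docker0, cbr0,
# cni0, cali*, eni*, weave, kube-bridge, tun0 or lxc* interfaces
ip -o link show | awk -F': ' '{print $2}'

A complete daemonset with the iptables rule managed by kube2iam might look like this: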
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube2iam
  labels:
    app: kube2iam
spec:
  selector:
    matchLabels:
      name: kube2iam
  template:
    metadata:
      labels:
        name: kube2iam
    spec:
      hostNetwork: true
      containers:
        - image: jtblin/kube2iam:latest
          name: kube2iam
          args:
            - "--base-role-arn=arn:aws:iam::123456789012:role/"
            - "--iptables=true"
            - "--host-ip=$(HOST_IP)"
            - "--node=$(NODE_NAME)"
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          ports:
            - containerPort: 8181
              hostPort: 8181
              name: http
          securityContext:
            privileged: true
Add an iam.amazonaws.com/role annotation to your pods with the role that you want the pod to assume. The optional iam.amazonaws.com/external-id annotation allows the use of an ExternalId as part of the assume-role call.
apiVersion: v1
kind: Pod
metadata:
  name: aws-cli
  labels:
    name: aws-cli
  annotations:
    iam.amazonaws.com/role: role-arn
    iam.amazonaws.com/external-id: external-id
spec:
  containers:
    - image: fstab/aws-cli
      command:
        - "/home/aws/aws/env/bin/aws"
        - "s3"
        - "ls"
        - "some-bucket"
      name: aws-cli
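To verify the setup end to end, you can query the metadata path from inside the annotated pod; if kube2iam is intercepting correctly, it returns the resolved role rather than the node's instance role (this assumes curl is available in the image):

kubectl exec aws-cli -- curl -s 169.254.169.254/latest/meta-data/iam/security-credentials/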
You can use --default-role to set a fallback role to use when the annotation is not set.
When creating higher-level abstractions than pods, you need to set the annotation in the pod template of the resource spec. Example for a Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: role-arn
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.9.1
          ports:
            - containerPort: 80
Example for a CronJob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "00 11 * * 2"
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 3600
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            iam.amazonaws.com/role: role-arn
        spec:
          restartPolicy: OnFailure
          containers:
            - name: job
              image: my-image
By using the --namespace-restrictions flag you can enable a mode in which the roles that pods can assume are restricted by an annotation on the pod's namespace. This annotation should be in the form of a json array.
To allow the aws-cli pod specified above to run in the default namespace, your namespace would look like the following.
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    iam.amazonaws.com/allowed-roles: |
      ["role-arn"]
  name: default
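The same annotation can also be applied to an existing namespace with kubectl; the role value here mirrors the pod example above:

kubectl annotate namespace default \
  'iam.amazonaws.com/allowed-roles=["role-arn"]'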
Note: You can also use glob-based matching for namespace restrictions, which works nicely with the path-based namespacing supported for AWS IAM roles. For example, to allow all roles prefixed with my-custom-path/ to be assumed by pods in the default namespace, the default namespace would be annotated as follows:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    iam.amazonaws.com/allowed-roles: |
      ["my-custom-path/*"]
  name: default
If you prefer regexp to glob-based matching, specify --namespace-restriction-format=regexp and use a regexp in your annotation:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    iam.amazonaws.com/allowed-roles: |
      ["my-custom-path/.*"]
  name: default
This is the bare-minimum RBAC setup to get kube2iam working correctly when your cluster is using RBAC.
First we need to make a service account.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube2iam
  namespace: kube-system
Next we need to set up a role and binding for the process.
---
apiVersion: v1
kind: List
items:
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube2iam
    rules:
      - apiGroups: [""]
        resources: ["namespaces", "pods"]
        verbs: ["get", "watch", "list"]
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube2iam
    subjects:
      - kind: ServiceAccount
        name: kube2iam
        namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: kube2iam
      apiGroup: rbac.authorization.k8s.io
You will notice this lives in the kube-system namespace to allow for easier separation between system services and other services.
Here is what a kube2iam daemonset yaml might look like.
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube2iam
  namespace: kube-system
  labels:
    app: kube2iam
spec:
  selector:
    matchLabels:
      name: kube2iam
  template:
    metadata:
      labels:
        name: kube2iam
    spec:
      serviceAccountName: kube2iam
      hostNetwork: true
      containers:
        - image: jtblin/kube2iam:latest
          imagePullPolicy: Always
          name: kube2iam
          args:
            - "--app-port=8181"
            - "--base-role-arn=arn:aws:iam::xxxxxxx:role/"
            - "--iptables=true"
            - "--host-ip=$(HOST_IP)"
            - "--host-interface=weave"
            - "--verbose"
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 8181
              hostPort: 8181
              name: http
          securityContext:
            privileged: true
To use kube2iam on OpenShift one needs to configure additional resources. A complete example for OpenShift 3 looks like this; for OpenShift 4, see the next section.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube2iam
  namespace: kube-system
---
apiVersion: v1
kind: List
items:
  - apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: kube2iam
    rules:
      - apiGroups: [""]
        resources: ["namespaces", "pods"]
        verbs: ["get", "watch", "list"]
  - apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: kube2iam
    subjects:
      - kind: ServiceAccount
        name: kube2iam
        namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: kube2iam
      apiGroup: rbac.authorization.k8s.io
---
kind: SecurityContextConstraints
apiVersion: v1
metadata:
  name: kube2iam
allowPrivilegedContainer: true
allowHostPorts: true
allowHostNetwork: true
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
users:
  - system:serviceaccount:kube-system:kube2iam
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube2iam
  namespace: kube-system
  labels:
    app: kube2iam
spec:
  selector:
    matchLabels:
      name: kube2iam
  template:
    metadata:
      labels:
        name: kube2iam
    spec:
      serviceAccountName: kube2iam
      hostNetwork: true
      nodeSelector:
        role: app
      containers:
        - image: docker.io/jtblin/kube2iam:latest
          imagePullPolicy: Always
          name: kube2iam
          args:
            - "--app-port=8181"
            - "--auto-discover-base-arn"
            - "--iptables=true"
            - "--host-ip=$(HOST_IP)"
            - "--host-interface=tun0"
            - "--verbose"
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 8181
              hostPort: 8181
              name: http
          securityContext:
            privileged: true
Note: In (OpenShift) multi-tenancy setups it is recommended to restrict the assumable roles on the namespace level to prevent cross-namespace trust stealing.
To use kube2iam on OpenShift 4, the additional resources are slightly different from those for OpenShift 3 shown above. OpenShift 4 has hard-coded iptables rules that block connections from containers to the EC2 metadata service at 169.254.169.254. Because the kube2iam pods run with host networking enabled, they are not affected by these OpenShift iptables rules.
The OpenShift iptables rules do have implications for pods authenticating through kube2iam, though. But let's look at an example for deploying kube2iam on OpenShift 4 first:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube2iam
  namespace: kube-system
---
apiVersion: v1
kind: List
items:
  - apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: kube2iam
    rules:
      - apiGroups: [""]
        resources: ["namespaces", "pods"]
        verbs: ["get", "watch", "list"]
  - apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: kube2iam
    subjects:
      - kind: ServiceAccount
        name: kube2iam
        namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: kube2iam
      apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube2iam
  namespace: kube-system
  labels:
    app: kube2iam
spec:
  selector:
    matchLabels:
      name: kube2iam
  template:
    metadata:
      labels:
        name: kube2iam
    spec:
      serviceAccountName: kube2iam
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/worker: ''
      containers:
        - image: docker.io/jtblin/kube2iam:latest
          imagePullPolicy: Always
          name: kube2iam
          args:
            - "--app-port=8181"
            - "--auto-discover-base-arn"
            - "--host-ip=$(HOST_IP)"
            - "--host-interface=tun0"
            - "--verbose"
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 8181
              hostPort: 8181
              name: http
Compared to the OpenShift 3 example in the previous section, we removed the kube2iam SecurityContextConstraint. In the kube2iam DaemonSet, we changed the nodeSelector to match OpenShift 4 worker nodes, removed the iptables argument, and removed the privileged securityContext.
We use the OpenShift hostnetwork SecurityContextConstraint for kube2iam:
oc adm policy add-scc-to-user hostnetwork -n kube-system -z kube2iam
For applications, the iptables rule that kube2iam would create to redirect 169.254.169.254 connections to the kube2iam pods has no effect, because the hard-coded iptables rules block those connections on OpenShift 4.
As a workaround, the environment variables http_proxy and no_proxy can be set to use kube2iam as an HTTP proxy when accessing the metadata service. Below is an example for the aws-service-operator:
- kind: Deployment
  apiVersion: apps/v1beta1
  metadata:
    name: aws-service-operator
    namespace: aws-service-operator
  spec:
    replicas: 1
    template:
      metadata:
        annotations:
          iam.amazonaws.com/role: aws-service-operator
        labels:
          app: aws-service-operator
      spec:
        serviceAccountName: aws-service-operator
        containers:
          - name: aws-service-operator
            image: awsserviceoperator/aws-service-operator:v0.0.1-alpha4
            imagePullPolicy: Always
            command:
              - /bin/sh
            args:
              - "-c"
              - export http_proxy=${HOST_IP}:8181; /usr/local/bin/aws-service-operator server --cluster-name=<CLUSTER_NAME> --region=<REGION> --account-id=<ACCOUNT_ID> --k8s-namespace=<K8S_NAMESPACE>
            env:
              - name: HOST_IP
                valueFrom:
                  fieldRef:
                    apiVersion: v1
                    fieldPath: status.hostIP
              - name: no_proxy
                value: "*.amazonaws.com,<KUBE_API_IP>:443"
Compared to the Deployment definition from aws-service-operator/configs/aws-service-operator.yaml, this adds the http_proxy and no_proxy environment variables. Because we use the IP address of the OpenShift node to access the kube2iam pod, we cannot set http_proxy in the env list, but use a shell command instead.
The value for the no_proxy environment variable is specific to the application. kube2iam only allows proxy connections to 169.254.169.254; all other hostnames or IP addresses that the application connects to through HTTP or HTTPS need to be listed in the no_proxy variable.
For example, the aws-service-operator needs access to various AWS APIs and the Kubernetes API. The Kubernetes API listens on the first IP address in the OpenShift service network: if 172.31.0.0/16 is the OpenShift cluster service network, KUBE_API_IP is 172.31.0.1.
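One way to find that first service-network address is to look at the clusterIP of the built-in kubernetes service; a quick check (it works the same on vanilla Kubernetes):

oc get service kubernetes -n default -o jsonpath='{.spec.clusterIP}'
# e.g. 172.31.0.1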
By using the --debug flag you can enable some extra features to make debugging easier:

- the /debug/store endpoint is enabled to dump the known namespaces and role associations.
By using the --auto-discover-base-arn flag, kube2iam will auto-discover the base ARN via the EC2 metadata service.
By using the --auto-discover-default-role flag, kube2iam will auto-discover the base ARN and the IAM role attached to the instance and use it as the fallback role when the annotation is not set.
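The raw data auto-discovery has to work with can be seen by querying the metadata service from a node (how kube2iam parses it is internal to the application):

curl -s 169.254.169.254/latest/meta-data/iam/info
# -> {"Code" : "Success", "InstanceProfileArn" : "arn:aws:iam::123456789012:instance-profile/...", ...}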
STS is a unique service in that it is considered a global service which defaults to the endpoint at https://sts.amazonaws.com, regardless of your region setting. However, unlike other global services (e.g. CloudFront, IAM), STS also has regional endpoints, which can only be opted into explicitly (programmatically). Using a regional STS endpoint can reduce the latency of STS requests.
kube2iam supports the use of STS regional endpoints via the --use-regional-sts-endpoint flag together with the appropriate AWS_REGION environment variable in your daemonset environment. With these two settings configured, kube2iam will use the STS endpoint for that region. If you enable debug-level logging, the STS endpoint used to retrieve credentials will be logged.
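In the daemonset, that combination might look like the following fragment (the region value is just an example):

          args:
            - "--use-regional-sts-endpoint"
          env:
            - name: AWS_REGION
              value: us-west-2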
kube2iam exports a number of Prometheus metrics to assist with monitoring the system's performance. By default, these are exported at the /metrics HTTP endpoint on the application server port (specified by --app-port). This does not always make sense, as anything with access to the application server port can assume roles via kube2iam. To mitigate this, use the --metrics-port argument to specify a different port that will host the /metrics endpoint.
All of the exported metrics are prefixed with kube2iam_. See the Prometheus documentation for more information on how to get up and running with Prometheus.
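For a quick look at what is exported, you can curl the endpoint from a node. A sketch assuming --metrics-port=9102 was set (any free port works):

curl -s http://localhost:9102/metrics | grep '^kube2iam_'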
By default, kube2iam will use the in-cluster method to connect to the kubernetes master, and the iam.amazonaws.com/role annotation to retrieve the role for the container. Either set the base-role-arn option so that it applies to all roles and pass only the role name in the iam.amazonaws.com/role annotation, or pass the full role ARN in the annotation.
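For instance, with the base-role-arn used in the daemonset examples above, the two annotation styles look like this (role names are placeholders):

metadata:
  annotations:
    # with --base-role-arn=arn:aws:iam::123456789012:role/ the short name is enough
    iam.amazonaws.com/role: my-app-role
    # without base-role-arn, the full ARN is required instead:
    # iam.amazonaws.com/role: arn:aws:iam::123456789012:role/my-app-role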
$ kube2iam --help
Usage of kube2iam:
      --api-server string                     Endpoint for the api server
      --api-token string                      Token to authenticate with the api server
      --app-port string                       Kube2iam server http port (default "8181")
      --auto-discover-base-arn                Queries EC2 metadata to determine the base ARN
      --auto-discover-default-role            Queries EC2 metadata to determine the default IAM role and base ARN, cannot be used with --default-role, overwrites any previous setting for --base-role-arn
      --backoff-max-elapsed-time duration     Max elapsed time for backoff when querying for role (default 2s)
      --backoff-max-interval duration         Max interval for backoff when querying for role (default 1s)
      --base-role-arn string                  Base role ARN
      --cache-resync-period                   Refresh interval for pod and namespace caches
      --debug                                 Enable debug features
      --default-role string                   Fallback role to use when annotation is not set
      --host-interface string                 Host interface for proxying AWS metadata (default "docker0")
      --host-ip string                        IP address of host
      --iam-external-id string                Pod annotation key used to retrieve the IAM ExternalId (default "iam.amazonaws.com/external-id")
      --iam-role-key string                   Pod annotation key used to retrieve the IAM role (default "iam.amazonaws.com/role")
      --iam-role-session-ttl                  Length of session when assuming the roles (default 15m)
      --insecure                              Kubernetes server should be accessed without verifying the TLS. Testing only
      --iptables                              Add iptables rule (also requires --host-ip)
      --log-format string                     Log format (text/json) (default "text")
      --log-level string                      Log level (default "info")
      --metadata-addr string                  Address for the ec2 metadata (default "169.254.169.254")
      --metrics-port string                   Metrics server http port (default: same as kube2iam server port) (default "8181")
      --namespace-key string                  Namespace annotation key used to retrieve the IAM roles allowed (value in annotation should be json array) (default "iam.amazonaws.com/allowed-roles")
      --namespace-restriction-format string   Namespace restriction format (glob/regexp) (default "glob")
      --namespace-restrictions                Enable namespace restrictions
      --node string                           Name of the node where kube2iam is running
      --resolve-duplicate-cache-ips           Queries the k8s api server to find the source of truth when the pod cache contains multiple pods with the same IP
      --use-regional-sts-endpoint             Use the regional STS endpoint if AWS_REGION is set
      --verbose                               Verbose
      --version                               Print the version and exit
- Use minikube to run a cluster locally.
- Build and push a dev image to docker hub: make docker-dev DOCKER_REPO=<your docker hub username>
- Update deployment.yaml as needed.
- Deploy to the local kubernetes cluster: kubectl create -f deployment.yaml or kubectl delete -f deployment.yaml && kubectl create -f deployment.yaml
- Expose as a service: kubectl expose deployment kube2iam --type=NodePort
- Retrieve the service's URL: minikube service kube2iam --url
- Test your changes, e.g. curl -is $(minikube service kube2iam --url)/healthz
Jerome Touffe-Blin, @jtblin
kube2iam is copyright 2017 Jerome Touffe-Blin and contributors. It is licensed under the BSD license. See the included LICENSE file for details.