kube2iam and encryption provider don't always play well together
SleepyBrett opened this issue · 7 comments
What happened:
We use github.com/jtblin/kube2iam to proxy calls to the AWS metadata service. I've assigned a role to the encryption plugin, but the plugin can start up before kube2iam gets its iptables rule in place to map 169.254.169.254 to itself. When that happens the encryption provider picks up the node's credentials, which do not allow access to KMS. The provider then keeps making KMS calls (and failing), presumably until the initial credentials expire.
What you expected to happen:
I expect that once the first KMS call fails, the encryption provider should either crash (so it gets restarted and tries again) or fall back and re-acquire credentials.
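For illustration, here's a minimal sketch of the fallback behaviour I have in mind. It is not the plugin's actual code, and it assumes the provider uses aws-sdk-go: on an access-denied response from KMS, expire the cached credentials so the SDK refetches them from the metadata endpoint, which by then should be proxied by kube2iam.

```go
// Hypothetical sketch, not the plugin's real code: retry KMS after
// expiring cached credentials when the node role wins the race.
package kmsfallback

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kms"
)

// encryptWithRefresh calls kms.Encrypt and, on an access-denied error,
// drops the SDK's cached credentials so the next attempt goes back to
// 169.254.169.254 (hopefully proxied by kube2iam by then).
func encryptWithRefresh(sess *session.Session, keyID string, plaintext []byte) ([]byte, error) {
	svc := kms.New(sess)
	out, err := svc.Encrypt(&kms.EncryptInput{
		KeyId:     aws.String(keyID),
		Plaintext: plaintext,
	})
	if err == nil {
		return out.CiphertextBlob, nil
	}
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "AccessDeniedException" {
		log.Printf("kms access denied (probably node credentials); expiring cached credentials")
		sess.Config.Credentials.Expire() // force a refetch from the metadata service on the next call
	}
	return nil, err
}
```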
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
I'm really looking for workarounds at the moment. Blackholing 169.254.169.254 on the pod network bridge when the node starts, until kube2iam is up, is potentially an option, but I'm not sure yet how your code will respond to that...
Environment:
- Kubernetes version (use kubectl version): 1.15.3
- Encryption provider plugin version: v.01
- Cloud provider configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
Since this plugin needs to be up before the API server, wouldn't you have a dependency loop by relying on kube2iam and secrets? Are you not running this as a static pod with host networking?
We don't run anything as a static pod, as a general rule. We've preset the k2i iptables rule on all our nodes so that the encryption plugin, and others, don't accidentally get the node's role if they win the race.
@SleepyBrett If you don't use static pods, you're probably going to have a circular dependency here, as the API server won't be able to encrypt/decrypt secrets before the provider is online, and kube2iam will need to access secrets.
> When that happens the encryption provider picks up the node's credentials, which do not allow access to KMS.
I see a few options here: you could grant your node role KMS access temporarily and remove the KMS permissions later (rough sketch at the end of this comment).
> Blackholing 169.254.169.254 on the pod network bridge when the node starts, until kube2iam is up, is potentially an option, but I'm not sure yet how your code will respond to that...
Can you try this and see how it behaves? It might do what you want.
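For the first option, something along these lines could be scripted with aws-sdk-go. The role name and policy ARN below are made-up placeholders, not anything from this repo:

```go
// Rough sketch of the "temporary node KMS access" workaround using aws-sdk-go.
// Role name and policy ARN are hypothetical placeholders.
package workaround

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/iam"
)

const (
	nodeRole     = "my-node-instance-role"                        // hypothetical node instance role
	kmsPolicyArn = "arn:aws:iam::123456789012:policy/kms-encrypt" // hypothetical policy granting the needed kms:* actions
)

// grantNodeKMS attaches the KMS policy to the node role so the encryption
// provider can still reach KMS if it picks up node credentials before
// kube2iam's iptables rule is in place.
func grantNodeKMS(sess *session.Session) error {
	svc := iam.New(sess)
	_, err := svc.AttachRolePolicy(&iam.AttachRolePolicyInput{
		RoleName:  aws.String(nodeRole),
		PolicyArn: aws.String(kmsPolicyArn),
	})
	return err
}

// revokeNodeKMS detaches the policy again once kube2iam is up and the
// provider has rolled over to its pod role.
func revokeNodeKMS(sess *session.Session) error {
	svc := iam.New(sess)
	_, err := svc.DetachRolePolicy(&iam.DetachRolePolicyInput{
		RoleName:  aws.String(nodeRole),
		PolicyArn: aws.String(kmsPolicyArn),
	})
	return err
}
```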
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.