uswitch/kiam

Docs update: make it clear that the agent and server cannot run on the same node

robvadai opened this issue · 3 comments

Overview

I spent a number of days banging my head against kiam.

Finally got it working after digging deep in comments like this one: #95 (comment)

I don't know if it's just me, but it wasn't obvious to me from the beginning that I need to run agents and servers on entirely separate Kubernetes nodes.

The fact that I was using a single-node Rancher cluster for testing didn't help.

Once I used nodeSelector labels to make sure my agents and servers ran on separate nodes, and that the pods needing AWS access were on the same nodes as the agents, kiam started working.
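The separation described above can be checked mechanically. The following is only a minimal sketch, not part of kiam or its Helm chart: the `kiam-role` label key and its values are hypothetical, and the node data is hard-coded rather than read from a cluster. It reports which nodes would match both the agent and the server nodeSelector, which is the situation to avoid.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, node_labels: Labels) -> bool:
    """A nodeSelector matches when every key/value pair is present on the node.

    An empty selector matches every node, mirroring Kubernetes behaviour.
    """
    return all(node_labels.get(key) == value for key, value in selector.items())


def shared_nodes(nodes: Dict[str, Labels],
                 agent_selector: Labels,
                 server_selector: Labels) -> List[str]:
    """Return the names of nodes that could schedule both an agent and a server."""
    return sorted(
        name
        for name, labels in nodes.items()
        if matches(agent_selector, labels) and matches(server_selector, labels)
    )


# Hypothetical label scheme, chosen for illustration only.
AGENT_SELECTOR = {"kiam-role": "agent"}
SERVER_SELECTOR = {"kiam-role": "server"}

two_node_cluster = {
    "node-a": {"kiam-role": "agent"},
    "node-b": {"kiam-role": "server"},
}
single_node_cluster = {"only-node": {}}

# Disjoint labels: no node can host both components.
print(shared_nodes(two_node_cluster, AGENT_SELECTOR, SERVER_SELECTOR))
# No selectors on a single-node cluster: the one node hosts both.
print(shared_nodes(single_node_cluster, {}, {}))
```

A node can carry only one value per label key, so two selectors that require different values for the same key can never match the same node. That is why a single key with two values is used here.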

It would've saved me days of work (spread over a few weeks) and a lot of hair-pulling if the documentation had emphasised this more.

Suggestion

The main README should state clearly and prominently that agents and servers cannot work on the same nodes due to iptables rules.

The Helm chart should enforce mutually exclusive nodeSelector and/or taint/affinity settings for agents and servers. Kind of make it idiot-proof, so that you can never run these pods on the same nodes.

It would then also become obvious that kiam will never work on single-node Kubernetes clusters.

The additions should also note, though, that this only affects you if you're using the --iptables flag (which is not the default).

@gladiatr72 so you're saying that if I disable iptables, running on the same node should work...

Yes. The system was designed around the kube API server node hosting the kiam-server pod. The agent's iptables rules are there to intercept authentication/authorization-related requests and either fulfill the token request or return an empty response. They do not take into account the presence of a kiam-server process on the same node. As soon as the agent installs the rules, the server loses its ability to communicate with the real metadata endpoint. There are ways of accounting for this, but since it involves an out-of-spec deployment, you probably wouldn't have gotten much support for your cause.
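The interaction described above can be written out as a small truth table. This is a toy model that only restates the claim in this thread; it does not inspect real iptables rules, and the function name is made up for illustration. It assumes the server is cut off from the real metadata endpoint exactly when an agent on the same node has installed the interception rules.

```python
def server_reaches_metadata(agent_on_same_node: bool, iptables_enabled: bool) -> bool:
    """Toy model of the failure mode discussed in this thread.

    The server loses the real metadata endpoint only when an agent shares its
    node AND that agent was started with --iptables (so the rules are installed).
    """
    intercepted = agent_on_same_node and iptables_enabled
    return not intercepted


# Print every combination so the one failing case stands out.
for agent_on_same_node in (False, True):
    for iptables_enabled in (False, True):
        ok = server_reaches_metadata(agent_on_same_node, iptables_enabled)
        print(
            f"agent on same node={agent_on_same_node!s:5} "
            f"--iptables={iptables_enabled!s:5} "
            f"-> server reaches metadata: {ok}"
        )
```

Only the combination of a co-located agent and --iptables fails in this model. It says nothing about other reasons a same-node deployment might not work.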

Also, apologies for my previous less-than-patient response. I was having an awful day for many reasons, which happened to include dealing with EC2/IMDSv2 compatibility issues. (I'm still kind of a big fan of Kiam with respect to its design, code quality, functionality and the way the uswitch team supported it. For once I've actually made the move to get out in front of the inevitable IMDSv1 shutdown. It becomes quite clear how many things were intelligently included in the aforementioned spec once they are no longer present 🤔)