uswitch/kiam

Context canceled and KiamCredentialError

pjaak opened this issue · 4 comments

pjaak commented

Hi,

I have been having issues with kiam on AWS recently:

Below are the server logs:
{"cache.key":"arn:aws:iam::399203743512:role/p-sng-survey-role||","level":"debug","msg":"evicted credentials future had error: RequestCanceled: request context canceled\ncaused by: context canceled","time":"2021-08-05T06:54:48Z"}

{"level":"error","msg":"error requesting credentials: RequestCanceled: request context canceled\ncaused by: context canceled","pod.iam.role":{"Name":"d-survey-role","ARN":"arn:aws:iam::XXXXXX:role/d-survey-role"},"pod.iam.roleArn":"arn:aws:iam::XXXXXX:role/d-survey-role","time":"2021-08-05T01:02:27Z"}

{"generation.metadata":0,"level":"error","msg":"error retrieving credentials: RequestCanceled: request context canceled\ncaused by: context canceled","pod.iam.requestedRole":"d-survey-role","pod.iam.role":"d-survey-role","pod.name":"d-survey-php-5bb8977bc5-mz9gw","pod.namespace":"survey","pod.status.ip":"100.116.58.28","pod.status.phase":"Running","resource.version":"295642124","time":"2021-08-05T01:02:27Z"}

I also receive this error on the server after the above:
due to: 'selfLink was empty, can't make reference'. Will not report event: 'Warning' 'KiamCredentialError' 'failed retrieving credentials: RequestCanceled: request context canceled'

On the agent I am seeing these:
{"addr":"100.111.254.80:57774","level":"error","method":"GET","msg":"error processing request: error fetching credentials: rpc error: code = Canceled desc = context canceled","path":"/latest/meta-data/iam/security-credentials/d-survey-role","status":500,"time":"2021-08-05T01:20:24Z"}
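To see which roles are affected and how often, the JSON log lines can be tallied. The following is a minimal sketch, not part of kiam: the function name `summarize_canceled` is hypothetical, and it assumes agent log lines carry the `level`, `msg`, and `path` fields seen in the entry above.

```python
import json
from collections import Counter


def summarize_canceled(lines):
    """Count error-level 'context canceled' entries per requested path.

    Hypothetical helper; assumes each log line is a JSON object with
    "level", "msg" and (for agent logs) "path" fields.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines, e.g. the event recorder warning
        if not isinstance(entry, dict):
            continue
        if entry.get("level") != "error":
            continue
        if "context canceled" not in entry.get("msg", ""):
            continue
        # Server log entries have no "path" field, so group them separately.
        counts[entry.get("path", "<no path>")] += 1
    return counts


# The agent log line from this issue, plus one line that is not JSON.
sample = [
    '{"addr":"100.111.254.80:57774","level":"error","method":"GET","msg":"error processing request: error fetching credentials: rpc error: code = Canceled desc = context canceled","path":"/latest/meta-data/iam/security-credentials/d-survey-role","status":500,"time":"2021-08-05T01:20:24Z"}',
    "not a json line",
]

for path, count in summarize_canceled(sample).items():
    print(count, path)
```

Grouping by `path` is only one choice; for server logs, a field such as `pod.iam.role` could serve the same purpose.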

I have tried adjusting ENV variables such as:
AWS_METADATA_SERVICE_TIMEOUT: 10
AWS_METADATA_SERVICE_NUM_ATTEMPTS: 5

I have Prometheus and Grafana set up and am noticing the following:
[image]

Any ideas? Currently my application can't call AWS resources because it can't get credentials.

Thanks in advance

+1

We are having errors with 'context canceled' as well.

jjo commented

Check #484. I'm seeing similar errors from that selfLink issue.

Thanks @jjo, we are on 1.21 so this could definitely be it. Do you know if a release and new image are planned for this?

We have run into this with the current KIAM release (using Helm Chart 6.1.2 on EKS 1.21).

While this is not a fix by any means, you can remediate this by provisioning new DaemonSet pods if authentication stops working for you.

kubectl delete pods -n kube-system -l app=kiam

For my team, this worked for several months before breaking silently.

EDIT: We've also seen this on an EKS cluster running 1.19.
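Since the failure is silent, the workaround could in principle be automated. The following is a rough sketch under stated assumptions, not a recommended fix: the namespace and label selector are copied from the `kubectl` command above, while `CANCELED_THRESHOLD` and all function names are hypothetical and would need tuning for a real cluster.

```python
import subprocess

# Taken from the kubectl command in this comment.
NAMESPACE = "kube-system"
SELECTOR = "app=kiam"
# Hypothetical value: how many "context canceled" lines count as unhealthy.
CANCELED_THRESHOLD = 5


def count_canceled(log_text):
    """Return the number of log lines mentioning 'context canceled'."""
    return sum(1 for line in log_text.splitlines() if "context canceled" in line)


def delete_pods_command(namespace=NAMESPACE, selector=SELECTOR):
    """Build the same pod deletion command as the manual workaround."""
    return ["kubectl", "delete", "pods", "-n", namespace, "-l", selector]


def restart_if_unhealthy(log_text, threshold=CANCELED_THRESHOLD, run=subprocess.run):
    """Delete the kiam pods when recent logs show enough canceled requests.

    `run` is injectable so the decision logic can be exercised without
    touching a cluster. Returns True if a restart was triggered.
    """
    if count_canceled(log_text) < threshold:
        return False
    run(delete_pods_command(), check=True)
    return True
```

Here `log_text` is assumed to be recent kiam log output, for example collected with `kubectl logs -n kube-system -l app=kiam --since=10m --tail=-1`. Deleting the pods only masks the underlying problem, so this is a stopgap at best.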