Context canceled and KiamCredentialError
pjaak opened this issue · 4 comments
Hi,
I have been having issues with kiam on AWS recently:
Below are the server logs:
{"cache.key":"arn:aws:iam::399203743512:role/p-sng-survey-role||","level":"debug","msg":"evicted credentials future had error: RequestCanceled: request context canceled\ncaused by: context canceled","time":"2021-08-05T06:54:48Z"}
{"level":"error","msg":"error requesting credentials: RequestCanceled: request context canceled\ncaused by: context canceled","pod.iam.role":{"Name":"d-survey-role","ARN":"arn:aws:iam::XXXXXX:role/d-survey-role"},"pod.iam.roleArn":"arn:aws:iam::XXXXXX:role/d-survey-role","time":"2021-08-05T01:02:27Z"}
{"generation.metadata":0,"level":"error","msg":"error retrieving credentials: RequestCanceled: request context canceled\ncaused by: context canceled","pod.iam.requestedRole":"d-survey-role","pod.iam.role":"d-survey-role","pod.name":"d-survey-php-5bb8977bc5-mz9gw","pod.namespace":"survey","pod.status.ip":"100.116.58.28","pod.status.phase":"Running","resource.version":"295642124","time":"2021-08-05T01:02:27Z"}
I also receive this error on the server after the above:
due to: 'selfLink was empty, can't make reference'. Will not report event: 'Warning' 'KiamCredentialError' 'failed retrieving credentials: RequestCanceled: request context canceled'
On the agent, I am seeing errors like this:
{"addr":"100.111.254.80:57774","level":"error","method":"GET","msg":"error processing request: error fetching credentials: rpc error: code = Canceled desc = context canceled","path":"/latest/meta-data/iam/security-credentials/d-survey-role","status":500,"time":"2021-08-05T01:20:24Z"}
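For anyone triaging the same symptoms, here is a minimal Python sketch for pulling the `context canceled` errors out of kiam server and agent logs like the ones above. It assumes each log line holds one or more JSON objects with `level`, `msg`, and `time` fields, as in these excerpts. The function names are hypothetical helpers, not part of kiam.

```python
import json


def iter_json_objects(line):
    """Yield every JSON object found back to back on a single log line."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(line)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and line[pos].isspace():
            pos += 1
        if pos >= end:
            break
        try:
            obj, pos = decoder.raw_decode(line, pos)
        except json.JSONDecodeError:
            break  # non-JSON text (e.g. a plain event message); stop here
        yield obj


def canceled_errors(log_lines):
    """Return (time, msg) pairs for error-level entries mentioning 'context canceled'."""
    hits = []
    for line in log_lines:
        for entry in iter_json_objects(line):
            if not isinstance(entry, dict):
                continue
            if entry.get("level") != "error":
                continue
            msg = entry.get("msg", "")
            if "context canceled" in msg:
                hits.append((entry.get("time"), msg))
    return hits
```

Feeding it the output of `kubectl logs` for the server and agent pods (split into lines) gives a quick list of when the cancellations happened. Debug-level entries, such as the evicted-credentials message, are skipped on purpose.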
I have tried adjusting ENV variables such as:
AWS_METADATA_SERVICE_TIMEOUT: 10
AWS_METADATA_SERVICE_NUM_ATTEMPTS: 5
I have Prometheus and Grafana set up and am noticing:
Any ideas? Currently my application can't call AWS resources because it can't get credentials.
Thanks in advance
+1
We are having errors with 'context canceled' as well.
Thanks @jjo, we are on 1.21, so this could definitely be it. Do you know if a release and new image are planned for this?
We have run into this with the current KIAM release (using Helm Chart 6.1.2 on EKS 1.21).
While this is not a fix by any means, you can remediate this by provisioning new daemonset pods if authentication stops working for you:
kubectl delete pods -n kube-system -l app=kiam
For my team, this worked for several months before breaking silently.
EDIT: We've also seen this on an EKS cluster running 1.19.
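Since the failure can be silent, the pod-restart workaround above could in principle be automated. The following is only a rough Python sketch under stated assumptions: the status codes are taken to come from your own probe of the credentials endpoint, the threshold of 3 is arbitrary, and `should_restart` and `restart_kiam` are hypothetical names. The `kubectl` arguments are copied from the command above.

```python
import subprocess

# Namespace and label selector copied from the kubectl command in this thread.
RESTART_CMD = ["kubectl", "delete", "pods", "-n", "kube-system", "-l", "app=kiam"]


def should_restart(recent_statuses, threshold=3):
    """True when the last `threshold` credential requests all returned HTTP 5xx."""
    if len(recent_statuses) < threshold:
        return False
    return all(500 <= status < 600 for status in recent_statuses[-threshold:])


def restart_kiam(dry_run=True):
    """Return the restart command as a string; only run it when dry_run is False."""
    if not dry_run:
        subprocess.run(RESTART_CMD, check=True)
    return " ".join(RESTART_CMD)
```

For example, `should_restart([200, 500, 500, 500])` is true, while a single 500 between successes is not enough to trigger a restart. Keeping `dry_run=True` as the default means nothing is deleted unless you opt in explicitly.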