external-secrets/kubernetes-external-secrets

ERROR, Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1

Closed this issue · 10 comments

We started seeing ExternalSecrets with status ERROR, Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1 intermittently after upgrading from 6.0.0 to 8.1.3 in one of our clusters. At times the status is SUCCESS for the same secret.

We have multiple clusters running external-secrets 8.1.3. The external secrets show status SUCCESS in clusters with up to 120 secrets. The cluster where we see failures has more than 500 secrets.

We think it might be throttling from AWS, but the logs do not indicate this clearly.

We are using per-pod IAM authentication with kube2iam.
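For reference, kube2iam resolves the IAM role from an annotation on the pod, so the kubernetes-external-secrets pod template carries something along these lines (the role ARN is a placeholder):

    metadata:
      annotations:
        iam.amazonaws.com/role: arn:aws:iam::111111111111:role/external-secrets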

env values

      - env:
        - name: AKEYLESS_API_ENDPOINT
          value: https://api.akeyless.io
        - name: AWS_DEFAULT_REGION
          value: us-west-2
        - name: AWS_REGION
          value: us-west-2
        - name: LOG_LEVEL
          value: debug
        - name: LOG_MESSAGE_KEY
          value: msg
        - name: METRICS_PORT
          value: "3001"
        - name: POLLER_INTERVAL_MILLISECONDS
          value: "300000"
        - name: VAULT_ADDR
          value: http://127.0.0.1:8200
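We are not setting AWS_SDK_LOAD_CONFIG; the hint in the error message only applies when credentials come from an AWS_CONFIG_FILE, which is not the case with kube2iam. If it did apply, it would just be one more env entry:

        - name: AWS_SDK_LOAD_CONFIG
          value: "1"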

Log message

{"level":20,"message_time":"2021-07-14T19:57:25.997Z","pid":17,"hostname":"external-secrets-kubernetes-external-secrets-c6dcc947f-x9r97","msg":"updating status for abc/streamingapiendpoint-client-secret to: ERROR, Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1"}
{"level":50,"message_time":"2021-07-14T19:57:26.395Z","pid":17,"hostname":"external-secrets-kubernetes-external-secrets-c6dcc947f-x9r97","payload":{"message":"Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1","code":"CredentialsError","path":null,"host":"sts.us-west-2.amazonaws.com","port":443,"time":"2021-07-14T19:57:26.395Z","region":"us-west-2","hostname":"sts.us-west-2.amazonaws.com","retryable":true,"originalError":{"message":"Could not load credentials from ChainableTemporaryCredentials","code":"CredentialsError","path":null,"host":"sts.us-west-2.amazonaws.com","port":443,"time":"2021-07-14T19:57:26.395Z","region":"us-west-2","hostname":"sts.us-west-2.amazonaws.com","retryable":true,"originalError":{"message":"Client network socket disconnected before secure TLS connection was established","code":"TimeoutError","path":null,"host":"sts.us-west-2.amazonaws.com","port":443,"time":"2021-07-14T19:57:26.395Z","region":"us-west-2","hostname":"sts.us-west-2.amazonaws.com","retryable":true}}},"msg":"failure while polling the secret abc/abc"}

Do you have any suggestions or fixes?
We did not see any option that could help with throttling other than WATCH_TIMEOUT, and removing the WATCH_TIMEOUT variable did not help either.
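One knob that might reduce pressure on the AWS API (untested on our side) is raising POLLER_INTERVAL_MILLISECONDS so each secret is refreshed less often, e.g. every 15 minutes instead of every 5:

        - name: POLLER_INTERVAL_MILLISECONDS
          value: "900000"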

EKami commented

I'm having the exact same issue :(

I have the same issue :'(

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

In my case, the pod was running as a non-root user and wasn't able to read a token file.
I fixed it by setting securityContext:

securityContext:
  fsGroup: 65534
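
For context, fsGroup belongs in the pod-level securityContext, so the rendered deployment ends up roughly like this (container name shortened):

    spec:
      securityContext:
        fsGroup: 65534   # group ownership for mounted volumes, including the projected token
      containers:
      - name: kubernetes-external-secrets

With that group set, the non-root container user can read the mounted token file.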

I have the same problem; can someone give some tips?

The error message ERROR, Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1 can very well be caused by a credentials issue. For us, however, that was not the reason: the 8.1.x versions threw the error once we had too many secrets, around 370 in our case. We have multiple clusters in the same AWS account/region, which could have contributed to the issue as well. v6.0.0 seems to work fine; you may want to add security fixes on top of it.

Seeing the same thing on my side; has anyone figured out a way around it? Passing a service account with the associated IAM role ARN to the pod also does not seem to work.
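If the service-account route means IRSA (IAM Roles for Service Accounts) on EKS, the role is attached through an annotation on the service account; the names and role ARN below are placeholders:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: external-secrets
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/external-secrets

Note that with IRSA the web identity token is mounted as a projected file, which is exactly what a non-root container may fail to read without the fsGroup fix mentioned above.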

If you are not already running KES see

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

This issue was closed because it has been stalled for 30 days with no activity.