openshift/cloud-credential-operator

cloud-credential-operator gives an error on an OpenShift private cluster deployed on AWS

htkmts opened this issue · 9 comments

When cloud-credential-operator runs on an OpenShift private cluster deployed on AWS, it reports an error.
(By "private", I mean the OpenShift cluster cannot access the internet.)
It seems that cloud-credential-operator tries to reach "https://iam.amazonaws.com" while it runs, and that connection is what fails.

A sample of the error messages follows.

2020-09-23T08:37:17.068691724Z time="2020-09-23T08:37:17Z" level=debug msg="target secret exists" actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws
2020-09-23T08:37:17.068702622Z time="2020-09-23T08:37:17Z" level=debug msg="found access key ID in target secret" accessKeyID=xxx actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws
2020-09-23T08:37:17.068826396Z time="2020-09-23T08:37:17Z" level=debug msg="loading AWS credentials from secret" actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-cloud-credential-operator/cloud-credential-operator-iam-ro-creds
2020-09-23T08:37:17.06883828Z time="2020-09-23T08:37:17Z" level=debug msg="creating read AWS client" actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-cloud-credential-operator/cloud-credential-operator-iam-ro-creds
2020-09-23T08:37:18.827808247Z time="2020-09-23T08:37:18Z" level=error msg="error while validating cloud credentials: failed checking create cloud creds: error gathering AWS credentials details: error querying username: RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout" controller=secretannotator
2020-09-23T08:37:19.828069938Z time="2020-09-23T08:37:19Z" level=info msg="validating cloud cred secret" controller=secretannotator
2020-09-23T08:37:19.828120926Z time="2020-09-23T08:37:19Z" level=debug msg="Loading infrastructure name: xxx" controller=secretannotator
2020-09-23T08:39:11.679120982Z time="2020-09-23T08:39:11Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
2020-09-23T08:39:11.679976107Z time="2020-09-23T08:39:11Z" level=info msg="reconcile complete" controller=metrics elapsed="912.531µs"
2020-09-23T08:39:20.160892359Z time="2020-09-23T08:39:20Z" level=error msg="error while validating cloud credentials: failed checking create cloud creds: error gathering AWS credentials details: error querying username: RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout" controller=secretannotator
2020-09-23T08:39:21.161091474Z time="2020-09-23T08:39:21Z" level=info msg="validating cloud cred secret" controller=secretannotator
2020-09-23T08:39:21.161123255Z time="2020-09-23T08:39:21Z" level=debug msg="Loading infrastructure name: xxx" controller=secretannotator
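To confirm from logs like these which endpoint is failing, note that each line is logfmt-style: key=value pairs following the container runtime's timestamp. The following is a minimal Python sketch, assuming every line follows that format, that parses the lines and collects the endpoints whose TCP dial timed out. The function names are hypothetical and not part of the operator.

```python
import re
from typing import Dict, Iterable, List

# key=value or key="quoted value" (backslash escapes such as \n allowed inside quotes)
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# Dial-timeout text as it appears inside msg in the sample logs above
_DIAL_TIMEOUT = re.compile(r'Post (https://[^/\s]+)/?: dial tcp \S+: i/o timeout')


def parse_line(line: str) -> Dict[str, str]:
    """Split one log line into its fields; the runtime timestamp goes under 'ts'."""
    ts, _, rest = line.partition(" ")
    fields = {"ts": ts}
    for key, value in _PAIR.findall(rest):
        # Drop surrounding quotes; escapes like \n are left as-is.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


def timed_out_endpoints(lines: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated endpoints named in error-level dial timeouts."""
    hits = set()
    for line in lines:
        fields = parse_line(line.strip())
        if fields.get("level") == "error":
            hits.update(_DIAL_TIMEOUT.findall(fields.get("msg", "")))
    return sorted(hits)
```

Run over the log excerpt above, this should report only the IAM endpoint, which matches the diagnosis in the issue.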

Q1. Is there any workaround for this?
Q2. Is it MANDATORY for cloud-credential-operator to be able to access the internet? (If so, it would be impossible for any OpenShift cluster to be private...)

Thanks

I'm quite sure you can make an OpenShift 4 cluster private with respect to the open internet, but I'm not sure you can have an AWS OpenShift cluster that is unable to reach AWS.

For the CCO you can likely put it into manual mode and maintain your credentials yourself on an on-going basis. https://docs.openshift.com/container-platform/4.5/installing/installing_aws/manually-creating-iam.html.

However, I'm more concerned about the rest of the components that are going to use those credentials. Can the machine-api-operator reach the EC2 APIs to manage your worker nodes? Can the ingress operator do its work with load balancers? Is this specific to IAM, or can you not reach any AWS APIs?
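One way to answer "is this specific to IAM?" is to test TCP reachability of each AWS endpoint from inside the cluster (for example, from a debug pod). Below is a minimal sketch using only the Python standard library. `reachable` and `check_endpoints` are hypothetical helpers. A successful connect only shows that the network path is open; it does not show that an API call would be authorized.

```python
import socket
from typing import Dict, Iterable


def reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (gaierror), refusals, and timeouts (socket.timeout).
        return False


def check_endpoints(hosts: Iterable[str], port: int = 443,
                    timeout: float = 5.0) -> Dict[str, bool]:
    """Map each host to whether it accepted a TCP connection."""
    return {host: reachable(host, port, timeout) for host in hosts}
```

For example, `check_endpoints(["iam.amazonaws.com", "ec2.us-east-1.amazonaws.com", "elasticloadbalancing.us-east-1.amazonaws.com"])` probes IAM, EC2, and ELB. The region here is an assumption, so substitute your own. The check connects directly, so in a cluster that reaches AWS only through a proxy it will report False for hosts that are actually usable via the proxy.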

dgoodwin,

Thanks for your comment.
For the EC2 APIs, AWS provides VPC endpoints, which allow calling the APIs from a private network without going out to the internet, so I got around that by using a VPC endpoint.
However, there are AWS services that are not supported by VPC endpoints and unfortunately, IAM is one of them...

I will try to see whether I can work with manual mode for the credentials.

With regard to the rest of the cluster components, does anyone know of a document that lists the AWS APIs required for running an OpenShift cluster on AWS?

Thanks.

You have two choices, I believe:

  1. Set up a proxy that allows access to the AWS IAM endpoint: https://docs.openshift.com/container-platform/4.5/networking/enable-cluster-wide-proxy.html
  2. As @dgoodwin said above, run CCO in manual mode, where the user is responsible for creating the credentials needed by the components that use cloud credentials.
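For option 1, the cluster-wide proxy is configured through the Proxy object named cluster (config.openshift.io/v1), whose spec carries httpProxy, httpsProxy, and a comma-separated noProxy list. The sketch below builds that object as JSON, which oc and kubectl accept as well as YAML. The proxy URL is a placeholder. Putting VPC-endpoint hosts in noProxy, so that EC2 traffic keeps using the VPC endpoint while IAM goes through the proxy, is an assumption about this particular setup, not something the linked docs prescribe.

```python
import json
from typing import List, Optional


def cluster_proxy(http_proxy: str, https_proxy: str,
                  no_proxy: Optional[List[str]] = None) -> dict:
    """Build the cluster-wide Proxy resource (hypothetical helper)."""
    spec = {"httpProxy": http_proxy, "httpsProxy": https_proxy}
    if no_proxy:
        # noProxy is a single comma-separated string, not a list.
        spec["noProxy"] = ",".join(no_proxy)
    return {
        "apiVersion": "config.openshift.io/v1",
        "kind": "Proxy",
        "metadata": {"name": "cluster"},
        "spec": spec,
    }


if __name__ == "__main__":
    # Placeholder proxy; the EC2 host is assumed to resolve to a VPC endpoint.
    proxy = cluster_proxy("http://proxy.example.com:3128",
                          "http://proxy.example.com:3128",
                          ["ec2.us-east-1.amazonaws.com"])
    print(json.dumps(proxy, indent=2))
```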

Going through the instructions for putting CCO in manual mode will show you all the components and their required API permissions.
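The permissions those instructions expose live in CredentialsRequest objects: each one names the target secret in spec.secretRef, and for AWS its spec.providerSpec (kind AWSProviderSpec) holds statementEntries with effect, action, and resource. The manual-mode docs obtain these from the release image, e.g. with `oc adm release extract --credentials-requests --cloud=aws`. The sketch below assumes those manifests are already loaded as dicts. It summarizes the allowed actions per secret and the AWS service prefixes involved, which roughly answers the "which AWS APIs" question above. SAMPLE is a trimmed, illustrative input, not the real list.

```python
from typing import Dict, Iterable, List, Set

# Trimmed, illustrative CredentialsRequests; real ones contain many more actions.
SAMPLE = [
    {"kind": "CredentialsRequest",
     "spec": {"secretRef": {"name": "aws-cloud-credentials",
                            "namespace": "openshift-machine-api"},
              "providerSpec": {"kind": "AWSProviderSpec",
                               "statementEntries": [{"effect": "Allow",
                                                     "action": ["ec2:DescribeInstances",
                                                                "ec2:CreateTags"],
                                                     "resource": "*"}]}}},
    {"kind": "CredentialsRequest",
     "spec": {"secretRef": {"name": "cloud-credentials",
                            "namespace": "openshift-ingress-operator"},
              "providerSpec": {"kind": "AWSProviderSpec",
                               "statementEntries": [{"effect": "Allow",
                                                     "action": ["elasticloadbalancing:DescribeLoadBalancers"],
                                                     "resource": "*"}]}}},
]


def actions_by_secret(requests: Iterable[dict]) -> Dict[str, List[str]]:
    """Map 'namespace/name' of each target secret to its sorted allowed AWS actions."""
    out: Dict[str, Set[str]] = {}
    for cr in requests:
        spec = cr.get("spec", {})
        provider = spec.get("providerSpec", {})
        if provider.get("kind") != "AWSProviderSpec":
            continue  # skip requests for other clouds
        ref = spec.get("secretRef", {})
        key = f'{ref.get("namespace", "")}/{ref.get("name", "")}'
        actions = out.setdefault(key, set())
        for entry in provider.get("statementEntries", []):
            if entry.get("effect") == "Allow":
                actions.update(entry.get("action", []))
    return {key: sorted(acts) for key, acts in out.items()}


def services(actions_map: Dict[str, List[str]]) -> List[str]:
    """Return the AWS service prefixes (the part before ':') used by any action."""
    return sorted({a.split(":", 1)[0] for acts in actions_map.values() for a in acts})
```

Each prefix that `services` returns (for example ec2 or iam) is an AWS endpoint the cluster must be able to reach, directly, through a VPC endpoint, or through the proxy.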

@joelddiaz @dgoodwin
Thank you very much for your help.

For option No. 2, is there any way to do this AFTER the OCP cluster has been installed?
The instructions in the manual suggest that the steps need to be done before installation.

Thanks.

If your installation correctly minted all your CredentialsRequests, you could try moving to Manual mode post-install, we do this sometimes for development.

kubectl edit CloudCredential cluster and change the mode to "Manual".
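On releases where the CloudCredential resource exists, a non-interactive alternative to kubectl edit is a JSON merge patch on spec.credentialsMode. The sketch below only builds the command string and does not run it. The field name and the accepted values ("", Manual, Mint, Passthrough) are assumed from the operator.openshift.io/v1 API. Check them against the CRD on your cluster (kubectl explain cloudcredential.spec) before relying on them.

```python
import json
import shlex

# Assumed allowed values for spec.credentialsMode; "" means the default behaviour.
VALID_MODES = ("", "Manual", "Mint", "Passthrough")


def credentials_mode_patch(mode: str) -> str:
    """Return a compact JSON merge patch that sets spec.credentialsMode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported credentialsMode {mode!r}; expected one of {VALID_MODES}")
    return json.dumps({"spec": {"credentialsMode": mode}}, separators=(",", ":"))


def patch_command(mode: str) -> str:
    """Build a shell-safe kubectl command that applies the patch to CloudCredential 'cluster'."""
    return shlex.join(["kubectl", "patch", "cloudcredential", "cluster",
                       "--type=merge", "-p", credentials_mode_patch(mode)])


if __name__ == "__main__":
    print(patch_command("Manual"))
```

Note that the mode names are case-sensitive, which is why "manual" is rejected. This only applies once the CloudCredential CRD is present, which is not the case on 4.3 (see the error below).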

@dgoodwin
Thanks for your comments.

I tried the command, but it gave me an error:

kubectl edit CloudCredential cluster
error: the server doesn't have a resource type "CloudCredential"

Could you give me some more details on how to change CCO to "manual" mode?

Thanks.

What version of OpenShift are you running? This should be available in the 4.6 release.

We are currently using OCP 4.3.

We managed to bypass the error by editing the CCO configmap and changing disabled: "false" to disabled: "true".
Is this the correct way to change CCO to "manual" mode?

data:
  disabled: "true"
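On older releases such as 4.3, where the CloudCredential resource does not exist, the setting lives in the CCO ConfigMap edited above. Below is a minimal sketch that builds that ConfigMap as JSON and checks whether it disables the operator. The name cloud-credential-operator-config and the namespace openshift-cloud-credential-operator are assumed from the 4.5 manual-mode documentation, so verify them with `oc get configmap -n openshift-cloud-credential-operator`. ConfigMap data values are always strings, which is why the flag is the quoted "true" rather than a YAML boolean.

```python
import json

# Assumed name/namespace, matching the 4.5 manual-mode docs; verify on your cluster.
CM_NAME = "cloud-credential-operator-config"
CM_NAMESPACE = "openshift-cloud-credential-operator"


def cco_config_map(disabled: bool) -> dict:
    """Build the CCO ConfigMap with the 'disabled' flag set (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": CM_NAME, "namespace": CM_NAMESPACE},
        # ConfigMap data is string-to-string, so the flag is "true"/"false".
        "data": {"disabled": "true" if disabled else "false"},
    }


def cco_disabled(config_map: dict) -> bool:
    """True only when data.disabled is exactly the string "true"."""
    return config_map.get("data", {}).get("disabled", "false") == "true"


if __name__ == "__main__":
    print(json.dumps(cco_config_map(disabled=True), indent=2))
```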

Thanks.

No problem. Going to close the issue, as I think we're resolved for now.