openshift/cloud-credential-operator

Cloud credentials platform scope

mjudeikis opened this issue · 10 comments

Follow-up openshift/machine-api-operator#328

Currently, CredentialsRequests for all clouds are added to every cluster.
This means that if I'm running an AWS cluster I will see references to Azure in the secrets, and vice versa.

[mjudeiki@redhat openshift-azure]$ oc get credentialsrequests.cloudcredential.openshift.io 
NAME                               AGE
azure-openshift-ingress            161m
cloud-credential-operator-iam-ro   161m
openshift-image-registry           161m
openshift-ingress                  161m
openshift-machine-api              161m
openshift-machine-api-azure        161m

Only the required credentials will be fulfilled, based on the cloud we are running on.

OCP will be running as a managed service on Azure and AWS, and potentially more clouds will follow. This means that these credentials now show up in managed service offerings, which is not acceptable.

For certain cloud providers this is a first-party service, sold by them to their customers, and having references to other cloud providers in their offering is not acceptable.

We need a clear way to distinguish which platform we are running on and deliver only that platform's credentials.

/cc @jim-minter @pweil- @dgoodwin

Just to make sure the above is clear: we do not mint credentials for other clouds, but we deliver all clouds' CredentialsRequests, and the operator acts only on those which match the cloud we're running on. These requests are visible if someone does an oc get CredentialsRequests -n openshift-cloud-credential-operator, per the output above.
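For example, on an AWS cluster only the AWS-targeted requests should end up fulfilled, which (assuming status.provisioned is the field that records fulfillment) can be checked with something like:

$ oc get credentialsrequests.cloudcredential.openshift.io \
    -n openshift-cloud-credential-operator \
    -o custom-columns=NAME:.metadata.name,PROVISIONED:.status.provisioned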

We were asked to ensure that credentials requests were auditable. Originally we did this by pushing all CredentialsRequests into the credential operator git repo. This wasn't scaling well for the teams that need them, so we decided instead to move them to each component's respective repository and give those teams control over their permissions. We decided that the release image would be the point where credentials requests can be audited. We have only one release image, not one per cloud, and thus all credentials requests are delivered in it.

As far as I know there are no provisions to deliver some resources and not others in the release image, and if we try to delete one after creation the CVO will constantly re-apply it.
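A quick way to see that behavior, using one of the names from the listing above:

$ oc delete credentialsrequest openshift-ingress -n openshift-cloud-credential-operator
$ # on its next sync the CVO re-creates the object from the release manifest:
$ oc get credentialsrequest openshift-ingress -n openshift-cloud-credential-operator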

We can't have each component create its own CredentialsRequest programmatically, as that would be an audit nightmare.

We might be able to revert the earlier decision: push all CredentialsRequests back into the operator repo, pull them out of the release image, and create them programmatically in the credentials operator. This, however, may be a disruptive and tricky change to roll out, especially on the first upgrade, and we need to verify whether each team is willing to give up control over their permissions. Will raise in the Group G arch call this week.

AWS fully managed clusters are locked down; users cannot oc get CredentialsRequests -n openshift-cloud-credential-operator. Is it even going to be possible for a user to list these in a managed Azure cluster?

I think the CredentialsRequests can stay in the release image, but we just need a clear way to "create" only the needed ones based on the platform we specify in the InstallConfig. This should not be hard, and it still keeps the auditable trail.
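The platform is already a single, explicit choice at install time. A trimmed-down install-config.yaml sketch (values illustrative):

apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
platform:
  aws:
    region: us-east-1

So the information needed to create only the matching requests is known before anything lands on the cluster.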

It's a dual problem. One part is aesthetic: an Azure managed service with components referring to AWS (and vice versa) just does not look professional from a cloud perspective. (We might not care, but it's a partnership engagement, so we have to care.)
The second part: the current assumption that CredentialsRequests are not accessible is based on the custom RBAC cluster-admin used in OSD and ARO, and there is a big ask to open it up to enable more feature parity between on-prem clusters and managed OpenShift in the cloud. Masking these resources with RBAC, which might or might not change in the future, does not sound like the right way.

I agree there might not be deep technical reasoning here to change it. But again, we can't just throw all the resources at the cluster with the idea "if it sticks, it sticks". It just does not sound right.

Yes, if the CVO could conditionally skip some resources in the release manifest based on the cloud, that might be an option; something we will discuss in the Group G arch call tomorrow.
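Sketching what that could look like (the annotation name here is purely hypothetical; nothing like it exists in the CVO today):

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-machine-api-azure
  namespace: openshift-cloud-credential-operator
  annotations:
    # hypothetical annotation: the CVO would skip this manifest
    # unless the cluster platform is azure
    release.openshift.io/platform: azure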

To be clear, you would not have to mask anything with RBAC, as this is the default behavior; you have to explicitly grant RBAC to let users see them. So unless you plan to give users full cluster-admin (which seems unlikely in any managed offering), they will never be able to see them unless explicitly granted. This may affect the priority with which this needs to be solved.
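For reference, such an explicit grant would have to look something like the sketch below (the role and subject names are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: credentialsrequest-reader   # illustrative name
  namespace: openshift-cloud-credential-operator
rules:
- apiGroups: ["cloudcredential.openshift.io"]
  resources: ["credentialsrequests"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: credentialsrequest-reader
  namespace: openshift-cloud-credential-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: credentialsrequest-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: some-user                   # illustrative

Without it, oc auth can-i list credentialsrequests -n openshift-cloud-credential-operator answers no for a regular user.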

"which seems unlikely in any managed offering" - I would not state this so boldly :) We know this is a problem and we need to solve it. Either way, we will solve it; I don't want to couple any solution to the existing one.

Thanks, let me know the outcome of the call

"seems unlikely" != stating boldly.

Are you implying you have a requirement to give full cluster-admin to managed cluster users? Or do you have a requirement to give more power to users, but not full cluster-admin? Cluster-admin gives you many ways to completely destroy the cluster, which is why I don't think it makes sense in a managed offering. This context may be important for prioritization.

We have a requirement to give more power to the users, like privileged containers and a wider cluster-admin. What the path to implement this will be (if we choose to) is still very unclear. It might be just a relaxed RBAC model, or even an admission controller to prevent certain actions. But none of those solutions is a "silver bullet", so we will need to come up with some kind of sustainable solution 🤷‍♂️

Here's an idea for an API that might allow a single CredentialsRequest to serve the ingress operator's use case, at least:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-ingress
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: cloud-credentials
    namespace: openshift-ingress-operator
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: PlatformProviderSpec
    providerSpecs:
      - apiVersion: cloudcredential.openshift.io/v1
        kind: AWSProviderSpec
        statementEntries:
        - effect: Allow
          action:
          - elasticloadbalancing:DescribeLoadBalancers
          - route53:ListHostedZones
          - route53:ChangeResourceRecordSets
          - tag:GetResources
          resource: "*"
      - apiVersion: cloudcredential.openshift.io/v1
        kind: AzureProviderSpec
        roleBindings:
        - role: passthrough
          scope: resourcegroup

And so the request would be fulfilled only for whichever providerSpecs entry matches the platform.

This consolidates all the requests into a single CredentialsRequest, but it does not hide the fact that OpenShift has a "template" for how to integrate with other platforms (although that fact is buried slightly compared to multiple CredentialsRequests, and of course less so than with the proposed solution of preventing the other configurations from escaping the payload entirely; not sure if there was consensus on whether that was a desirable characteristic).

Anyway, more food for thought!

We discussed this in the architecture call; unfortunately, the sentiment was unanimous that we do not consider this serious enough to pursue right now, amidst all the other things we need to be focusing on. The fact that we run on multiple clouds is not a secret; it is in fact part of our core business, and we don't really want to expend effort trying to obfuscate it. You could run "strings" on any OpenShift binary and see references to other clouds.

CredentialsRequests will not even be visible to managed cluster users unless the permission is explicitly granted.

If you wish to escalate through product management, please file an RFE on the RFE Jira board.

closing per last comment