
0.12.3 -> 0.13.0-rc1 upgrade: workloads fail to start due to a pod security policy issue

paalkr opened this issue · 7 comments

Workloads deployed to a 0.12.3 kube-aws cluster (Kubernetes v1.12.4) stop working after the cluster is updated to kube-aws 0.13.0-rc1 (Kubernetes v1.13.5).

The Grafana container, deployed with Helm using the prometheus-operator chart, complains that AppArmor is not enabled on the node:

Labels:             app=grafana
Annotations:        checksum/config: 112d1de8efd11e546e384adb09cee5b5a81448bfbdb77bbe096ba9fa9e0f5b85
                    checksum/dashboards-json-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
                    checksum/sc-dashboard-provider-config: 4a8da5e1302c610a2d3a86c4fb1135ee01095d9a321d918a698c29628622aa8f
                    checksum/secret: 12f8d0d2360aed7f5a689fce0bb74a98d4f6a118a3b7ff2b5cec9e9a43cce703
                    container.apparmor.security.beta.kubernetes.io/grafana: runtime/default
                    container.apparmor.security.beta.kubernetes.io/grafana-sc-dashboard: runtime/default
                    container.apparmor.security.beta.kubernetes.io/grafana-sc-datasources: runtime/default
                    kubernetes.io/psp: monitoring-grafana
                    seccomp.security.alpha.kubernetes.io/pod: docker/default
Status:             Pending
Reason:             AppArmor
Message:            Cannot enforce AppArmor: AppArmor is not enabled on the host
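The AppArmor annotations on this pod are most likely injected by the `monitoring-grafana` PodSecurityPolicy named in the `kubernetes.io/psp` annotation. A PSP that carries the `apparmor.security.beta.kubernetes.io/defaultProfileName` annotation makes admission add a `container.apparmor.security.beta.kubernetes.io/<container>` annotation to every container, and the kubelet then refuses to start the pod on a host without AppArmor. A minimal sketch of a policy that keeps the seccomp default but sets no AppArmor annotations follows; the name `grafana-no-apparmor` and the rule values are hypothetical and would need tightening for real use.

```yaml
# Hypothetical PSP: same seccomp default as the chart's policy, no AppArmor
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: grafana-no-apparmor
  annotations:
    # Seccomp does not depend on AppArmor, so these are kept
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
    # Deliberately no apparmor.security.beta.kubernetes.io/* annotations:
    # admission will not inject an AppArmor profile into admitted pods
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```

The policy only takes effect once the Grafana service account is granted `use` on it through RBAC. If both this policy and the chart's own are authorized, admission prefers policies that do not mutate the pod and otherwise picks by name order, so editing the chart's policy in place is the more predictable route. Pods that are already pending keep their annotations and have to be recreated.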

The error I get in the ReplicaSet for every pod outside the kube-system namespace is:
Error creating: pods "<deployment_name>-<hash>-" is forbidden: unable to validate against any pod security policy: []

I imagine this case should be handled by the 00-kube-aws-permissive PSP, as described in

Discussion on Slack

The problem is that @paalkr has existing PodSecurityPolicies in his cluster, so we don't automatically map all service accounts, users and nodes to our permissive policy; we only do that when there are no existing policies. @paalkr, I suggest that you either create a new PodSecurityPolicy and map it to the service accounts/namespaces/users you want to allow, or use a ClusterRoleBinding to map them to our 00-kube-aws-permissive policy.
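The first option (a dedicated policy mapped to specific service accounts) can be sketched as three objects: a PodSecurityPolicy, a ClusterRole granting the `use` verb on it, and a namespaced RoleBinding that hands that role to every service account in one namespace. The names `monitoring-restricted` and `psp:monitoring-restricted`, the `monitoring` namespace, and the rule values are hypothetical placeholders.

```yaml
# Hypothetical policy for workloads in the "monitoring" namespace
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: monitoring-restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  volumes: ['configMap', 'emptyDir', 'projected', 'secret', 'downwardAPI', 'persistentVolumeClaim']
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
---
# ClusterRole that permits "use" of the policy above, and nothing else
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:monitoring-restricted
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['monitoring-restricted']
---
# RoleBinding: only service accounts in "monitoring" may use the policy
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp:monitoring-restricted
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:monitoring-restricted
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:monitoring
```

Because the grant is a RoleBinding rather than a ClusterRoleBinding, the policy is only usable for pods in `monitoring`; pods in other namespaces still depend on whatever policies their service accounts are otherwise authorized for.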

Updated release note

Thanks for clarifying. I guess our best option is to update our ClusterRoleBindings.

So my quick-and-dirty fix, to make sure the updated cluster functions the same way as before the upgrade, is to manually deploy the kube-aws:permissive-psp-cluster-wide ClusterRoleBinding after updating the control plane. For new clusters we will start using proper Pod Security Policies.

# Grants kube-aws's permissive PSP role to every service account
# and every authenticated user, cluster-wide
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-aws:permissive-psp-cluster-wide
roleRef:
  kind: ClusterRole
  name: kube-aws:permissive-psp
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: Group
  name: system:serviceaccounts
- kind: Group
  name: system:authenticated
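A narrower variant of the ClusterRoleBinding above, sketched under the assumption that only some namespaces need the permissive policy, binds the same `kube-aws:permissive-psp` ClusterRole through a namespaced RoleBinding. The binding name and the `monitoring` namespace are hypothetical.

```yaml
# Hypothetical: permissive PSP for service accounts in "monitoring" only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-aws:permissive-psp-monitoring
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-aws:permissive-psp
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:monitoring
```

Binding every authenticated user cluster-wide lets any of them create pods under the permissive policy, which is why it works as a stopgap; one RoleBinding per namespace limits that to the workloads you choose, at the cost of repeating the binding for each namespace.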

I'm closing this issue because I managed to work around the problem by manually deploying the permissive ClusterRoleBinding. I understand it's hard to fully automate this, but I wonder if and how we might provide a better upgrade experience.

I imagine you could also do something like this:

    - path: "/srv/kubernetes/manifests/custom/permissive-psp.yaml"
      permissions: 0644
      content: |
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: kube-aws:permissive-psp-cluster-wide
        roleRef:
          kind: ClusterRole
          name: kube-aws:permissive-psp
          apiGroup: rbac.authorization.k8s.io
        subjects:
        - kind: Group
          name: system:serviceaccounts
        - kind: Group
          name: system:authenticated

Yup, that worked!