aws/aws-app-mesh-examples

Is howto-k8s-mtls-sds-based compatible with Bottlerocket?

AhmadMS1988 opened this issue · 0 comments

Platform
EKS 1.20, with Bottlerocket 1.1.2

To Reproduce
Apply the howto-k8s-mtls-sds-based walkthrough.

Describe the bug
After applying the howto-k8s-mtls-sds-based walkthrough, the SPIRE agents keep restarting with the following logs:

time="2021-07-13T16:17:23Z" level=warning msg="Current umask 0022 is too permissive; setting umask 0027."
time="2021-07-13T16:17:23Z" level=info msg="Starting agent with data directory: \"/run/spire\""
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=k8s_sat plugin_services="[]" plugin_type=NodeAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=memory plugin_services="[]" plugin_type=KeyManager subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=k8s plugin_services="[]" plugin_type=WorkloadAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=unix plugin_services="[]" plugin_type=WorkloadAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=debug msg="No pre-existing agent SVID found. Will perform node attestation" path=/run/spire/agent_svid.der subsystem_name=attestor
time="2021-07-13T16:17:23Z" level=debug msg="Starting checker" name=agent subsystem_name=health
time="2021-07-13T16:17:23Z" level=info msg="Starting workload API" subsystem_name=endpoints
time="2021-07-13T16:18:20Z" level=debug msg="New active connection to workload API" subsystem_name=workload_api
time="2021-07-13T16:18:20Z" level=warning msg="container id not found" attempt=1 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:20Z" level=warning msg="container id not found" attempt=2 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:21Z" level=warning msg="container id not found" attempt=3 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:21Z" level=warning msg="container id not found" attempt=4 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=warning msg="container id not found" attempt=5 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=warning msg="container id not found" attempt=6 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=error msg="Failed to collect all selectors for PID" error="workload attestor \"k8s\" failed: rpc error: code = Canceled desc = context canceled" pid=2363728 subsystem_name=workload_api
time="2021-07-13T16:18:22Z" level=debug msg="PID attested to have selectors" pid=2363728 selectors="[type:\"unix\" value:\"uid:0\"  type:\"unix\" value:\"user:root\"  type:\"unix\" value:\"gid:0\"  type:\"unix\" value:\"group:root\" ]" subsystem_name=workload_api
time="2021-07-13T16:18:22Z" level=debug msg="Closing connection to workload API" subsystem_name=workload_api
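For triage, the repeated warnings in the log above can be summarized with a short script. This is only a minimal sketch: it assumes the agent log uses the key=value text format shown above, and the names `parse_line`, `max_attempts_by_container`, and `SAMPLE` are hypothetical helpers for illustration, not part of SPIRE. The sample lines are copied from the log in this issue.

```python
import re

# Matches key=value pairs; values are either double-quoted (with backslash
# escapes allowed inside) or a run of non-whitespace characters.
_PAIR = re.compile(r'([\w.-]+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Parse one key=value log line into a dict of raw string values.

    Outer double quotes are stripped; escape sequences are left as-is.
    """
    fields = {}
    for key, value in _PAIR.findall(line):
        if value.startswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def max_attempts_by_container(lines):
    """Map each container_id to the highest retry attempt that was logged
    with the message 'container id not found'."""
    result = {}
    for line in lines:
        fields = parse_line(line)
        if fields.get("msg") != "container id not found":
            continue
        container_id = fields.get("container_id")
        attempt = int(fields.get("attempt", 0))
        result[container_id] = max(result.get(container_id, 0), attempt)
    return result


# Three lines taken from the agent log in this issue.
SAMPLE = [
    'time="2021-07-13T16:17:23Z" level=info msg="Starting workload API" subsystem_name=endpoints',
    'time="2021-07-13T16:18:20Z" level=warning msg="container id not found" attempt=1 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s',
    'time="2021-07-13T16:18:22Z" level=warning msg="container id not found" attempt=6 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s',
]

if __name__ == "__main__":
    for container_id, attempts in max_attempts_by_container(SAMPLE).items():
        print(container_id, attempts)
```

Run against the full agent log, this would list every container ID the k8s workload attestor failed to resolve, together with the number of retries before the request was canceled. Note that the ID is logged with a `containerd-` prefix.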

When listing the agents, the number of attested agents keeps increasing:

kubectl exec -n spire spire-server-0 -- /opt/spire/bin/spire-server agent list
Found 73 attested agents

When testing agent connectivity from inside the agent container, I get the following error:

/opt/spire/bin/spire-agent api fetch -socketPath /run/spire/sockets/agent.sock
rpc error: code = DeadlineExceeded desc = context deadline exceeded

The following command is used for registration:

kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://${TRUST_DOMAIN}/ns/spire/sa/spire-agent \
    -selector k8s_sat:cluster:${EKS_CLUSTER_NAME} \
    -selector k8s_sat:agent_ns:spire \
    -selector k8s_sat:agent_sa:spire-agent \
    -node

The following configuration is used:

apiVersion: v1
kind: Namespace
metadata:
  name: spire
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-server
  namespace: spire
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-agent
  namespace: spire
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-server-trust-role
rules:
- apiGroups: ["authentication.k8s.io"]
  resources: ["tokenreviews"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["patch", "get", "list"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-server-trust-role-binding
subjects:
- kind: ServiceAccount
  name: spire-server
  namespace: spire
roleRef:
  kind: ClusterRole
  name: spire-server-trust-role
  apiGroup: rbac.authorization.k8s.io
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-agent-cluster-role
rules:
- apiGroups: [""]
  resources: ["pods","nodes","nodes/proxy"]
  verbs: ["get"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-agent-cluster-role-binding
subjects:
- kind: ServiceAccount
  name: spire-agent
  namespace: spire
roleRef:
  kind: ClusterRole
  name: spire-agent-cluster-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-bundle
  namespace: spire
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-server
  namespace: spire
data:
  server.conf: |
    server {
      bind_address = "0.0.0.0"
      bind_port = "8081"
      registration_uds_path = "/tmp/spire-registration.sock"
      trust_domain = "${TRUST_DOMAIN}"
      data_dir = "/run/spire/data"
      log_level = "DEBUG"
      ca_key_type = "rsa-2048"
      default_svid_ttl = "1h"
      ca_subject = {
        country = ["US"],
        organization = ["SPIFFE"],
        common_name = "",
      }
    }
    plugins {
      DataStore "sql" {
        plugin_data {
          database_type = "sqlite3"
          connection_string = "/run/spire/data/datastore.sqlite3"
        }
      }
      NodeAttestor "k8s_sat" {
        plugin_data {
          clusters = {
            "${EKS_CLUSTER_NAME}" = {
              use_token_review_api_validation = true
              service_account_whitelist = ["spire:spire-agent"]
            }
          }
        }
      }
      NodeResolver "noop" {
        plugin_data {}
      }
      KeyManager "disk" {
        plugin_data {
          keys_path = "/run/spire/data/keys.json"
        }
      }
      Notifier "k8sbundle" {
        plugin_data {
        }
      }
    }
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
  labels:
    app: spire-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spire-server
  serviceName: spire-server
  template:
    metadata:
      namespace: spire
      labels:
        app: spire-server
    spec:
      serviceAccountName: spire-server
      containers:
        - name: spire-server
          image: gcr.io/spiffe-io/spire-server:0.10.0
          args:
            - -config
            - /run/spire/config/server.conf
          ports:
            - containerPort: 8081
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-data
              mountPath: /run/spire/data
              readOnly: false
          livenessProbe:
            exec:
              command:
                - /opt/spire/bin/spire-server
                - healthcheck
            failureThreshold: 2
            initialDelaySeconds: 15
            periodSeconds: 60
            timeoutSeconds: 3
      volumes:
        - name: spire-config
          configMap:
            name: spire-server
  volumeClaimTemplates:
    - metadata:
        name: spire-data
        namespace: spire
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spire-server
  namespace: spire
spec:
  type: NodePort
  ports:
    - name: grpc
      port: 8081
      targetPort: 8081
      protocol: TCP
  selector:
    app: spire-server
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-agent
  namespace: spire
data:
  agent.conf: |
    agent {
      data_dir = "/run/spire"
      log_level = "DEBUG"
      server_address = "spire-server"
      server_port = "8081"
      socket_path = "/run/spire/sockets/agent.sock"
      trust_bundle_path = "/run/spire/bundle/bundle.crt"
      trust_domain = "${TRUST_DOMAIN}"
      enable_sds = true
    }

    plugins {
      NodeAttestor "k8s_sat" {
        plugin_data {
          cluster = "${EKS_CLUSTER_NAME}"
        }
      }

      KeyManager "memory" {
        plugin_data {
        }
      }

      WorkloadAttestor "k8s" {
        plugin_data {
          skip_kubelet_verification = true
        }
      }

      WorkloadAttestor "unix" {
          plugin_data {
          }
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
  labels:
    app: spire-agent
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      namespace: spire
      labels:
        app: spire-agent
    spec:
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: spire-agent
      initContainers:
        - name: init
          image: gcr.io/spiffe-io/wait-for-it
          args: ["-t", "30", "spire-server:8081"]
      containers:
        - name: spire-agent
          image: gcr.io/spiffe-io/spire-agent:0.10.0
          args: ["-config", "/run/spire/config/agent.conf"]
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-bundle
              mountPath: /run/spire/bundle
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: false
          livenessProbe:
            exec:
              command:
                - /opt/spire/bin/spire-agent
                - healthcheck
                - -socketPath
                - /run/spire/sockets/agent.sock
            failureThreshold: 2
            initialDelaySeconds: 15
            periodSeconds: 60
            timeoutSeconds: 3
      volumes:
        - name: spire-config
          configMap:
            name: spire-agent
        - name: spire-bundle
          configMap:
            name: spire-bundle
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate

Please let us know if we are doing anything wrong.
Best regards