Is howto-k8s-mtls-sds-based compatible with Bottlerocket?
AhmadMS1988 commented
Platform
EKS 1.20, with Bottlerocket 1.1.2
To Reproduce
Apply the howto-k8s-mtls-sds-based walkthrough.
Describe the bug
After applying the howto-k8s-mtls-sds-based walkthrough, the SPIRE agents keep restarting with the following logs:
time="2021-07-13T16:17:23Z" level=warning msg="Current umask 0022 is too permissive; setting umask 0027."
time="2021-07-13T16:17:23Z" level=info msg="Starting agent with data directory: \"/run/spire\""
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=k8s_sat plugin_services="[]" plugin_type=NodeAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=memory plugin_services="[]" plugin_type=KeyManager subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=k8s plugin_services="[]" plugin_type=WorkloadAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=unix plugin_services="[]" plugin_type=WorkloadAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=debug msg="No pre-existing agent SVID found. Will perform node attestation" path=/run/spire/agent_svid.der subsystem_name=attestor
time="2021-07-13T16:17:23Z" level=debug msg="Starting checker" name=agent subsystem_name=health
time="2021-07-13T16:17:23Z" level=info msg="Starting workload API" subsystem_name=endpoints
time="2021-07-13T16:18:20Z" level=debug msg="New active connection to workload API" subsystem_name=workload_api
time="2021-07-13T16:18:20Z" level=warning msg="container id not found" attempt=1 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:20Z" level=warning msg="container id not found" attempt=2 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:21Z" level=warning msg="container id not found" attempt=3 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:21Z" level=warning msg="container id not found" attempt=4 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=warning msg="container id not found" attempt=5 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=warning msg="container id not found" attempt=6 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=error msg="Failed to collect all selectors for PID" error="workload attestor \"k8s\" failed: rpc error: code = Canceled desc = context canceled" pid=2363728 subsystem_name=workload_api
time="2021-07-13T16:18:22Z" level=debug msg="PID attested to have selectors" pid=2363728 selectors="[type:\"unix\" value:\"uid:0\" type:\"unix\" value:\"user:root\" type:\"unix\" value:\"gid:0\" type:\"unix\" value:\"group:root\" ]" subsystem_name=workload_api
time="2021-07-13T16:18:22Z" level=debug msg="Closing connection to workload API" subsystem_name=workload_api
When listing the agents, the number of attested agents keeps increasing:
kubectl exec -n spire spire-server-0 -- /opt/spire/bin/spire-server agent list
Found 73 attested agents
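Each restart re-attests the node under a fresh agent SVID, which would explain why the count keeps growing. If needed, a stale agent can be evicted by its SPIFFE ID (a sketch; the k8s_sat-style ID below, including the <node-uid> segment, is a placeholder):

kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server agent evict \
  -spiffeID spiffe://${TRUST_DOMAIN}/spire/agent/k8s_sat/${EKS_CLUSTER_NAME}/<node-uid>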
When testing agent connectivity from inside the agent container, I get the following error:
/opt/spire/bin/spire-agent api fetch -socketPath /run/spire/sockets/agent.sock
rpc error: code = DeadlineExceeded desc = context deadline exceeded
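For comparison, the agent's own health check (the same command the DaemonSet livenessProbe below runs) can be invoked directly against the socket:

/opt/spire/bin/spire-agent healthcheck -socketPath /run/spire/sockets/agent.sock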
The following command is used for registration:
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://${TRUST_DOMAIN}/ns/spire/sa/spire-agent \
  -selector k8s_sat:cluster:${EKS_CLUSTER_NAME} \
  -selector k8s_sat:agent_ns:spire \
  -selector k8s_sat:agent_sa:spire-agent \
  -node
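The created entry can be confirmed with entry show, for example:

kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry show \
  -spiffeID spiffe://${TRUST_DOMAIN}/ns/spire/sa/spire-agent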
The following configurations are used:
apiVersion: v1
kind: Namespace
metadata:
  name: spire
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-server
  namespace: spire
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-agent
  namespace: spire
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-server-trust-role
rules:
  - apiGroups: ["authentication.k8s.io"]
    resources: ["tokenreviews"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["patch", "get", "list"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-server-trust-role-binding
subjects:
  - kind: ServiceAccount
    name: spire-server
    namespace: spire
roleRef:
  kind: ClusterRole
  name: spire-server-trust-role
  apiGroup: rbac.authorization.k8s.io
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-agent-cluster-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "nodes/proxy"]
    verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-agent-cluster-role-binding
subjects:
  - kind: ServiceAccount
    name: spire-agent
    namespace: spire
roleRef:
  kind: ClusterRole
  name: spire-agent-cluster-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-bundle
  namespace: spire
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-server
  namespace: spire
data:
  server.conf: |
    server {
      bind_address = "0.0.0.0"
      bind_port = "8081"
      registration_uds_path = "/tmp/spire-registration.sock"
      trust_domain = "${TRUST_DOMAIN}"
      data_dir = "/run/spire/data"
      log_level = "DEBUG"
      ca_key_type = "rsa-2048"
      default_svid_ttl = "1h"
      ca_subject = {
        country = ["US"],
        organization = ["SPIFFE"],
        common_name = "",
      }
    }
    plugins {
      DataStore "sql" {
        plugin_data {
          database_type = "sqlite3"
          connection_string = "/run/spire/data/datastore.sqlite3"
        }
      }
      NodeAttestor "k8s_sat" {
        plugin_data {
          clusters = {
            "${EKS_CLUSTER_NAME}" = {
              use_token_review_api_validation = true
              service_account_whitelist = ["spire:spire-agent"]
            }
          }
        }
      }
      NodeResolver "noop" {
        plugin_data {}
      }
      KeyManager "disk" {
        plugin_data {
          keys_path = "/run/spire/data/keys.json"
        }
      }
      Notifier "k8sbundle" {
        plugin_data {
        }
      }
    }
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
  labels:
    app: spire-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spire-server
  serviceName: spire-server
  template:
    metadata:
      namespace: spire
      labels:
        app: spire-server
    spec:
      serviceAccountName: spire-server
      containers:
        - name: spire-server
          image: gcr.io/spiffe-io/spire-server:0.10.0
          args:
            - -config
            - /run/spire/config/server.conf
          ports:
            - containerPort: 8081
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-data
              mountPath: /run/spire/data
              readOnly: false
          livenessProbe:
            exec:
              command:
                - /opt/spire/bin/spire-server
                - healthcheck
            failureThreshold: 2
            initialDelaySeconds: 15
            periodSeconds: 60
            timeoutSeconds: 3
      volumes:
        - name: spire-config
          configMap:
            name: spire-server
  volumeClaimTemplates:
    - metadata:
        name: spire-data
        namespace: spire
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spire-server
  namespace: spire
spec:
  type: NodePort
  ports:
    - name: grpc
      port: 8081
      targetPort: 8081
      protocol: TCP
  selector:
    app: spire-server
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-agent
  namespace: spire
data:
  agent.conf: |
    agent {
      data_dir = "/run/spire"
      log_level = "DEBUG"
      server_address = "spire-server"
      server_port = "8081"
      socket_path = "/run/spire/sockets/agent.sock"
      trust_bundle_path = "/run/spire/bundle/bundle.crt"
      trust_domain = "${TRUST_DOMAIN}"
      enable_sds = true
    }
    plugins {
      NodeAttestor "k8s_sat" {
        plugin_data {
          cluster = "${EKS_CLUSTER_NAME}"
        }
      }
      KeyManager "memory" {
        plugin_data {
        }
      }
      WorkloadAttestor "k8s" {
        plugin_data {
          skip_kubelet_verification = true
        }
      }
      WorkloadAttestor "unix" {
        plugin_data {
        }
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
  labels:
    app: spire-agent
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      namespace: spire
      labels:
        app: spire-agent
    spec:
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: spire-agent
      initContainers:
        - name: init
          image: gcr.io/spiffe-io/wait-for-it
          args: ["-t", "30", "spire-server:8081"]
      containers:
        - name: spire-agent
          image: gcr.io/spiffe-io/spire-agent:0.10.0
          args: ["-config", "/run/spire/config/agent.conf"]
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-bundle
              mountPath: /run/spire/bundle
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: false
          livenessProbe:
            exec:
              command:
                - /opt/spire/bin/spire-agent
                - healthcheck
                - -socketPath
                - /run/spire/sockets/agent.sock
            failureThreshold: 2
            initialDelaySeconds: 15
            periodSeconds: 60
            timeoutSeconds: 3
      volumes:
        - name: spire-config
          configMap:
            name: spire-agent
        - name: spire-bundle
          configMap:
            name: spire-bundle
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate
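Note that the ${TRUST_DOMAIN} and ${EKS_CLUSTER_NAME} placeholders above need to be substituted before applying; one way is envsubst (the file name and values below are illustrative):

export TRUST_DOMAIN=example.org        # placeholder value
export EKS_CLUSTER_NAME=my-eks-cluster # placeholder value
envsubst < spire.yaml | kubectl apply -f -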
Please advise if we are doing anything wrong.
Best regards