CrashLoopBackOff in Collector's DaemonSet on OpenShift 4.9
Balaji-MP opened this issue · 25 comments
Hello team, I received the following error while deploying Collector on OpenShift 4.9. I initially thought this was a permission issue and added the required SCC to Collector's service account, but the issue persists.
```
terminate called after throwing an instance of 'scap_open_exception'
what(): can't create map: Permission denied
collector[0x448f7d]
/lib64/libc.so.6(+0x4eb80)[0x7f726981fb80]
/lib64/libc.so.6(gsignal+0x10f)[0x7f726981faff]
/lib64/libc.so.6(abort+0x127)[0x7f72697f2ea5]
/lib64/libstdc++.so.6(+0x9009b)[0x7f726a1c109b]
/lib64/libstdc++.so.6(+0x9653c)[0x7f726a1c753c]
/lib64/libstdc++.so.6(+0x96597)[0x7f726a1c7597]
/lib64/libstdc++.so.6(+0x967f8)[0x7f726a1c77f8]
/usr/local/lib/libsinsp-wrapper.so(+0x240ef5)[0x7f726c82cef5]
/usr/local/lib/libsinsp-wrapper.so(_ZN5sinsp4openERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x36)[0x7f726c866c16]
collector[0x4d2b34]
collector[0x46631c]
collector[0x442bec]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f726980bd85]
collector[0x448e2e]
Caught signal 6 (SIGABRT): Aborted
/bootstrap.sh: line 94: 10 Aborted eval exec "$@"
```
@Balaji-MP It definitely looks like a lack of permissions to load the eBPF probe. Just in case, could you share the DaemonSet definition and the SecurityContext you ended up with?
@erthalion here is the definition and the security context within it:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "4"
    email: support@stackrox.com
    meta.helm.sh/release-name: stackrox-secured-cluster-services
    meta.helm.sh/release-namespace: rhacs-operator
    owner: stackrox
  creationTimestamp: "2023-02-16T08:25:19Z"
  generation: 4
  labels:
    app: collector
    app.kubernetes.io/component: collector
    app.kubernetes.io/instance: stackrox-secured-cluster-services
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: stackrox
    app.kubernetes.io/part-of: stackrox-secured-cluster-services
    app.kubernetes.io/version: 3.73.2
    auto-upgrade.stackrox.io/component: sensor
    helm.sh/chart: stackrox-secured-cluster-services-73.2.0
    service: collector
  name: collector
  namespace: rhacs-operator
  ownerReferences:
  - apiVersion: platform.stackrox.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: SecuredCluster
    name: stackrox-secured-cluster-services
    uid: 5b40f3be-1e30-4ded-8480-67fb0a8b03b8
  resourceVersion: "1074444903"
  uid: b4786759-f8f7-4bb8-bdef-ee975923e740
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      service: collector
  template:
    metadata:
      annotations:
        email: support@stackrox.com
        meta.helm.sh/release-name: stackrox-secured-cluster-services
        meta.helm.sh/release-namespace: rhacs-operator
        owner: stackrox
      creationTimestamp: null
      labels:
        app: collector
        app.kubernetes.io/component: collector
        app.kubernetes.io/instance: stackrox-secured-cluster-services
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: stackrox
        app.kubernetes.io/part-of: stackrox-secured-cluster-services
        app.kubernetes.io/version: 3.73.2
        helm.sh/chart: stackrox-secured-cluster-services-73.2.0
        service: collector
      namespace: rhacs-operator
    spec:
      containers:
      - env:
        - name: COLLECTOR_CONFIG
          value: '{"tlsConfig":{"caCertPath":"/var/run/secrets/stackrox.io/certs/ca.pem","clientCertPath":"/var/run/secrets/stackrox.io/certs/cert.pem","clientKeyPath":"/var/run/secrets/stackrox.io/certs/key.pem"}}'
        - name: COLLECTION_METHOD
          value: EBPF
        - name: GRPC_SERVER
          value: sensor.rhacs-operator.svc:443
        - name: SNI_HOSTNAME
          value: sensor.stackrox.svc
        image: registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:c15a9d534e6b0bd73bee22aa8c67503e53266b47f9dd9ef11f9f05f6d007ae02
        imagePullPolicy: Always
        name: collector
        resources:
          limits:
            cpu: 750m
            memory: 1Gi
          requests:
            cpu: 50m
            memory: 320Mi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
          privileged: true
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/var/run/docker.sock
          name: var-run-docker-sock
          readOnly: true
        - mountPath: /host/proc
          name: proc-ro
          readOnly: true
        - mountPath: /module
          name: tmpfs-module
        - mountPath: /host/etc
          name: etc-ro
          readOnly: true
        - mountPath: /host/usr/lib
          name: usr-lib-ro
          readOnly: true
        - mountPath: /host/sys
          name: sys-ro
          readOnly: true
        - mountPath: /host/dev
          name: dev-ro
          readOnly: true
        - mountPath: /run/secrets/stackrox.io/certs/
          name: certs
          readOnly: true
      - command:
        - stackrox/compliance
        env:
        - name: ROX_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: ROX_ADVERTISED_ENDPOINT
          value: sensor.rhacs-operator.svc:443
        image: registry.redhat.io/advanced-cluster-security/rhacs-main-rhel8@sha256:727e14f925b7f6bbde4ed291a6b9c4c0e068519364b6fea5ef86126775a0cc9e
        imagePullPolicy: IfNotPresent
        name: compliance
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 10m
            memory: 10Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 0
          seLinuxOptions:
            type: container_runtime_t
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/ssl/
          name: etc-ssl
        - mountPath: /etc/pki/ca-trust/
          name: etc-pki-volume
        - mountPath: /host
          name: host-root-ro
          readOnly: true
        - mountPath: /run/secrets/stackrox.io/certs/
          name: certs
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsGroup: 3000
        runAsUser: 1000
      serviceAccount: collector
      serviceAccountName: collector
      terminationGracePeriodSeconds: 30
      tolerations:
      - operator: Exists
      volumes:
      - hostPath:
          path: /var/run/docker.sock
          type: ""
        name: var-run-docker-sock
      - hostPath:
          path: /proc
          type: ""
        name: proc-ro
      - emptyDir:
          medium: Memory
        name: tmpfs-module
      - hostPath:
          path: /etc
          type: ""
        name: etc-ro
      - hostPath:
          path: /usr/lib
          type: ""
        name: usr-lib-ro
      - hostPath:
          path: /sys/
          type: ""
        name: sys-ro
      - hostPath:
          path: /dev
          type: ""
        name: dev-ro
      - name: certs
        secret:
          defaultMode: 420
          items:
          - key: collector-cert.pem
            path: cert.pem
          - key: collector-key.pem
            path: key.pem
          - key: ca.pem
            path: ca.pem
          secretName: collector-tls
      - hostPath:
          path: /
          type: ""
        name: host-root-ro
      - emptyDir: {}
        name: etc-ssl
      - emptyDir: {}
        name: etc-pki-volume
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberMisscheduled: 0
  numberReady: 0
  numberUnavailable: 3
  observedGeneration: 4
  updatedNumberScheduled: 3
```
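One detail in the DaemonSet above may be worth checking, though this is an assumption and not something confirmed in this thread: the pod-level `securityContext` sets `runAsUser: 1000` and `runAsGroup: 3000`, and the collector container does not override them, whereas the compliance container explicitly sets `runAsUser: 0`. Depending on the container runtime, a process running under a non-root UID may not keep effective capabilities after `exec`, even with `privileged: true`, and creating a BPF map typically requires `CAP_SYS_ADMIN` (or `CAP_BPF` on newer kernels). A minimal, hypothetical strategic-merge patch that runs only the collector container as root could look like this:

```yaml
# Hypothetical strategic-merge patch; NOT part of the StackRox chart.
# Containers are merged by their "name" key, so only the collector
# container is changed; privileged/readOnlyRootFilesystem are kept.
spec:
  template:
    spec:
      containers:
      - name: collector
        securityContext:
          runAsUser: 0    # overrides the pod-level runAsUser: 1000
          runAsGroup: 0   # overrides the pod-level runAsGroup: 3000
```

Saved as a file (the name `patch.yaml` is just an example), it could be applied with something like `oc patch daemonset collector -n rhacs-operator --patch "$(cat patch.yaml)"`. Because the DaemonSet is owned by the `SecuredCluster` resource (see `ownerReferences`), the operator may revert direct edits, so a lasting change would more likely need to be made wherever the pod-level `securityContext` was originally introduced.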
@Balaji-MP any chance you could run `kubectl describe ds collector` to get the events as well?
@erthalion here are the events; the current state of the pod is CrashLoopBackOff.
```
Events:
  Type    Reason            Age  From                  Message
  Normal  SuccessfulCreate  10s  daemonset-controller  Created pod: collector-jst65
  Normal  SuccessfulCreate  3s   daemonset-controller  Created pod: collector-x86fj
```
@erthalion I guess the permission issue is caused by the `eval` on line 94. I might be wrong; any thoughts on this?
`bootstrap.sh` (including the `eval` part) is only responsible for starting Collector. The issue you observe happens when Collector tries to load the eBPF probes.
@erthalion any thoughts on this one?
What happens if you remove this part from the security context?
```yaml
seLinuxOptions:
  type: container_runtime_t
```
Same error; nothing changed.
@Balaji-MP what about the SCC? You haven't posted it yet. Can you show `scc/stackrox-collector`?
@erthalion here is the security context in stackrox-collector:
```yaml
securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
containers:
```
There is also a SecurityContextConstraints (SCC) object, which should contain more information, e.g. whether privileged containers are allowed. Having said that, can you describe your OpenShift setup in more detail? Is there anything special about it?
@erthalion here is the SCC applied for this collector
```yaml
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
seccompProfiles:
- '*'
supplementalGroups:
  type: RunAsAny
```
My cluster is standard, and no additional restrictions are in place.
@erthalion can you please share the directory where Collector creates the map?
It's a BPF map, so it's not located on the filesystem. The problem here is that your OpenShift setup somehow prevents Collector from executing the `bpf` syscall; we need to find out why.
here is the SCC applied for this collector
```yaml
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
seccompProfiles:
- '*'
supplementalGroups:
  type: RunAsAny
```
This doesn't look complete. Isn't there anything like the following?
```yaml
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
```
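For comparison, here is a hypothetical sketch of a fuller SCC that would admit what this DaemonSet requests (a privileged container plus `hostPath` volumes). The field names come from OpenShift's `security.openshift.io/v1` `SecurityContextConstraints` API; the values, and the assumption that the relevant service account is `system:serviceaccount:rhacs-operator:collector` (taken from the DaemonSet's namespace and `serviceAccountName`), are illustrative and not the SCC that StackRox actually ships.

```yaml
# Hypothetical SCC sketch; values are assumptions, not the StackRox default.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: stackrox-collector
allowPrivilegedContainer: true   # collector container sets privileged: true
allowPrivilegeEscalation: true   # implied by privileged mode
allowHostDirVolumePlugin: true   # DaemonSet mounts /proc, /sys, /dev, etc. via hostPath
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowedCapabilities: null
defaultAddCapabilities: null
requiredDropCapabilities: null
priority: null
readOnlyRootFilesystem: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
seccompProfiles:
- '*'
volumes:                         # volume types used by the DaemonSet
- emptyDir
- hostPath
- projected                      # service account token volume
- secret
users:
- system:serviceaccount:rhacs-operator:collector
```

To see which SCC actually admitted a running collector pod, the `openshift.io/scc` annotation on the pod can be inspected, e.g. with `oc get pod <collector-pod> -n rhacs-operator -o yaml`.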
@erthalion no, I don't see anything related to privilege escalation or privileged containers.
That sounds strange to me. So the output of `oc get scc/stackrox-collector -o yaml` doesn't show anything except what you've posted?
Yes, that's correct
@stackrox/collector-team any updates on this issue?
Unfortunately not; nobody has had the capacity to look further into it.
@Balaji-MP TBH, OpenShift 4.9 is quite dated; it might even be out of support by now. Would it be feasible for you to upgrade to a more recent version?
@porridge let me upgrade to the latest version and check. In the meantime, do you have a recommended minimum version?
4.12 would be my first choice
Awesome! Let us know if you need anything else.