open-telemetry/opentelemetry-network

Add support for OpenShift

atoulme opened this issue · 13 comments

Is your feature request related to a problem? Please describe.

We currently do not support OpenShift well.

Describe the solution you'd like

We want to support OpenShift as a first-class platform.

Describe alternatives you've considered

No response

Additional context

No response

Hi @atoulme what is missing for supporting OpenShift?

@samiura has worked on adding OpenShift support with #201

For now, we have reverted the changes because they increase the size of the Docker image. Please take a look at the approach he took. We are looking at how to add those kernel headers properly, maybe via a sidecar volume mount.
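To make the volume-mount idea concrete, here is a rough sketch of a pod spec fragment in which a separate image supplies the kernel headers through a shared `emptyDir` volume. It uses an init container rather than a long-running sidecar, since the headers only need to be copied once. Every name, image, and path here is a hypothetical placeholder, not something taken from #201 or from the collector's actual configuration.

```python
import json

# All names, images, and paths below are hypothetical placeholders.
HEADERS_VOLUME = "kernel-headers"
HEADERS_SOURCE = "/usr/src/kernels"  # assumed location inside the headers image
HEADERS_TARGET = "/usr/src/kernels"  # assumed location the collector reads from


def pod_spec_with_headers(collector_image: str, headers_image: str) -> dict:
    """Pod spec fragment that shares kernel headers through an emptyDir volume."""
    return {
        "volumes": [{"name": HEADERS_VOLUME, "emptyDir": {}}],
        "initContainers": [
            {
                "name": "copy-kernel-headers",
                "image": headers_image,
                # Copy the headers into the shared volume before the collector starts.
                "command": ["sh", "-c", f"cp -a {HEADERS_SOURCE}/. /shared/"],
                "volumeMounts": [{"name": HEADERS_VOLUME, "mountPath": "/shared"}],
            }
        ],
        "containers": [
            {
                "name": "kernel-collector",
                "image": collector_image,
                "volumeMounts": [
                    {
                        "name": HEADERS_VOLUME,
                        "mountPath": HEADERS_TARGET,
                        "readOnly": True,
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    spec = pod_spec_with_headers(
        "example.invalid/kernel-collector:latest",
        "example.invalid/openshift-kernel-headers:latest",
    )
    print(json.dumps(spec, indent=2))
```

With this layout the collector image itself would stay the same size, and only the headers image would need to track OpenShift versions. Whether the collector can actually pick up headers from a mounted path is an assumption that would need checking.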

@atoulme nice!

We are looking at how to add those kernel headers properly, maybe via a sidecar volume mount.

I'll be off for a few days but, once I'm back, I'd like to lend a hand to make this happen :)
Thanks!

That would definitely help. We are probably going to cancel the SIG meeting next week, but the week after we can talk if you'd like, or work over this issue directly.

@iblancasa Hey Israel! Thanks for your interest in opentelemetry-ebpf. In short, these are the things we had to do to make opentelemetry-ebpf function on OpenShift as it does on other K8s distributions.

  1. First and foremost, because the RHCOS operating system running on each node of an OpenShift cluster has SELinux enabled by default, the spc SELinux policy (see the spc_selinux man page) needs to be modified to allow additional access for spc_t domain processes (Super Privileged Containers). Detailed step-by-step instructions are listed here.

  2. This is the trickiest part, and where we need help, as the current solution is a bit ad hoc. We had to copy the Linux headers (we need the kernel-devel package installed), which are available in OpenShift's Toolkit images, into our Kernel-Collector Docker image. This is inconvenient because it bloats the container size by almost 10% for each OpenShift version. On other K8s distributions, we use the DNF/APT package managers to install the headers at runtime, which is not possible on OpenShift due to its stringent security architecture.

  3. Thirdly, we have to run the Kernel Collector and K8s Collector containers with added privileges. The following commands achieve this; however, the same result can also be achieved via securityContext configurations.

    oc adm policy add-scc-to-user privileged -z my-splunk-otel-collector-kernel-collector -n

    oc adm policy add-scc-to-user anyuid -z my-splunk-otel-collector-k8s-collector -n
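As a rough illustration of the securityContext route mentioned in step 3, the sketch below builds two patches that set a container-level `securityContext`. `privileged` and `runAsUser` are standard Kubernetes fields, but the container names and the UID are assumptions made for illustration. On OpenShift the service account generally still needs access to an SCC that admits these settings, so this may complement the commands above rather than replace them.

```python
import json


def container_patch(name: str, security_context: dict) -> dict:
    """Build a strategic-merge patch that sets one container's securityContext."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "securityContext": security_context}
                    ]
                }
            }
        }
    }


# Container names are hypothetical; adjust them to the deployed workloads.
kernel_collector_patch = container_patch("kernel-collector", {"privileged": True})
# The UID below is an assumption for illustration, not a value from this thread.
k8s_collector_patch = container_patch("k8s-collector", {"runAsUser": 0})

if __name__ == "__main__":
    print(json.dumps(kernel_collector_patch, indent=2))
    print(json.dumps(k8s_collector_patch, indent=2))
```

The printed JSON could be passed to `oc patch` as a strategic merge patch, or the same fields could be set directly in the workload manifests.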

@iblancasa do you have any advice on this, is there anything we should do differently?

Sorry for the late reply. I was on vacation and I just "landed".

  1. First and foremost, because the RHCOS operating system running on each node of an OpenShift cluster has SELinux enabled by default, the spc SELinux policy (see the spc_selinux man page) needs to be modified to allow additional access for spc_t domain processes (Super Privileged Containers). Detailed step-by-step instructions are listed here.

I see. This is challenging because not everyone has access to the workers where OpenShift is running. Maybe this operator can help.

  2. This is the trickiest part, and where we need help, as the current solution is a bit ad hoc. We had to copy the Linux headers (we need the kernel-devel package installed), which are available in OpenShift's Toolkit images, into our Kernel-Collector Docker image. This is inconvenient because it bloats the container size by almost 10% for each OpenShift version. On other K8s distributions, we use the DNF/APT package managers to install the headers at runtime, which is not possible on OpenShift due to its stringent security architecture.

I can take a look but I think that would be the only way to support OpenShift. How about having two images? One for OpenShift and another one for regular Kubernetes clusters. With that, only the OpenShift users will "suffer" the increase in the image size.

  3. Thirdly, we have to run the Kernel Collector and K8s Collector containers with added privileges. The following commands achieve this; however, the same result can also be achieved via securityContext configurations.
    oc adm policy add-scc-to-user privileged -z my-splunk-otel-collector-kernel-collector -n
    oc adm policy add-scc-to-user anyuid -z my-splunk-otel-collector-k8s-collector -n

I think this is OK. For some features in the OTel Collector, we also need to create a service account (SA), add security contexts, and so on to make them work.

Thanks @iblancasa. I agree with your (2) 100%. We can have two different sets of Kernel Collector images.

I will have to research (1) to understand the operator approach, as I have not looked into it.

Hello @iblancasa for (1) would you have an example of this operator? Looking at the docs, they seem quite complex to parse and I would love a way to get started.

@atoulme I'll try to have something ASAP. I have never used the operator before, but I know some people are using it for this kind of purpose.

I'll assign the issue to you for the time being, so we know to get back to you to discuss. Thanks for your help.

@iblancasa anything we should follow up on?

@atoulme I have spent some time on this but I was not able to make it work.

Maybe just mentioning the need to modify the SELinux policy can be ok for now.