use daemonset way not sidecar way
withlin opened this issue · 6 comments
Feature request
use daemonset way not sidecar way
Use case
May be consider daemonset that it will deploy agent on nodes,which it will collect data by pid, and the use pid association with pod metadata.
what do you think? @edeNFed
Sounds like a really interesting feature @withlin
Regarding implementation, I think we should consider changing the current logic of specifying the target binary via OTEL_TARGET_EXE
to logic like:
- Scan all processes in /proc
- Detect all processes that are written in Go. we already have this code at https://github.com/keyval-dev/opentelemetry-go-instrumentation/blob/af32862e111b42be049956705d0dd27e9639fc15/pkg/process/module.go#L40
- Apply instrumentation for every found Go process.
I think changing the behavior to this will allow us to both remove the OTEL_TARGET_EXE
config and support daemon set deployment at the same time. Basically, when deployed as a daemon set with hostPID: true
/proc will simply contain more processes and the instrumentation agent does not even aware of the fact that it is currently deployed as a daemon set.
What do you think?
Sounds like a really interesting feature @withlin Regarding implementation, I think we should consider changing the current logic of specifying the target binary via
OTEL_TARGET_EXE
to logic like:
- Scan all processes in /proc
- Detect all processes that are written in Go. we already have this code at https://github.com/keyval-dev/opentelemetry-go-instrumentation/blob/af32862e111b42be049956705d0dd27e9639fc15/pkg/process/module.go#L40
- Apply instrumentation for every found Go process.
I think changing the behavior to this will allow us to both remove the
OTEL_TARGET_EXE
config and support daemon set deployment at the same time. Basically, when deployed as a daemon set withhostPID: true
/proc will simply contain more processes and the instrumentation agent does not even aware of the fact that it is currently deployed as a daemon set. What do you think?
if you scan all processes, may be have problem. for instance: i want to collect specific process,such as only collect voting service data.
hi @edeNFed , i have a idea as follow:
- the agent should use k8s informer way that it will watch pod resource, the pod should have annotations: such as
keyval.dev/go-instrumentation: true
,then the current agent get current node info,which it can fliter node in pod resource mainifest. if match it will do 2 step. - the agent use
docker
grpc client/containerd
grpc client get pid. the command line:docker inspect -f '{{.State.Pid}}' <container id>
- to do golang instrument
That sounds really cool. I wonder if we should integrate somehow with the OpenTelemetry Operator: https://github.com/open-telemetry/opentelemetry-operator#opentelemetry-auto-instrumentation-injection
It looks like they are doing similar things, for example, annotating deployments for instrumentation.
what do you think?
@edeNFed Yes, this project that sound good to me.opentelemetry-auto-instrumentation-injection is using mutating webhook intercept by annotation(inject container in pod resource).
the implement as follows:
https://github.com/open-telemetry/opentelemetry-operator/blob/main/internal/webhookhandler/webhookhandler.go#L92
https://github.com/open-telemetry/opentelemetry-operator/blob/main/pkg/sidecar/podmutator.go#L57
https://github.com/open-telemetry/opentelemetry-operator/blob/main/pkg/sidecar/pod.go#L37
before you send to me for this project,i have no idea for it. use daemonset way may reduce overhead vs sidecar,that's why I recommend implementing this way.
@withlin After reading your comments again, I think I understand what you mean much better now.
I really like this solution, sounds elegant and I can see many advantages to it (for example we could even instrument Kubernetes internal components like kubelet).
Another thing we probably need to implement as part of this is how to stop instrumenting when the annotation is deleted, It should probably delete all the attached uprobes.
I like it. Would you like to try to implement it? I can also get to it in about two weeks.