DataDog/ddprof

Run ddprof as a sidecar container

r1viollet opened this issue · 10 comments

Description

Allow ddprof to be deployed as a sidecar container to monitor the activity of processes running on the host.

Context

This was a request within the following discussion
#212

Within this ticket we should define the expected user experience.

Although I mentioned the sidecar container in the previous post, the term sidecar was only used to distinguish between ddprof running in the same container and ddprof running in another container, side by side with the monitored process.

However, the ideal scenario for ddprof is the following:

  1. Execute ddprof in a "privileged" container (not sure if privileged is required, or if we could just make it work by adding the CAP_SYS_ADMIN and CAP_SYSLOG capabilities); see the sketch after this list.
  2. Filter which processes ddprof would be able to monitor, using for example kubernetes annotations.
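
To make the first point concrete, here is a minimal sketch of what the profiler container's security settings could look like, assuming the capability-based route works (the image name is a placeholder and the exact capability set is an assumption, not a verified ddprof requirement):

    # Hypothetical fragment of a pod spec for the ddprof container.
    containers:
      - name: ddprof
        image: ddprof:latest              # placeholder image
        securityContext:
          # Option A: run fully privileged.
          # privileged: true
          # Option B: only add the capabilities mentioned above.
          capabilities:
            add: ["SYS_ADMIN", "SYSLOG"]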

This deployment has a number of advantages:

  1. ddprof is the only process that runs in privileged mode so one can try to secure it better.
  2. it is able to choose which PIDs to monitor using kubernetes mechanisms, so it can start continuous profiling as new pods are created.
  3. no changes are required to the application container build and deployment (perhaps some minor tuning to include debugging symbols; not sure if ddprof supports debuginfo servers).

Thanks for the feedback, I put together an example here to help us iterate on a side-container deployment.

Keep in mind this is not the current way we advocate using ddprof and I would need to get more customer requests to work further on this. Allocation profiling for instance currently does not work with this way of deploying the profiler.

However, I'd be happy if you could give it a try and explain what is missing to make this usable for your use case.

This is our first iteration of deploying ddprof on our k8s cluster:

  1. we created a daemonset that runs ddprof as a container on all our nodes, in order to monitor the processes running on the host (a rough manifest sketch follows this list).
  2. ddprof was executed using the command ddprof -l notice -g yes, which sampled the whole host.
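
For reference, a rough sketch of what such a daemonset could look like, assuming a privileged ddprof container with host PID access (the image name is a placeholder, hostPID is an assumption, and agent/API-key configuration is omitted):

    # Rough DaemonSet sketch; not an official manifest.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ddprof
    spec:
      selector:
        matchLabels:
          app: ddprof
      template:
        metadata:
          labels:
            app: ddprof
        spec:
          hostPID: true                     # assumption: needed to see host processes
          containers:
            - name: ddprof
              image: ddprof:latest          # placeholder image
              command: ["ddprof", "-l", "notice", "-g", "yes"]
              securityContext:
                privileged: true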

The profiles seem to work, but there seem to be some questions on the unwinding. For example, what do these anonymous frames mean?
[screenshot: flamegraph showing anonymous frames]

The other issue with this mode is that it is hard to go from the processes back to the pod names. Is there a way we can annotate the profiles we collect with their container ID (at least; the best option would be the pod name) so that we are able to go from logs/metrics to the corresponding profiles?
Finally, another question: in the flamegraph, if we have 2 processes running with the same name (although different PIDs), are they aggregated in the same column? I haven't dug into this, but I got that feeling from the profiles we looked at.

👋 Thanks for the feedback.

The anonymous frames

  • It seems the profiler is not matching the debug information for your libc. This is not too important as the base frames are irrelevant, though this seems like a bug. I might incidentally be working on fixing this. I'll give you a new release soon so you can check whether it fixes this.

PID Aggregation question

  • Yes, the aggregation will group processes with different PIDs by default.
    You have a column on the right where you can split the data per PID / TID.

Split the data per pod

  • I'll see how I can add that in. This is important.

Could you try the main branch with container-id information? This will allow you to understand which containers are responsible for different parts of the flamegraph. There is a side panel on the right that will allow you to filter on containers.

The release is available here: https://github.com/DataDog/ddprof/releases/tag/v0.13.0-rc
If this is OK, I can release it.

Right now we are deploying ddprof inside every container so we sample only the containers we want (we run ddprof by passing it the PID, inside the pod, of the process we want to profile), roughly as sketched below.
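
For reference, the per-container setup described above could look roughly like this, assuming ddprof can attach to a given PID via a --pid option (that flag, the image name, and the entrypoint script are illustrative assumptions, not a verified recipe):

    # Rough sketch of the per-container setup (not the deployment currently advocated):
    # ddprof runs inside the same container as the application and attaches to its PID.
    containers:
      - name: app
        image: my-app:latest              # placeholder application image
        command:
          - /bin/sh
          - -c
          - |
            # start the application in the background, then attach ddprof to it;
            # the --pid flag and PID lookup are illustrative, check ddprof --help
            /usr/local/bin/my-app &
            APP_PID=$!
            ddprof -l notice --pid "$APP_PID" &
            wait "$APP_PID"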
However, if you want, I can deploy ddprof as a daemonset in our cluster, see how this works, and report back to you. (In the long run we would like to run it as a daemonset, rather than per pod as we do now.)

That would be great! My question is basically whether the current container-id information is enough for you to slice the profiles along the relevant dimensions.

This was released. You now have a container-id filter in the side panel on the right-hand side.
[screenshot: container-id filter in the profiler side panel]

This ID can be used in the infrastructure view to match the relevant container.

The next item you discussed was PID filtering. Is this still something required?
The UI filtering mechanism already provides a way to do this.

The latest release, v0.17.1, greatly improves the performance of the sidecar approach. I would recommend upgrading.