microsoft/Docker-Provider

OMS Agent high memory usage

MrImpossibru opened this issue · 39 comments

Hello,

Is there any way to reduce omsagent memory consumption in the Kubernetes cluster? For just 2 nodes it runs 3 instances of omsagent (2 from the daemonset, 1 from the replicaset), and each instance uses 300 MB of RAM. This is the most demanding service in my cluster, and it is just a monitoring tool.

Why is the replicaset even required? It just adds one more instance to a node where the daemonset has already created one.

Reopening because the last issue was closed without a solution. (It was closed with a solution to a problem that appeared after the original issue was created.)
#694

Hi, @MrImpossibru, the replicaset (a singleton pod) is for cluster-level monitoring information, such as the data collected in KubePodInventory, KubeNodeInventory, KubeEvents, etc.
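
For anyone checking what is actually running, the two roles show up as separate workloads. A minimal sketch, assuming the agent is deployed in kube-system under the usual omsagent names (adjust names to your cluster):

```sh
# DaemonSet = one pod per node; Deployment/ReplicaSet = the single cluster-level pod
kubectl -n kube-system get daemonset,deployment,pods -o wide | grep -i omsagent
```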

Hi, @ganga1980,
Is it possible to let the first/any/all daemonset pod(s) do that?
Is it possible to prevent omsagent from reserving so much RAM?

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@ganga1980
Update: the Prometheus agent uses 20 MB of RAM after the update, but it still requests 225 MB (the amount required for the pod). Is it possible to do something about that?

P.S. On the 9th of August there was some Kubernetes update after which more RAM was reserved / less became available. I haven't figured out yet which service caused this.
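
For reference, the declared requests/limits can be compared with actual usage roughly like this. A sketch only, assuming the agent pods run in kube-system and the Metrics Server is available for kubectl top; pod names differ per cluster:

```sh
# Declared requests and limits on the agent pods
kubectl -n kube-system get pods | grep -i omsagent
kubectl -n kube-system describe pod <omsagent-pod-name> | grep -A 4 -E 'Requests|Limits'

# Actual memory usage (needs the Metrics Server)
kubectl -n kube-system top pod | grep -i omsagent
```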

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

> @ganga1980 Update: the Prometheus agent uses 20 MB of RAM after the update, but it still requests 225 MB (the amount required for the pod). Is it possible to do something about that?
>
> P.S. On the 9th of August there was some Kubernetes update after which more RAM was reserved / less became available. I haven't figured out yet which service caused this.

In the current semester, we plan to integrate the Vertical Pod Autoscaler to scale both requests and limits. With that, requests and limits will be around 20 MB, which should address your ask. Regarding perf, we continue to improve it and this will be ongoing. Let me know if you have any other follow-up questions; otherwise we can close.
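
For context, a Vertical Pod Autoscaler object generally looks like the sketch below. This is only an illustration of the mechanism, not the team's actual integration, and it assumes the VPA components are installed in the cluster and that the DaemonSet is named omsagent in kube-system:

```sh
kubectl apply -f - <<'EOF'
# Hypothetical example: let VPA size the omsagent DaemonSet's requests/limits.
# The names (omsagent, kube-system) are assumptions; the real rollout may differ.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: omsagent-vpa
  namespace: kube-system
spec:
  targetRef:
    apiVersion: apps/v1
    kind: DaemonSet
    name: omsagent
  updatePolicy:
    updateMode: "Auto"   # VPA evicts pods and recreates them with updated requests
EOF
```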

@ganga1980 I will close the topic only after this is fixed.
And the main problem is the one in the first message: omsagent uses too much memory, and there are many instances of it per cluster. This makes 4 GB RAM VMs in the cluster almost unusable, since the default pods use almost all of the RAM on such VMs.

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@MrImpossibru, as I mentioned earlier, perf improvements will be ongoing. Since you are referring to usage of 4 GB across all the default pods, which are owned by different teams, that needs to be followed up separately.

@ganga1980 OMS Agent is the most memory-consuming service.
And I found out that its memory consumption depends on the node's RAM amount. It looks like it reserves some percentage of the node's total RAM.

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

> @ganga1980 OMS Agent is the most memory-consuming service. And I found out that its memory consumption depends on the node's RAM amount. It looks like it reserves some percentage of the node's total RAM.

@MrImpossibru, memory consumption is not based on the node's available memory; it is based on data collection. We use open-source products like Telegraf, Fluent Bit, and Fluentd in our agent, and these processes are what consume the memory. For example, if the cluster has a lot of k8s resources, the replicaset pod has to fetch and parse all of them; similarly, if the node has a high volume of container logs, the daemonset consumes the memory required for container log processing. If you are available, let's have a quick discussion on this and see how we can help.
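
One practical knob on the user side is to trim what the agent collects. Azure Monitor for containers reads its settings from a ConfigMap named container-azm-ms-agentconfig in kube-system; below is a minimal sketch that excludes kube-system logs from stdout/stderr collection. The key names follow the documented template and may vary across agent versions, so verify against the current template before applying:

```sh
kubectl apply -f - <<'EOF'
# Sketch only: reduce stdout/stderr log collection to cut the agent's workload.
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  log-data-collection-settings: |-
    [log_collection_settings]
       [log_collection_settings.stdout]
          enabled = true
          exclude_namespaces = ["kube-system"]
       [log_collection_settings.stderr]
          enabled = true
          exclude_namespaces = ["kube-system"]
       [log_collection_settings.env_var]
          enabled = true
EOF
```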

Adding a +1 for @MrImpossibru: omsagent is by far the largest container running on our AKS cluster too.

> Memory consumption is not based on the node's available memory; it is based on data collection.

Sorry, I was wrong. I checked it again using two fresh clusters with different node sizes, and the consumption was almost equal.

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@MrImpossibru, we are actively working on optimizing our agent's memory usage. Hopefully the optimizations will bring memory usage down significantly. We will post an update once we have some concrete data.

@ganga1980 Thanks a lot!

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

This issue was closed because it has been stalled for 12 days with no activity.