aws/eks-charts

GPU metrics not collected by aws-cloudwatch-metrics

Opened this issue · 0 comments

Describe the bug

I've set up aws-cloudwatch-metrics through the Helm chart linked here, and I've also set image.tag=1.300037.0b583, since GPU metrics should be collected by default starting from 1.300034.0 according to this link.

I've also manually updated the RBAC permissions to include services (#1095) and explicitly set enhancedContainerInsights.enabled=true (and fixed the documentation for this value here).

I still can't see the metrics in Container Insights, and I'm starting to believe I have to add additional settings to the ConfigMap to explicitly enable GPU metrics collection. Can someone confirm this, or should GPU metrics collection work out of the box?
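For reference, this is a minimal sketch of what I assume the explicit settings in the agent's ConfigMap would look like. The accelerated_compute_metrics flag is my guess based on the CloudWatch agent configuration docs; the exact key names and nesting may differ for this chart:

```json
{
  "logs": {
    "metrics_collected": {
      "kubernetes": {
        "enhanced_container_insights": true,
        "accelerated_compute_metrics": true
      }
    }
  }
}
```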

Steps to reproduce

Install aws-cloudwatch-metrics on an EKS cluster with GPU nodes (e.g. g5.xlarge). Check CloudWatch for GPU metrics.
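Roughly, the install I used looks like the following sketch (the repo alias eks and the cluster name my-cluster are placeholders for your own setup):

```shell
# Add the eks-charts repo under the alias "eks" (placeholder name)
helm repo add eks https://aws.github.io/eks-charts
helm repo update

# Install the chart with the settings described above;
# "my-cluster" is a placeholder for your EKS cluster name
helm upgrade --install aws-cloudwatch-metrics eks/aws-cloudwatch-metrics \
  --namespace amazon-cloudwatch --create-namespace \
  --set clusterName=my-cluster \
  --set image.tag=1.300037.0b583 \
  --set enhancedContainerInsights.enabled=true
```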

Expected outcome

I'd expect the GPU metrics to show up in CloudWatch.

Environment

  • Chart name: aws-cloudwatch-metrics
  • Chart version: 0.0.11
  • Kubernetes version: 1.29.3-eks-adc7111
  • Using EKS (yes/no), if so version? 1.29.3-eks-adc7111

Additional context

I've successfully set up the metrics collection for GPU metrics on EC2 instances before, but it doesn't seem to work on EKS using this chart.