GPU metrics not collected by aws-cloudwatch-metrics
Describe the bug
I've set up aws-cloudwatch-metrics through the helm chart linked here, and I've also set image.tag=1.300037.0b583, because GPU metrics should be collected by default starting from 1.300034.0 according to this link.
I've also manually updated the RBAC permissions to include services (#1095), and I've explicitly set enhancedContainerInsights.enabled=true (and fixed the documentation for this value here).
I still can't see the metrics in Container Insights, and I'm starting to believe I have to add additional settings to the ConfigMap to explicitly enable GPU metrics collection. Can someone confirm this, or should GPU metrics collection work out of the box?
Steps to reproduce
Install aws-cloudwatch-metrics on an EKS cluster with GPU nodes (e.g. g5.xlarge). Check CloudWatch for GPU metrics.
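For reference, my install looks roughly like this (a sketch, not an exact transcript: the release name, namespace, and clusterName=my-cluster are placeholders from my setup, and I'm assuming the standard eks-charts repo):

```shell
# Sketch of the install used to reproduce; release name, namespace,
# and cluster name are placeholders.
helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install aws-cloudwatch-metrics eks/aws-cloudwatch-metrics \
  --namespace amazon-cloudwatch --create-namespace \
  --set clusterName=my-cluster \
  --set image.tag=1.300037.0b583 \
  --set enhancedContainerInsights.enabled=true
```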
Expected outcome
I'd expect the GPU metrics to show up in CloudWatch.
Environment
- Chart name: aws-cloudwatch-metrics
- Chart version: 0.0.11
- Kubernetes version: 1.29.3-eks-adc7111
- Using EKS (yes/no), if so version? 1.29.3-eks-adc7111
Additional Context:
I've successfully set up GPU metrics collection on plain EC2 instances before, but it doesn't seem to work on EKS using this chart.