NVIDIA/gpu-monitoring-tools

Failed to install gpu-helm-charts/dcgm-exporter

jasperzhong opened this issue · 1 comment

I followed the instructions here. Everything went well until I tried to install gpu-helm-charts/dcgm-exporter, which failed:

~ helm install \
   --generate-name \
   gpu-helm-charts/dcgm-exporter
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"

I tried several times and even rebooted the system, but I always get the same error. I have no idea what is wrong and would appreciate your help.

More information:

  • OS: Ubuntu 18.04 LTS
  • Docker: nvidia-docker2 19.03.13, API version 1.40.
  • Kubernetes: 1.19
  • Helm: 3.4.1

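I solved it myself. It turns out that Prometheus has to be set up first: the ServiceMonitor kind is a custom resource provided by the Prometheus Operator, so the dcgm-exporter chart cannot be installed until that resource definition exists in the cluster.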

https://docs.nvidia.com/datacenter/cloud-native/kubernetes/dcgme2e.html#setting-up-prometheus
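
For anyone hitting the same error, a quick way to confirm the cause before retrying is to check whether the ServiceMonitor CRD is registered. The snippet below is only a sketch: the function name `install_dcgm_exporter` is made up for illustration, and it assumes `kubectl` and `helm` are already configured for the cluster and that the CRD carries its usual name, `servicemonitors.monitoring.coreos.com`.

```shell
#!/usr/bin/env bash
# Assumed CRD name behind kind "ServiceMonitor" in monitoring.coreos.com/v1.
CRD_NAME="servicemonitors.monitoring.coreos.com"

# Hypothetical helper: install dcgm-exporter only if the CRD is registered.
install_dcgm_exporter() {
  if kubectl get crd "$CRD_NAME" >/dev/null 2>&1; then
    # CRD is present, so the chart's ServiceMonitor object can be created.
    helm install --generate-name gpu-helm-charts/dcgm-exporter
  else
    # CRD is missing: this is the situation that produces the error above.
    echo "CRD $CRD_NAME not found: set up Prometheus first" >&2
    return 1
  fi
}
```

Running `install_dcgm_exporter` after defining it either performs the install or prints a reminder to go through the Prometheus setup in the linked guide first.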