NVIDIA/gpu-monitoring-tools

Install broken on AKS

RaananHadar opened this issue · 4 comments

I am getting the following error when trying to install the helm chart on a standard AKS cluster:

Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
helm.go:81: [debug] unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"

Hi @RaananHadar - have you installed the Prometheus (kube-prometheus) operator? The Prometheus operator provides the ServiceMonitor CRD, so without that, the dcgm-exporter pod will fail with the above error.

I will update our documentation to make this explicit.

Thanks for the answer @dualvtable.

I've installed prometheus using a very popular helm chart from grafana called loki-stack which installs grafana+prometheus+loki. It seems to depend on the community prometheus helm chart found here. This is the documented way to install prometheus that grafana recommends so I expect alot of users to follow this route as well.

I would appreciate if you suggest a workaround for people who install prometheus this way. Many thanks.

@RaananHadar Maybe it should be separate issue. Loki support
I also installed loki-stack and I also faced the same problem.

Or rename this one please

@sturfee-petrl, As noted by @dualvtable the issue is that installation is only possible with a crd from 'kube-prometheus'. I don't think that renaming the issue to 'loki support' is accurate. Maybe 'community prometheus' is more likely.