Issues
- 2
dcgm-exporter can't run
#212 opened by JohanOu - 0
nvidia-smi to report PCIe utilization %
#215 opened by amrragab8080 - 0
How to get pod level GPU metrics
#214 opened by faheemsohail - 5
How to monitor occupancy per SM.
#196 opened by malixian - 29
dcgm-exporter pod is crashingoff
#161 opened by anaconda2196 - 0
- 1
dcgm-exporter running on "g4dn.metal" in AWS EKS fails with "fatal: morestack on gsignal"
#208 opened by SQUIDwarrior - 6
- 8
dcgm-exporter cannot be installed successfully on 2080Ti
#176 opened by ReyRen - 0
- 14
too many warnings and errors
#146 opened by jelmd - 5
DCGM exporter crashes when installed by helm3
#180 opened by jiangxiaosheng - 0
does this repository support the windows nvidia gpu?
#207 opened by flyysr - 0
- 3
Failed to make binary
#202 opened by sunhmy - 0
failed to make binary
#201 opened by sunhmy - 2
- 0
Log spam in nv-hostengine.log due to ReadNvSwitchStatusAllSwitches() returned No data is available
#194 opened by jfolz - 5
dcgm-exporter missing metrics for A100 GPU
#166 opened by anaconda2196 - 7
GPU_I_PROFILE="<<<NULL>>>"
#193 opened by munir-georges - 3
- 1
How to monitor multiple GPU servers
#181 opened by anilnokia - 0
- 4
Install broken on AKS
#167 opened by RaananHadar - 1
- 1
invalid metrics in 2.4.0rc2
#187 opened by juliantaylor - 0
- 3
GKE: access DCGM metrics from HPA
#179 opened by JulesBelveze - 1
How do I mount a custom csv file with Kubernetes?
#169 opened by dmrub - 1
need `dcgmGetValuesSince` function
#170 opened by qisikai - 1
- 0
Pod run as non root user
#171 opened by anaconda2196 - 3
what is the problem of API version mismatch
#165 opened by kentinchen - 4
Helm chart pointing to 2.3.1 container, which is not available on nvcr.io
#164 opened by francoishernandez - 13
Method of calculating GPU utilization when applying NVIDIA Multi-Instance GPU
#151 opened by Jea-Eok-Kim - 2
GPU with MIG instances
#163 opened by crinavar - 1
Error: failed to download "gpu-helm-charts/dcgm-exporter" (hint: running `helm repo update` may help)
#158 opened by zkf85 - 0
make for specific DCGM version?
#155 opened by biocyberman - 2
ARM64 support
#147 opened by danmx - 1
Need help, trapped in the downloading DCGM
#153 opened by yangfly - 0
DCGM_FI_DEV_GPU_UTIL Abnormal Output
#152 opened by Jea-Eok-Kim - 4
Questions about EventType, EventData, and Xid
#150 opened by ruiwen-zhao - 1
- 1