Issues
In the case of gpu pass-through, does dcgm-exporter on the physical host support capturing gpu metrics of kvm virtual machines?
#392 opened by lddlww - 2
Missing 3.3.8 builds
#389 opened by xnox - 5
Let dcgm-exporter be a daemon
#367 opened by zvonkok - 6
DCGM-exporter pods stuck in Running State, Not getting Ready without GPU allocation.
#385 opened by rohitreddy1698 - 1
DCGM Exporter does not collect individual pod metrics when MPS is enabled in Kubernetes
#388 opened by valafon - 14
GPU Failure Detection and Alerting Enhancement
#348 opened by jz543fm - 16
How does the DCGM exporter work with DCGM?
#383 opened by changhyuni - 1
Add a health status metric for every gpu card
#384 opened by lx1036 - 0
MIG device support for hpc_job metric labels
#369 opened by jbrobstw - 0
Getting "Error from server (NotFound): the server could not find the metric DCGM_FI_DEV_GPU_UTIL for pods",I am not getting DCGM_FI_DEV_GPU_UTIL metrics from prometheus
#379 opened by Vijaygawate - 0
failed to transform metrics for transform 'podMapper'
#378 opened by jicki - 0
How does dcgm-exporter, when running on k8s as a daemonset, communicate with the host's dcgm host engine?
#377 opened by yx-lamini - 0
Update contribution doc to require signing
#376 opened by chipzoller - 4
dcp metrics supports gpu architecture
#370 opened by lxzjd - 0
The pod and namespace information in the monitoring indicators of some Gpus occupied by Pods is empty
#373 opened by qingfenghcy - 0
time="2024-08-08T03:09:05Z" level=error msg="Failed to write response." error="write tcp 10.202.3.1:9400->10.202.2.2:49674: i/o timeout
#372 opened by safeAndSound3 - 3
enable DCGM_EXPORTER_KUBERNETES and podrequestapi is avaiable but not found container and namespace label in Metrics
#349 opened by Kevinz857 - 11
Start the recompiled dcgm-exporter fails to collect GPU metrics with an error
#368 opened by 15234660879 - 0
Start the recompiled dcgm-exporter fails to collect GPU metrics with an error
#371 opened by 15234660879 - 6
Seeking community feedback on potential new feature: Standardize labels for next major release
#356 opened by glowkey - 4
Can't collecting DCP metrics
#365 opened by jeffreyyjp - 1
DCGM exporter image vulnerable to https://nvd.nist.gov/vuln/detail/CVE-2024-24790
#364 opened by alexglenn-ddl - 2
dcgm-exporter log: No Kubelet socket, ignoring
#362 opened by jeffreyyjp - 2
Protobuf handling is incorrect
#361 opened by fbacchella - 1
README link about "To integrate DCGM-Exporter with Prometheus and Grafana, see the full instructions in the user guide." is already invalid
#358 opened by jeffreyyjp - 1
dcgm-exporter crashes when run on Debian 12
#360 opened by stevenmcastano - 2
Why `DCGM_FI_DEV_PCIE_{TX,RX}_THROUGHPUT` is default instead of `DCGM_FI_PROF_PCIE_{TX,RX}_BYTES `?
#354 opened by koshieguchi - 2
Duplicated, missing or wrong metrics if using MIG, Grafana dashboard showing wrong duplicated / false values
#353 opened by frittentheke - 1
cannot get DCGM_FI_PROF_SM_ACTIVE metrics
#352 opened by qingfenghcy - 4
Cannot Retrieve GPU PIDs from DCGM Metrics
#347 opened by doronkg - 6
How to obtain the namespace , pod and container data
#343 opened by aikikia - 3
DCGM_FI_DEV_MEM_COPY_UTIL not correct always 1 or 2
#345 opened by xuchenCN - 6
How to install dcgm-exporter on Windows Server?
#344 opened by LittleNewton - 0
Switch GPU Util metric to `DCGM_FI_PROF_GR_ENGINE_ACTIVE` in NVIDIA DCGM Metrics Dashboard
#341 opened by wabouhamad - 3
can I get computeRunningProcesses and graphicsRunningProcesses this two metrics??
#339 opened by suxwang - 2
Failed to watch metrics: Error watching fields: The third-party Profiling module returned an u
#330 opened by 287400117 - 2
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ is not signed
#335 opened by jjziets - 1
Hello, why /var/log/nv-hostengine.log file had many ERROR [5231:5273] [[NvSwitch]] ReadNvSwitchStatusAllSwitches()
#334 opened by 13416157913 - 1
Makefile missing DIST_DIR := cmd/dcgm-exporter
#331 opened by jjziets