utkuozdemir/nvidia_gpu_exporter

most ratio metrics are zeroes

pschonmann opened this issue · 3 comments

Describe the bug
Some ratio metrics are zeroes

[image]

Expected behavior
Just show me graphs with values, not zeroes

Note: the UUID string was replaced.

Console output
Some metrics are zero, for example:
nvidia_smi_fan_speed_ratio{uuid="MY_UUID_STRING"} 0.3
nvidia_smi_utilization_decoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_encoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_gpu_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_jpeg_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_memory_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_ofa_ratio{uuid="MY_UUID_STRING"} 0
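
To make it easier to see which ratio metrics are zero, the exporter output can be filtered with a few lines of Python. This is only a minimal sketch: `parse_metrics` and `zero_ratio_metrics` are hypothetical helper names, the sample text is copied from the lines above, and the parser assumes one series per metric (a single GPU) with no trailing timestamps.

```python
def parse_metrics(text):
    """Parse simple Prometheus exposition lines into {metric_name: value}.

    Assumption: each line is `name{labels} value` with no timestamp,
    and each metric name appears once (single GPU).
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, raw_value = line.rpartition(" ")
        name = series.split("{", 1)[0]  # drop the label set
        values[name] = float(raw_value)
    return values


def zero_ratio_metrics(values):
    """Return the sorted names of *_ratio metrics whose value is exactly 0."""
    return sorted(
        name for name, value in values.items()
        if name.endswith("_ratio") and value == 0.0
    )


# Sample copied from the console output in this report.
SAMPLE = """\
nvidia_smi_fan_speed_ratio{uuid="MY_UUID_STRING"} 0.3
nvidia_smi_utilization_gpu_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_memory_ratio{uuid="MY_UUID_STRING"} 0
"""

if __name__ == "__main__":
    for name in zero_ratio_metrics(parse_metrics(SAMPLE)):
        print(name)
```

In this sample the fan speed ratio is non-zero, so only the two utilization metrics are listed.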

Model and Version

  • GPU Model: NVIDIA RTX A6000
  • App version and architecture: 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
  • Installation method: binary
  • Operating System: Debian 12
  • Nvidia GPU driver version: ii nvidia-driver 545.23.06-1 amd64 NVIDIA metapackage

Additional context

Fri Dec 29 13:09:22 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:00:10.0 Off |                  Off |
| 30%   33C    P8              20W / 300W |  47383MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   2375961      C   /opt/conda/bin/python3.9                  47374MiB |
+---------------------------------------------------------------------------------------+
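
The table above reports Fan 30% and GPU-Util 0%, which as ratios would be 0.3 and 0. To cross-check the exporter against nvidia-smi directly, something like the following sketch could be used. It assumes `nvidia-smi` is on the PATH and that the `fan.speed` and `utilization.gpu` query fields return plain numbers on this driver (a field reported as `[N/A]` would make the conversion fail). `percent_csv_to_ratios` and `query_gpu` are hypothetical helper names.

```python
import subprocess

# Query fields for `nvidia-smi --query-gpu`; both are reported in percent.
FIELDS = ["fan.speed", "utilization.gpu"]


def percent_csv_to_ratios(csv_line, fields=FIELDS):
    """Convert one CSV line of percentages (e.g. "30, 0") to 0..1 ratios."""
    parts = [part.strip() for part in csv_line.split(",")]
    return {field: float(part) / 100.0 for field, part in zip(fields, parts)}


def query_gpu(fields=FIELDS):
    """Run nvidia-smi and return one {field: ratio} dict per GPU.

    Assumption: every queried field is numeric; "[N/A]" raises ValueError.
    """
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(fields),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [
        percent_csv_to_ratios(line, fields)
        for line in out.splitlines()
        if line.strip()
    ]
```

For the values shown in the table, `percent_csv_to_ratios("30, 0")` gives a fan speed ratio of 0.3 and a GPU utilization ratio of 0.0, which can then be compared with `nvidia_smi_fan_speed_ratio` and `nvidia_smi_utilization_gpu_ratio` from the exporter.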

@pschonmann In my case, the fix from jina-ai/clip-as-service#254 solved it
(just run sudo nvidia-smi -pm 1 on the instance).

It doesn't help, because persistence mode is already enabled:

Persistence mode is already Enabled for GPU 00000000:01:00.0.
Persistence mode is already Enabled for GPU 00000000:02:00.0.
Persistence mode is already Enabled for GPU 00000000:03:00.0.
Persistence mode is already Enabled for GPU 00000000:04:00.0.