most ratio metrics are zeroes
pschonmann opened this issue · 3 comments
Describe the bug
Some ratio metrics are zeroes
Expected behavior
Just show me graphs with values, not zeroes
(The UUID string in the output below was replaced.)
Console output
Some metrics are zeroes, like:
nvidia_smi_fan_speed_ratio{uuid="MY_UUID_STRING"} 0.3
nvidia_smi_utilization_decoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_encoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_gpu_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_jpeg_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_memory_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_ofa_ratio{uuid="MY_UUID_STRING"} 0
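To narrow down whether the zeroes originate in the driver or in the exporter, the raw nvidia-smi readings can be compared with the scraped metrics. A minimal sketch, assuming the exporter listens on its usual default port 9835 (adjust if configured differently):

# Raw utilization and fan speed straight from the driver
nvidia-smi --query-gpu=utilization.gpu,utilization.memory,fan.speed --format=csv

# The same values as exposed by the exporter (port 9835 is an assumption)
curl -s http://localhost:9835/metrics | grep '_ratio'

If nvidia-smi itself prints 0 % utilization while a workload is running, the problem is below the exporter.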
Model and Version
- GPU Model: NVIDIA RTX A6000
- App version and architecture: 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
- Installation method: binary
- Operating System: Debian 12
- NVIDIA GPU driver version: ii nvidia-driver 545.23.06-1 amd64 NVIDIA metapackage
Additional context
Fri Dec 29 13:09:22 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06 Driver Version: 545.23.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A6000 On | 00000000:00:10.0 Off | Off |
| 30% 33C P8 20W / 300W | 47383MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2375961 C /opt/conda/bin/python3.9 47374MiB |
+---------------------------------------------------------------------------------------+
This dashboard was used:
https://grafana.com/grafana/dashboards/14574-nvidia-gpu-metrics/
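To check whether the zero values are already stored in Prometheus or only show up in the dashboard panels, the series can be queried directly. A minimal sketch, assuming Prometheus runs on its default port 9090:

# Query the GPU utilization series via the Prometheus HTTP API (port 9090 is an assumption)
curl -s 'http://localhost:9090/api/v1/query?query=nvidia_smi_utilization_gpu_ratio'

A non-zero result here combined with flat panels would point at the dashboard queries rather than the exporter.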
@pschonmann In my case it was solved by jina-ai/clip-as-service#254 (just run sudo nvidia-smi -pm 1 on the instance).
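A quick way to confirm the persistence mode state per GPU before and after that command, as a minimal sketch assuming a standard nvidia-smi installation:

# Enable persistence mode on all GPUs
sudo nvidia-smi -pm 1

# Verify the state per GPU
nvidia-smi --query-gpu=index,persistence_mode --format=csv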
It doesn't help, because it is already enabled:
Persistence mode is already Enabled for GPU 00000000:01:00.0.
Persistence mode is already Enabled for GPU 00000000:02:00.0.
Persistence mode is already Enabled for GPU 00000000:03:00.0.
Persistence mode is already Enabled for GPU 00000000:04:00.0.