Failed to start Nvidia GPU Exporter
I downloaded the latest version, 1.2.1, and installed it on Ubuntu 20.04. I followed the INSTALL instructions and ran it as a systemd service, but the service always fails to start.
$ sudo systemctl status nvidia_gpu_expoter.service
● nvidia_gpu_expoter.service - Nvidia GPU Exporter
Loaded: loaded (/etc/systemd/system/nvidia_gpu_expoter.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2024-11-04 18:32:35 PST; 4min 11s ago
Process: 3741675 ExecStart=/usr/bin/nvidia_gpu_exporter (code=exited, status=217/USER)
Main PID: 3741675 (code=exited, status=217/USER)
Nov 04 18:32:35 u116594 systemd[1]: nvidia_gpu_expoter.service: Scheduled restart job, restart counter is at 5.
Nov 04 18:32:35 u116594 systemd[1]: Stopped Nvidia GPU Exporter.
Nov 04 18:32:35 u116594 systemd[1]: nvidia_gpu_expoter.service: Start request repeated too quickly.
Nov 04 18:32:35 u116594 systemd[1]: nvidia_gpu_expoter.service: Failed with result 'exit-code'.
Nov 04 18:32:35 u116594 systemd[1]: Failed to start Nvidia GPU Exporter.
I can run it from the terminal and see the metrics output at localhost:9835.
$ sudo nvidia_gpu_exporter
ts=2024-11-05T02:38:09.902Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9835
ts=2024-11-05T02:38:09.902Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9835
Could anybody please point out what I did wrong? Thanks.
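RCA: The Linux group is missing. Exit status 217/USER means systemd could not resolve or switch to the user or group configured for the unit. Create them as root: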
groupadd nvidia_gpu_exporter
useradd -r -g nvidia_gpu_exporter nvidia_gpu_exporter -s /bin/false || true
chown nvidia_gpu_exporter:nvidia_gpu_exporter /usr/bin/nvidia_gpu_exporter
FYI: Give this a try. A missing Linux group is what I ran into before.
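To confirm that the account and group exist before restarting the service, a quick check along these lines may help. This is a minimal sketch: account_exists is a hypothetical helper name, and it assumes the unit file runs the exporter as user and group nvidia_gpu_exporter, as the commands above imply.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia_gpu_exporter):
# succeeds only if both the given user and the given group exist.
account_exists() {
    user="$1"
    group="$2"
    # id -u fails when the user cannot be resolved
    id -u "$user" >/dev/null 2>&1 || return 1
    # getent group fails when the group cannot be resolved
    getent group "$group" >/dev/null 2>&1 || return 1
    return 0
}
```

For example, `account_exists nvidia_gpu_exporter nvidia_gpu_exporter && echo "user and group present"`. Because the log shows "Start request repeated too quickly", it may also be necessary to run `sudo systemctl reset-failed nvidia_gpu_expoter.service` before `sudo systemctl restart nvidia_gpu_expoter.service` (the unit name is spelled here as it appears in the report).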
@changdanyang thanks for tracking this down :) I added a new step to the install instructions to document it: https://github.com/utkuozdemir/nvidia_gpu_exporter/blob/master/INSTALL.md#installing-as-a-linux-systemd-service