GPU device not detected with nvidia driver > 430.XX
ptonelli opened this issue · 1 comments
ptonelli commented
When running with 450.XX or 460.XX drivers, the logs of the pod are:
gpumanager.go:28] Loading NVML
gpumanager.go:31] Failed to initialize NVML: could not load NVML library.
gpumanager.go:32] If this is a GPU node, did you set the docker default runtime to `nvidia`?
The NVIDIA driver is running correctly on the machine, as nvidia-smi shows the GPU.
We are currently trying to update the dependencies of the project and rebuild the device plugin, but so far this has not solved the issue.
ptonelli commented
Downgrading the Linux kernel image from 5.10 to 4.18 solved the issue.