fgci-org/ansible-role-cuda

load /dev/nvidia0 before slurm role runs

Closed · 1 comment

A1ve5 commented

After a reinstall, ansible-pull fails because it can't find /dev/nvidia0. Running nvidia-smi seems to "load" it.
Figure out a way to have /dev/nvidia0 available before the slurm role runs.

OK, this is done: slurm now starts successfully on io's GPU node after reboots.
Thanks @dgtim for the commands.

Related: I found a service called nvidia-persistenced.service, but enabling it and modifying its service file to pass --persistence-mode did not make the /dev/nvidia* devices appear.
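
The commands that fixed this are not recorded in the thread. As a rough illustration of the workaround hinted at in the issue description (run nvidia-smi when /dev/nvidia0 is missing, then check again), a minimal Python sketch might look like the following. The function name `ensure_device` and its parameters are hypothetical. It assumes nvidia-smi is on the PATH and creates the device node as a side effect, which is what was observed above, not something this sketch guarantees.

```python
import os
import subprocess


def ensure_device(device="/dev/nvidia0", loader=("nvidia-smi",)):
    """Return True if `device` exists, running `loader` once if it does not.

    Hypothetical helper: `device` and `loader` are illustrative defaults,
    based on the observation that running nvidia-smi makes /dev/nvidia0 appear.
    """
    # Nothing to do if the device node is already present.
    if os.path.exists(device):
        return True

    try:
        # Run the loader command and ignore its output and exit status;
        # only the side effect (device node creation) matters here.
        subprocess.run(
            list(loader),
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The loader command itself is not installed or not on PATH.
        return False

    # Report whether the device node showed up after running the loader.
    return os.path.exists(device)
```

Calling `ensure_device()` before the slurm role runs (for example from a small pre-task script) would return False if the device is still missing, so the caller can fail early instead of letting slurm start without the GPU.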