aau-claaudia/aicloud

enduser login on compute nodes: Broken pam_slurm_adopt.so

Closed this issue · 3 comments

This module is installed automatically by NVIDIA's deepops, but it seems to be broken: it is missing some linked libraries and effectively breaks all sssd logins. We should be able to fix this, but I think we should re-evaluate whether we want end-user logins on the compute nodes at all (with or without a job allocation).

This doesn't affect the login and controller nodes.

We have previously used end-user ssh login for

  1. Copying data to local storage
  2. System investigation of running jobs: top, nvidia-smi, lsof, ...

Both of these can be done as jobs via Slurm, and the new T4 nodes do not offer that much local storage anyway.
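
As a rough sketch of what "via Slurm" could look like for the second use case: a job step can be attached to an already running allocation with `srun --jobid`, so no ssh session on the node is needed. The helper name `inspect_cmd` and the job ID `12345` are made up for illustration, and `--overlap` assumes Slurm 20.11 or newer, which I have not checked against our install. The helper only prints the command so it can be reviewed before running it.

```shell
#!/usr/bin/env bash
# Sketch only: inspect a running job without ssh access to the compute node.
# Assumptions: inspect_cmd is a hypothetical helper, 12345 is a placeholder
# job ID, and --overlap requires Slurm >= 20.11.

# Print the srun command that would run "$@" as an extra step inside
# the existing allocation of job $1.
inspect_cmd() {
    local jobid="$1"
    shift
    printf 'srun --jobid=%s --overlap %s\n' "$jobid" "$*"
}

# Show the commands for a GPU status check and a one-shot top
inspect_cmd 12345 nvidia-smi
inspect_cmd 12345 top -b -n 1
```

Copying data to node-local storage could be handled the same way, as a step at the start of the job script, rather than over ssh.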

Conclusion: we could go for no end-user ssh login.

I think I managed to unbreak pam_slurm_adopt in 62fba91, but this requires further testing.

Fixed by 62fba91