`sacct` does not work on `ondemand` with cc-slurm 3.x
Closed this issue · 1 comments
Version
1.0.40
In what area(s)?
/area administration
/area ansible
/area autoscaling
/area configuration
/area cyclecloud
/area documentation
/area image
/area job-scheduling
/area monitoring
/area ood
/area remote-visualization
/area user-management
Expected Behavior
`sacct` should work on the `ondemand` node.
Actual Behavior
$ sacct
sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
Steps to Reproduce the Problem
Install az-hop with cc-slurm 3.x and Slurm 23.x, then run `sacct` on the `ondemand` node.
Solution
The problem is that `/anfhome/slurm/config/accounting.conf` is configured to point to `localhost`:
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost="localhost"
AccountingStorageTRES=gres/gpu
However, `slurmdbd` only runs on the scheduler node (`sacct` works fine there), so the connection to `localhost:6819` is refused on the `ondemand` node.
To fix, change `localhost` to `{{ scheduler.name }}` in the config file.
(There used to be logic for this in the `slurm.conf.j2` template, but it seems this is no longer used with cc-slurm 3.x.)
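As a rough illustration of the change described above, here is a minimal Python sketch that rewrites the `AccountingStorageHost` line in the text of `accounting.conf`. The function name `set_accounting_host` and the hostname `scheduler` are hypothetical placeholders for this example; in az-hop the value would come from `{{ scheduler.name }}`, and this is not necessarily how the actual workaround is implemented.

```python
import re

def set_accounting_host(conf_text: str, host: str) -> str:
    """Return conf_text with AccountingStorageHost set to `host`.

    Hypothetical helper for illustration only; other lines are left untouched.
    """
    # Match the whole AccountingStorageHost line (Slurm keys are case-insensitive).
    pattern = re.compile(
        r'^([ \t]*AccountingStorageHost[ \t]*=[ \t]*).*$',
        re.MULTILINE | re.IGNORECASE,
    )
    # Keep the "key=" prefix and replace only the value.
    return pattern.sub(lambda m: m.group(1) + host, conf_text)

# Contents of accounting.conf as reported in this issue.
original = (
    'AccountingStorageType=accounting_storage/slurmdbd\n'
    'AccountingStorageHost="localhost"\n'
    'AccountingStorageTRES=gres/gpu\n'
)

# "scheduler" is an assumed hostname standing in for {{ scheduler.name }}.
fixed = set_accounting_host(original, "scheduler")
print(fixed)
```

Only the `AccountingStorageHost` value changes; the storage type and TRES lines are passed through as they are.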
I've opened a bug in CC: Azure/cyclecloud-slurm#215
Working on a workaround in the meantime.