dun/munge

Error: Invalid Credential

ajflor0 opened this issue · 4 comments

I am setting up a SLURM multi-cluster configuration, with one cluster on-premises and one in the cloud. For some reason, I cannot authenticate Munge-encoded messages sent between the two. I have made sure that the UID and GID of the munge, slurm, and other users are the same on both, and I copied the Munge key from the head cluster to the other one via SSH. The clocks are synchronized on both. Still, when I run the "echo foo | ssh user@server munge | unmunge" command, I get the response "unmunge: error: invalid credential." What else could be going wrong?

dun commented

Did you restart munged after having copied the key? It sounds like one of the munged processes was started with an older key. The keyfile is only read at startup.
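
One way to rule out a mismatched key before restarting is to compare digests of the key file from both hosts. This is a minimal sketch, not part of MUNGE itself: same_key is a hypothetical helper name, and /etc/munge/munge.key is assumed to be the key location (the usual packaged default, but check your install).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two key files have identical contents.
# Returns 2 if either file cannot be read, so a missing file is never
# mistaken for a match.
same_key() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

# Alternative without copying the key around (paths are assumptions):
#   sudo sha256sum /etc/munge/munge.key   # run on each host, compare the output
#   sudo systemctl restart munge          # then restart munged on both hosts
```

If the digests differ, the key was not copied intact; if they match, the remaining suspect is a daemon still running with the old key.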

ajflor0 commented

I just restarted munge with systemctl on the cloud slurmctld, and I am still running into the same problem. It is also not letting me start the munged daemon directly. Before I could restart munge, I had to change the permissions and owners of a number of files from root to munge. But when I start the munged daemon with sudo, it wants them owned by root again, and it will not let me run the plain command without sudo. What do you advise?

dun commented

If you're unable to start munged via systemctl, check the systemd journal for errors:
sudo journalctl -xe | grep munged

I recommend running munged as a dedicated non-privileged user (typically munge). The munge.service file runs munged as the munge user. But if you're trying to start munged from the command line, you'll need to manually change to the munge user:
sudo -u munge /usr/sbin/munged
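
The ownership complaints follow from which user munged runs as: as far as I know, munged expects its key file to be owned by that user and to be inaccessible to group and other. A rough sketch for checking this, assuming GNU stat (Linux) and the key at /etc/munge/munge.key; private_to is a hypothetical helper name, not a MUNGE command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if path $1 is owned by user $2 and grants
# no permissions to group or other (octal mode ends in 00, e.g. 600 or 400).
private_to() {
    path=$1
    user=$2
    [ "$(stat -c '%U' "$path")" = "$user" ] || return 1
    mode=$(stat -c '%a' "$path")
    case $mode in
        0|*00) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example (run as root, since the key directory is usually not
# world-searchable; the path is an assumption):
#   private_to /etc/munge/munge.key munge && echo "key file looks OK"
```

If the check fails after running munged via plain sudo, that is consistent with the daemon having been started as root rather than as munge.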

ajflor0 commented

Restarting both daemons with systemctl worked, thank you. The /var/log/munge/munged.log file was my guide to finding and correcting the errors with Munge at startup.