drmaa library threads and munge: Invalid Credential Format
Submitting a job to an SGE 8.1.9 cluster from an application called Galaxy logged the following error:
galaxy.jobs.runners.drmaa WARNING 2019-02-13 13:20:43,111 (427) drmaa.Session.runJob() failed, will retry: code 17: MUNGE authentication failed: Invalid credential format
I have verified that UIDs and GIDs match across the host and the cluster, and that permissions on the munge directories and files match the install docs. I also verified that munge.key is identical across the cluster and the host.
I used this Python script outside of Galaxy to submit a job to the cluster, and it succeeded:
import drmaa
from multiprocessing.pool import ThreadPool
import tempfile
import os
import stat

# The session is created and initialized in the main thread.
session = drmaa.Session()
session.initialize()

def main():
    smt = "ls . > test.out"
    # Write the job script to a temporary file in the current directory.
    script_file = tempfile.NamedTemporaryFile(mode="w", dir=os.getcwd(), delete=False)
    script_file.write(smt)
    script_file.close()
    print "Job is in file %s" % script_file.name
    os.chmod(script_file.name, stat.S_IRWXG | stat.S_IRWXU)
    jt = session.createJobTemplate()
    print "jt created"
    jt.jobEnvironment = {'BASH_ENV': '~/.bashrc'}
    print "environment set"
    jt.remoteCommand = os.path.join(os.getcwd(), script_file.name)
    print "remote command set"
    # Submit from the main thread and block until the job finishes.
    jobid = session.runJob(jt)
    print "Job submitted with id: %s, waiting ..." % jobid
    retval = session.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)

if __name__ == '__main__':
    main()
When I try the same script with Python multithreading, I get an error. The script and the error are:
import drmaa
from multiprocessing.pool import ThreadPool
import tempfile
import os
import stat

pool = ThreadPool(1)
# The session is still initialized in the main thread ...
session = drmaa.Session()
session.initialize()

def pTask(n):
    # ... but everything below runs in the pool's worker thread.
    smt = "ls . > test.out"
    script_file = tempfile.NamedTemporaryFile(mode="w", dir=os.getcwd(), delete=False)
    script_file.write(smt)
    script_file.close()
    print "Job is in file %s" % script_file.name
    os.chmod(script_file.name, stat.S_IRWXG | stat.S_IRWXU)
    jt = session.createJobTemplate()
    print "jt created"
    jt.jobEnvironment = {'BASH_ENV': '~/.bashrc'}
    print "environment set"
    jt.remoteCommand = os.path.join(os.getcwd(), script_file.name)
    print "remote command set"
    jobid = session.runJob(jt)
    print "Job submitted with id: %s, waiting ..." % jobid
    retval = session.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)

pool.map(pTask, (1,))
The result is:
Job is in file /home/svc-clingalprod/tmpu3A6Rk
jt created
environment set
error: getting configuration: MUNGE authentication failed: Invalid credential format
remote command set
Traceback (most recent call last):
  File "remote_mthread.py", line 29, in <module>
    pool.map(pTask, (1,))
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
drmaa.errors.DeniedByDrmException: code 17: MUNGE authentication failed: Invalid credential format
Where do I go from here in isolating the cause of the Invalid Credential format error?
The "Invalid credential format" error is an EMUNGE_BAD_CRED. If you look at dec.c, you'll see this error occurs during credential decoding when a value has been truncated. If the application had used munge_ctx_strerror() instead of munge_strerror(), you would have a more detailed error message describing which value had been truncated.
You'll need to determine where the truncation is taking place. Is it happening before or after the credential text has been sent to the remote host? What does the credential text look like on the encoding host? What does it look like on the decoding (remote) host once it is received, but before it is passed to munge_decode()? Does it start with MUNGE: and end with a : ?
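The checks described above could be scripted roughly as follows once the credential text has been captured on both hosts (for example from debug logs). This is only a sketch: the helper names `looks_like_munge_credential` and `describe_truncation` are hypothetical, the sample credential is a made-up placeholder rather than real MUNGE output, and stripping surrounding whitespace is an assumption about how the text was captured.

```python
def looks_like_munge_credential(text):
    """Return True if the text starts with 'MUNGE:' and ends with ':'.

    Surrounding whitespace is stripped first (assumption: the text was
    copied from a log and may carry a trailing newline).
    """
    cred = text.strip()
    return (len(cred) > len("MUNGE:")
            and cred.startswith("MUNGE:")
            and cred.endswith(":"))


def describe_truncation(sent, received):
    """Compare the credential captured on the encoding host (sent) with
    the one captured on the decoding host (received)."""
    sent = sent.strip()
    received = received.strip()
    if sent == received:
        return "identical"
    if sent.startswith(received):
        return "truncated after %d of %d characters" % (len(received), len(sent))
    return "differs (not a simple truncation)"


# Placeholder credential text, NOT real MUNGE output.
sent = "MUNGE:" + "A" * 40 + ":"
received = sent[:30]  # simulate a credential cut short in transit

print(looks_like_munge_credential(sent))
print(looks_like_munge_credential(received))
print(describe_truncation(sent, received))
```

If the text captured on the encoding host already fails the shape check, the truncation happens before the credential is sent. If only the received copy fails, it happens in transit or on the decoding side.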