Cannot run job with prologue/epilogue
brunotrbr commented
Hi
I'm trying to run jobs with prologue/epilogue scripts, but it's not working. When I remove the scripts, all jobs run (interactive or batch).
I'm using Torque 6.0.4, and the qsub command is "qsub -I -V -l nodes=2".
On the Torque server I receive the error message:
"execv of /var/spool/torque/mom_priv/prologue failed: No such file or directory"
but the prologue and epilogue scripts do exist on the nodes.
In syslog I receive these error messages:
May 29 20:42:37 node2 pbs_mom: LOG_DEBUG::run_pelog, running prolog script '/var/spool/torque/mom_priv/prologue' for job 112
May 29 20:42:37 node2 pbs_mom: LOG_ERROR::pelog_err, prolog/epilog failed, file: /var/spool/torque/mom_priv/prologue, exit: 255, nonzero p/e exit status
May 29 20:42:37 node2 pbs_mom: LOG_ERROR::handle_prologs, prolog failed
In mom_logs I receive these error messages:
05/29/2020 20:42:37.411;02; pbs_mom.2688;n/a;mom_close_poll;entered
05/29/2020 20:42:37.440;01; pbs_mom.2541;Job;112;task/session info loaded
05/29/2020 20:42:37.440;01; pbs_mom.2541;Job;TMomFinalizeJob3;Job 112 read start return code=-2 session=2688
05/29/2020 20:42:37.440;01; pbs_mom.2541;Job;TMomFinalizeJob3;job not started, Failure job exec failure, after files staged, no retry (see syslog for more information)
05/29/2020 20:42:37.440;01; pbs_mom.2541;Job;exec_job_on_ms;ALERT: job failed phase 3 start - jobid 112
The file permissions in /var/spool/torque/mom_priv are below:
-rw-r--r-- 1 root root 13 May 29 20:25 config
-r-x------ 1 root root 344 May 29 20:38 epilogue
-r-x---r-x 1 root root 271 May 28 22:44 epilogue.user
drwxr-x--x 2 root root 4096 May 29 20:42 jobs
-rw-r--r-- 1 root root 6 May 29 20:26 mom.lock
-r-x------ 1 root root 269 May 29 20:38 prologue
-r-x---r-x 1 root root 122 May 28 22:43 prologue.user
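One thing that may be worth ruling out: on Linux, execv can report "No such file or directory" for a script that exists when the interpreter named on its first line ("#!...") cannot be found, for example because the path is wrong or the file has Windows (CRLF) line endings. The sketch below checks for that. It is not a confirmed diagnosis of this problem, and `check_script` is a hypothetical helper written for illustration, not part of Torque.

```shell
#!/bin/sh
# Hypothetical helper (not a Torque tool): inspect the interpreter line of a
# script and print one word describing what it found.
check_script() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "missing"
        return 1
    fi
    # The first line is where the interpreter ("#!") is declared.
    first_line=$(head -n 1 "$script")
    case "$first_line" in
        '#!'*) ;;
        # No interpreter line at all (this normally gives an exec format
        # error rather than "No such file", but is still worth knowing).
        *) echo "no-shebang"; return 1 ;;
    esac
    # A trailing carriage return means CRLF line endings, so the kernel
    # would look for an interpreter whose name ends in "\r".
    cr=$(printf '\r')
    case "$first_line" in
        *"$cr") echo "crlf"; return 1 ;;
    esac
    # Take the interpreter path: drop "#!", leading blanks, and any arguments.
    interp=$(printf '%s\n' "$first_line" | sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
    if [ ! -x "$interp" ]; then
        echo "bad-interpreter"
        return 1
    fi
    echo "ok"
}
```

Run as root on each compute node, for example `check_script /var/spool/torque/mom_priv/prologue`. Anything other than `ok` would point at the script's first line rather than at its permissions.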
Could someone help me, please?
Thanks
Bruno Bragança Mendes