adaptivecomputing/torque

Cannot run job with prologue/epilogue


Hi

I'm trying to run jobs with prologue/epilogue scripts, but the jobs fail to start. When I remove the scripts, all jobs run (interactive or batch).

I'm using Torque 6.0.4 and the qsub command is "qsub -I -V -l nodes=2"

On the Torque server I receive the error message:

"execv of /var/spool/torque/mom_priv/prologue failed: No such file or directory"

but the prologue and epilogue scripts are present on the nodes.
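Because the file exists, one thing worth ruling out is the script's interpreter line. On Linux, execv also returns "No such file or directory" when the interpreter named after `#!` cannot be found, for example because of a typo or because the file was saved with Windows (CRLF) line endings. Below is a minimal sketch of such a check. The function name `diagnose_shebang` is hypothetical and not part of Torque, and this is only one possible cause, not a confirmed diagnosis.

```python
import os


def diagnose_shebang(script_path):
    """Return a list of problems found on the script's first (#!) line.

    Hypothetical helper, not part of Torque.
    """
    problems = []
    with open(script_path, "rb") as f:
        first_line = f.readline()

    if not first_line.startswith(b"#!"):
        # Without a shebang, execv fails with ENOEXEC rather than ENOENT.
        problems.append("no shebang line (exec format error, not ENOENT)")
        return problems

    if first_line.endswith(b"\r\n"):
        # The kernel treats the trailing '\r' as part of the interpreter path.
        problems.append("CRLF line ending on shebang line")

    parts = first_line[2:].strip().split()
    if not parts:
        problems.append("empty shebang line")
        return problems

    interpreter = parts[0].decode(errors="replace")
    if not os.path.exists(interpreter):
        problems.append("interpreter not found: " + interpreter)
    return problems
```

Running `diagnose_shebang("/var/spool/torque/mom_priv/prologue")` as root on node2 would return an empty list if the first line looks sane, which would suggest the cause lies elsewhere.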

In syslog I receive these error messages:

May 29 20:42:37 node2 pbs_mom: LOG_DEBUG::run_pelog, running prolog script '/var/spool/torque/mom_priv/prologue' for job 112
May 29 20:42:37 node2 pbs_mom: LOG_ERROR::pelog_err, prolog/epilog failed, file: /var/spool/torque/mom_priv/prologue, exit: 255, nonzero p/e exit status
May 29 20:42:37 node2 pbs_mom: LOG_ERROR::handle_prologs, prolog failed

In mom_logs I receive these messages:

05/29/2020 20:42:37.411;02;   pbs_mom.2688;n/a;mom_close_poll;entered
05/29/2020 20:42:37.440;01;   pbs_mom.2541;Job;112;task/session info loaded
05/29/2020 20:42:37.440;01;   pbs_mom.2541;Job;TMomFinalizeJob3;Job 112 read start return code=-2 session=2688
05/29/2020 20:42:37.440;01;   pbs_mom.2541;Job;TMomFinalizeJob3;job not started, Failure job exec failure, after files staged, no retry (see syslog for more information)
05/29/2020 20:42:37.440;01;   pbs_mom.2541;Job;exec_job_on_ms;ALERT:  job failed phase 3 start - jobid 112

The file permissions are listed below:

-rw-r--r-- 1 root root   13 May 29 20:25 config
-r-x------ 1 root root  344 May 29 20:38 epilogue
-r-x---r-x 1 root root  271 May 28 22:44 epilogue.user
drwxr-x--x 2 root root 4096 May 29 20:42 jobs
-rw-r--r-- 1 root root    6 May 29 20:26 mom.lock
-r-x------ 1 root root  269 May 29 20:38 prologue
-r-x---r-x 1 root root  122 May 28 22:43 prologue.user
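For comparison, here is a small sketch that checks a mode and owner against the rules as I understand them from the Torque documentation. The assumption is that prologue and epilogue must be owned by root, readable and executable by root, and not writable by anyone else, and that the .user variants must additionally be readable and executable by others. The function name `check_pelog_mode` is hypothetical.

```python
import stat


def check_pelog_mode(mode, uid, user_variant=False):
    """Return a list of permission problems for a prologue/epilogue script.

    The rules encoded here are an assumption based on my reading of the
    Torque documentation; this is not Torque's own check.
    """
    problems = []
    if uid != 0:
        problems.append("not owned by root")
    if not (mode & stat.S_IRUSR and mode & stat.S_IXUSR):
        problems.append("not readable and executable by owner")
    if mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("writable by group or others")
    if user_variant and not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
        problems.append("not readable and executable by others")
    return problems
```

With the values from the listing above, `check_pelog_mode(0o500, 0)` for prologue and `check_pelog_mode(0o505, 0, user_variant=True)` for prologue.user both return an empty list, so under these assumed rules the permissions themselves look acceptable.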

Could someone help me, please?

Thanks

Bruno Bragança Mendes