Cannot run on LSF system; not sure if exit code 127 is the reason
Jixuan-Huang opened this issue · 6 comments
Hi,
When I tried to run Caper with the ENCODE ATAC-seq pipeline, I could see the job being submitted to the LSF system, but the job disappeared immediately without any report.
When I looked into the detailed record of the job, I found an "exit code 127". I think the shell script may never have been created in the home directory, but I have no idea how to solve it. Here are the command and the job reports:
(encd-atac) caper hpc submit atac.wdl -i test.F.json --singularity --leader-job-name pipeline.test
2023-07-12 18:37:19,187|caper.hpc|INFO| Running shell command: bsub -W 2880 -M 4G -q ser -env all -J CAPER_pipeline.test /work/bio-huangjx/n6vro_by.sh
Job <4995762> is submitted to queue <ser>.
(encd-atac) bjobs
No unfinished job found
(encd-atac) bjobs -l 4995762
Job <4995762>, Job Name <CAPER_pipeline.test>, User <bio-huangjx>, Project <default>, Status <EXIT>, Queue <ser>, Command </work/bio-huangjx/n6vro_by.sh>, Share group charged </bio-huangjx>
Wed Jul 12 18:38:25: Submitted from host <login02>, CWD <$HOME/TempDir/2305atacseq/11.encode.atac>, Re-runnable;
RUNLIMIT
2880.0 min of r01n14
MEMLIMIT
4 G
Wed Jul 12 18:38:27: Started 5 Task(s) on Host(s) <1*r01n14> <3*r01n15> <1*r01n12>, Allocated 5 Slot(s) on Host(s) <1*r01n14> <3*r01n15> <1*r01n12>, Execution Home </work/bio-huangjx>, Execution CWD </work/bio-huangjx/TempDir/2305atacseq/11.encode.atac>;
Wed Jul 12 18:38:29: Exited with exit code 127. The CPU time used is 0.1 seconds.
Wed Jul 12 18:38:29: Completed <exit>.
MEMORY USAGE:
MAX MEM: 1 Mbytes; AVG MEM: 1 Mbytes
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[-slots]
Effective: select[type == local] order[-slots]
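For context, exit code 127 is the shell's "command not found" status: the batch job reached the line that invokes /work/bio-huangjx/n6vro_by.sh, but that file was not visible on the execution host. This is consistent with the theory that the script was never created, or that it was written to a filesystem the compute nodes cannot see. A minimal reproduction of the exit code:

```shell
# Invoking a path that does not exist makes bash exit with status 127,
# the same code that LSF reports for the leader job above.
bash -c '/no/such/script.sh'
echo "exit: $?"   # prints "exit: 127"
```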
Here is the config file for caper:
backend=lsf
# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper stores localized data files here.
local-loc-dir=
# This parameter defines resource parameters for Caper's leader job only.
lsf-leader-job-resource-param=-W 2880 -M 4G -q ser
# This parameter defines resource parameters for submitting WDL task to job engine.
# It is for HPC backends only (slurm, sge, pbs and lsf).
# It is not recommended to change it unless your cluster has custom resource settings.
# See https://github.com/ENCODE-DCC/caper/blob/master/docs/resource_param.md for details.
lsf-resource-param=${"-n " + cpu} ${if defined(gpu) then "-gpu " + gpu else ""} ${if defined(memory_mb) then "-M " else ""}${memory_mb}${if defined(memory_mb) then "m" else ""} ${"-W " + 60*time}
cromwell=/work/bio-huangjx/.caper/cromwell_jar/cromwell-82.jar
womtool=/work/bio-huangjx/.caper/womtool_jar/womtool-82.jar
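For reference, here is how the lsf-resource-param template above would expand for a hypothetical task requesting cpu=4, memory_mb=8000 and time=24 (hours) with no GPU; these values are illustrative, not taken from an actual run:

```shell
# ${"-n " + cpu}                        -> -n 4
# memory_mb branch (-M + value + "m")   -> -M 8000m
# ${"-W " + 60*time} (hours to minutes) -> -W 1440
echo "-n 4 -M 8000m -W 1440"
```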
And here is the information about the software and environment:
(encd-atac) lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.5.1804 (Core)
Release: 7.5.1804
Codename: Core
(encd-atac) caper -v
2.2.2
(encd-atac) cat test.F.json
{
"atac.title" : "XenTroTissues",
"atac.description" : "15XenTroTissue",
"atac.pipeline_type" : "atac",
"atac.align_only" : false,
"atac.true_rep_only" : false,
"atac.genome_tsv" : "/work/bio-huangjx/data/refgenome/ENCO.atac/xetro10_NCBI.tsv",
"atac.paired_end" : true,
"atac.F_m1_R1" : [ "/work/bio-huangjx/TempDir/2305atacseq/00.rawdata/atac-m1-F/atac-m1-F_R1.fq.gz" ],
"atac.F_m1_R2" : [ "/work/bio-huangjx/TempDir/2305atacseq/00.rawdata/atac-m1-F/atac-m1-F_R2.fq.gz" ],
"atac.F_m2_R1" : [ "/work/bio-huangjx/TempDir/2305atacseq/00.rawdata/atac-m2-F/atac-m2-F_R1.fq.gz" ],
"atac.F_m2_R2" : [ "/work/bio-huangjx/TempDir/2305atacseq/00.rawdata/atac-m2-F/atac-m2-F_R2.fq.gz" ],
"atac.auto_detect_adapter" : true,
"atac.multimapping" : 4
"atac.smooth_win" : 140,
}
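Separately from the exit-code-127 problem, it is worth validating the input JSON before submitting: a missing comma between entries or a trailing comma before the closing brace would make the file invalid JSON and can fail in confusing ways downstream. A quick check, assuming python is on PATH:

```shell
# Exits non-zero and prints a parse error (with line and column)
# if the file is not valid JSON.
python -m json.tool test.F.json > /dev/null && echo "JSON OK"
```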
Thanks for responding!
Also encountering the same issue. Would appreciate any help regarding this. Thanks.
Can you edit the conf like the following (adding -o and -e to redirect error logs to local files) and try again?
lsf-leader-job-resource-param=-W 2880 -M 4G -q ser -o /YOUR/HOME/stdout.txt -e /YOUR/HOME/stderr.txt
Define /YOUR/HOME as a directory that you have access to.
And please post those two log files here.
Hi! I am also dealing with this exact issue on an LSF cluster. I edited the conf file to include the line:
lsf-leader-job-resource-param=-W 2880 -M 4G -q ser -o /YOUR/HOME/stdout.txt -e /YOUR/HOME/stderr.txt
as suggested by leepc12. The output and error files are as follows:
stderr.txt
/home/lewks/.lsbatch/1708824945.82031239: line 8: /home/lewks/6pcgf87d.sh: No such file or directory
stdout.txt
Sender: LSF System lsfadmin@node184.hpc.local
Subject: Job 82031239: <CAPER_ANY_GOOD_LEADER_JOB_NAME> in cluster Exited
Job <CAPER_ANY_GOOD_LEADER_JOB_NAME> was submitted from host <node156.hpc.local> by user in cluster at Sat Feb 24 20:35:45 2024
Job was executed on host(s) <node184.hpc.local>, in queue, as user in cluster at Sat Feb 24 20:35:45 2024
</home/lewks> was used as the home directory.
</home/lewks/atac-seq-pipeline> was used as the working directory.
Started at Sat Feb 24 20:35:45 2024
Terminated at Sat Feb 24 20:35:45 2024
Results reported at Sat Feb 24 20:35:45 2024
Your job looked like:
LSBATCH: User input
/home/lewks/6pcgf87d.sh
Exited with exit code 127.
Resource usage summary:
CPU time : 0.02 sec.
Max Memory : -
Average Memory : -
Total Requested Memory : -
Delta Memory : -
Max Swap : -
Max Processes : -
Max Threads : -
Run time : 0 sec.
Turnaround time : 0 sec.
The output (if any) follows:
PS:
Read file </home/lewks/stderr.txt> for stderr output of this job.
Thank you so much for any insight you might have as to how to fix this!
Best,
Stephanie
I'm truly sorry to hear about the difficulties you're experiencing with running CAPER. Unfortunately, due to our current bandwidth and personnel limitations, we are unable to provide immediate attention to resolving this particular issue.
We sincerely apologize for any inconvenience this may cause and greatly appreciate your understanding.
No problem! Thanks for letting us know.
Bump - I am experiencing this same issue. The error logs show a call to the shell script that Caper supposedly generated, but when the system checks for it, the file does not exist...