Test for chromap failing
JoseEspinosa opened this issue · 6 comments
The test for chromap are failing with the exception below:
Caused by:
Process `NFCORE_CHIPSEQ:CHIPSEQ:MACS2_CALLPEAK (SPT5_T0_REP1)` terminated with an error exit status (1)
Command executed:
macs2 \
callpeak \
--keep-dup all --broad --broad-cutoff 0.1 \
--gsize 11624332 \
--format BAMPE \
--name SPT5_T0_REP1 \
--treatment SPT5_T0_REP1.mLb.clN.sorted.bam \
--control SPT5_INPUT_REP1.mLb.clN.sorted.bam
cat <<-END_VERSIONS > versions.yml
"NFCORE_CHIPSEQ:CHIPSEQ:MACS2_CALLPEAK":
macs2: $(macs2 --version | sed -e "s/macs2 //g")
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
4ca545ee6d5d: Already exists
ce477abda4ec: Pulling fs layer
ce477abda4ec: Verifying Checksum
ce477abda4ec: Download complete
ce477abda4ec: Pull complete
Digest: sha256:b8506329f67a88c4c9c0e433028c25ba5c29b199a8936f629a02c40a4ecf1942
Status: Downloaded newer image for quay.io/biocontainers/macs2:2.2.7.1--py38h4a8c8d9_3
INFO @ Wed, 27 Jul 2022 13:20:04:
# Command line: callpeak --keep-dup all --broad --broad-cutoff 0.1 --gsize 11624332 --format BAMPE --name SPT5_T0_REP1 --treatment SPT5_T0_REP1.mLb.clN.sorted.bam --control SPT5_INPUT_REP1.mLb.clN.sorted.bam
# ARGUMENTS LIST:
# name = SPT5_T0_REP1
# format = BAMPE
# ChIP-seq file = ['SPT5_T0_REP1.mLb.clN.sorted.bam']
# control file = ['SPT5_INPUT_REP1.mLb.clN.sorted.bam']
# effective genome size = 1.16e+07
# band width = 300
# model fold = [5, 50]
# qvalue cutoff for narrow/strong regions = 5.00e-02
# qvalue cutoff for broad/weak regions = 1.00e-01
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is on
# Paired-End mode is on
INFO @ Wed, 27 Jul 2022 13:20:04: #1 read fragment files...
INFO @ Wed, 27 Jul 2022 13:20:04: #1 read treatment fragments...
INFO @ Wed, 27 Jul 2022 13:20:04: 42941 fragments have been read.
INFO @ Wed, 27 Jul 2022 13:20:04: #1.2 read input fragments...
INFO @ Wed, 27 Jul 2022 13:20:05: 80938 fragments have been read.
INFO @ Wed, 27 Jul 2022 13:20:05: #1 mean fragment size is determined as 0.0 bp from treatment
INFO @ Wed, 27 Jul 2022 13:20:05: #1 note: mean fragment size in control is 0.0 bp -- value ignored
INFO @ Wed, 27 Jul 2022 13:20:05: #1 fragment size = 0.0
INFO @ Wed, 27 Jul 2022 13:20:05: #1 total fragments in treatment: 42941
INFO @ Wed, 27 Jul 2022 13:20:05: #1 total fragments in control: 80938
INFO @ Wed, 27 Jul 2022 13:20:05: #1 finished!
INFO @ Wed, 27 Jul 2022 13:20:05: #2 Build Peak Model...
INFO @ Wed, 27 Jul 2022 13:20:05: #2 Skipped...
INFO @ Wed, 27 Jul 2022 13:20:05: #3 Call peaks...
Traceback (most recent call last):
File "/usr/local/bin/macs2", line 653, in <module>
main()
File "/usr/local/bin/macs2", line 51, in main
run( args )
File "/usr/local/lib/python3.8/site-packages/MACS2/callpeak_cmd.py", line 256, in run
peakdetect.call_peaks()
File "MACS2/PeakDetect.pyx", line 107, in MACS2.PeakDetect.PeakDetect.call_peaks
File "MACS2/PeakDetect.pyx", line 155, in MACS2.PeakDetect.PeakDetect.__call_peaks_w_control
ZeroDivisionError: float division
Work dir:
/home/runner/work/chipseq/chipseq/work/e2/86a656a96d88be7c10e969517fe145
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
To me, it seems like the fragment size can not be correctly obtained from the bam files. However, the test with the small dataset work with the rest of the aligners (BWA
, STAR
and Bowtie2
). The problem seems to be related to the pairing of the reads performed by chromap
since I have run the full test with chromap
(there we have single-end reads) without encountering this problem.
The command that the pipeline uses for aligning reads with chromap
is:
chromap -l 2000 --SAM -t 2 -x genome.index -r genome.fa -1 SPT5_INPUT_REP2_T1_1_val_1.fq.gz -2 SPT5_INPUT_REP2_T1_2_val_2.fq.gz -o SPT5_INPUT_REP2_T1.Lb.sam
After discussion in Slack with @lextallan seems here seems that the problem is the TLEN column in the files produced by chromap
is empty: "Main issue for running chromap aligned reads through MACS2 appeared to be the TLEN column in the sam/bam files is blank (or all 0s specifically). Could be wrong here, though. I asked the author of chromap about it (here: haowenz/chromap#99). Long story short, chromap
isn't designed for outputting bams but they're working on it for a future release".
To fix the issue momentarily, after discussion with @drpatelh, we will filter the channels after running Chromap only to include single-end data throwing a warning for paired-end data. Once the bug becomes fixed this filter will be removed.
This has been partially fixed in #290 by filtering out paired-end data when Chromap is used. However, once a new version of Chromap is released that output usable bam files, this issue might need to be revisited.
Some update: TLEN is recently added into Chromap's SAM output in the latest release.
Thanks for the update @swiftgenomics will give it a try!
Closes in #338