macs3-project/MACS

Q: Peakcalling in paired-end bam as single-end and keeping all reads

MbDiop opened this issue · 4 comments

I am using macs2 version 2.2.6.

Can you confirm my observations please :

  • When --format BAMPE is used with paired-end BAM files, only properly paired are kept

  • When --format BAM is used with paired-end BAM files, only properly paired are kept and then only R1 tags are considered

If this is the case, my question is how can I analyze my paired-end bam files as single-end without filtering any reads ?

Thank you in advance for your help.

@MbDiop

  1. Yes. Only properly paired reads are kept;

  2. Yes. Only the R1 will be kept.

If you plan to consider both ends as 'single ends', you have to hack the BAM file you plan to use. Basically, MACS will check the first bit of FLAG(https://samtools.github.io/hts-specs/SAMv1.pdf), and if it's 1, then MACS will consider it as a paired-end BAM file and skip the R2. Therefore, if you can set all the first bit of FLAG in the BAM file as 0, MACS will regard this file as a single-end BAM file. My idea is that you can try to modify the plain text counterpart of the BAM file -- SAM file, by subtract 1 from the second column, then convert it back to BAM. I didn't try that solution before, so, if it works (or doesn't work), please let me know.

@taoliu

Thank you for your answer and your idea. But my problem is not that R2 reads are skipped when --format BAM is used with paired-end BAM files, it's more than that. All R2 are skipped and also R1 reads that are not properly paired are skipped. Is it possible maybe in the future to allow users to choose to disable this filter on properly paired reads in Macs2 ?

@MbDiop Yes. You are right that even for -f BAM we filter out 'unpaired' reads.

if (bwflag & 2820) or (bwflag & 1 and (bwflag & 136 or not bwflag & 2)): return ( -1, -1, -1 )

and

if (bwflag & 2820) or (bwflag & 1 and (bwflag & 136 or not bwflag & 2)): return ( -1, -1, -1 )

When FLAG contains 1 (1 = it has pairing information), MACS will discard those reads without FLAG 2 (2 = properly paired) or with FLAG 136 (136=8+128, 8 = mate unmapped, 128 = second read in pair). So that's why I suggested you previously to hack the first bit of FLAG to 0 so that MACS won't check 2, 8 or 128 of the FLAG. And in this way, you can allow all ends of a paired-end BAM file counted as of a single-end file.

Sure. I will add an option in the next version to allow more customized filtering on FLAG or... just leave the filtering to other tools such as picard or samtools.

Hi Tao!

Thanks for the great tool and services!

I have some questions about calling peaks using -f BAM (SE) vs -f BAMPE (PE) for paired-end reads. Could you please help check if I get them right? Thanks.

(1) When calculating coverage using -f BAMPE for PE reads, each pair of reads will be counted as one read, instead of 2, no matter how many overlaps R1 and R2 have?

(2) If -f BAM (SE) was used for PE reads (which was incorrect but happened sometimes), only R1 would be kept as you mentioned above. R1 refers to reads from the R1.fastq from the sequencing center, and has nothing to do with forward or reverse reads, right?

I called peaks from PE reads using -f BAM by mistake. Then I corrected the calling using -f BAMPE. When I compared these two sets of peaks, the amount of peaks for some ChIP samples doubled, while other decreased, compared to the incorrect SE calling. Any suggestions on why samples showed opposite trends?

Thanks,
Mao