macs3-project/MACS

Number of de-duplicated reads indicated by filterdup log does not match number of lines in output BED file.

callum-b opened this issue · 0 comments

I'm unsure whether this should be listed as a bug or a question, but it feels like something isn't working properly, so I listed it as a bug.

From the docs:
"The filterdup command takes an input alignment file and produces an output file in BED format with duplicate reads removed according to the setting."

I ran macs3 filterdup -f BAM --keep-dup=1 -i my/file.bam -o my/file_filterdup.bed

The logs indicate that there are 41436468 de-duplicated reads, but the output BED file is 41436476 lines long (using wc -l). As I understand it, these two values should match.

BAM file used (expires 5th of July 2024): https://filesender.renater.fr/?s=download&token=4490ed9b-04d3-4e4a-aafa-60afca608c9c
Its index was generated with samtools index using default parameters.

Where are these 8 extra lines coming from?
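To help narrow this down, here is a minimal sketch of a check that could be run on the output BED file. It is not part of MACS: the function names `summarize_bed` and `summarize_bed_file` are made up for illustration, and it assumes a tab-separated BED file with the strand in the sixth column. It counts physical lines (what wc -l sees), actual records, and records that share the same chrom, start, end and strand. Whether such repeats should be impossible with --keep-dup=1 is an assumption on my part, not something I have confirmed from the MACS source.

```python
from collections import Counter


def summarize_bed(lines):
    """Summarize BED lines: physical lines, records, and repeated intervals.

    Hypothetical helper, not a MACS function. Assumes tab-separated BED
    with the strand in column 6 (uses "." when the column is missing).
    """
    counts = Counter()
    total_lines = 0
    records = 0
    for line in lines:
        total_lines += 1
        line = line.rstrip("\n")
        # Blank lines and header lines count for wc -l but are not reads.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        strand = fields[5] if len(fields) > 5 else "."
        counts[(fields[0], fields[1], fields[2], strand)] += 1
        records += 1
    # Intervals seen more than once, and how many lines they add beyond one each.
    repeated = {key: n for key, n in counts.items() if n > 1}
    extra = sum(n - 1 for n in repeated.values())
    return total_lines, records, extra, repeated


def summarize_bed_file(path):
    """Run summarize_bed on a file path (keeps all keys in memory)."""
    with open(path) as handle:
        return summarize_bed(handle)
```

Calling `summarize_bed_file("my/file_filterdup.bed")` would show whether the extra lines are non-record lines (the first two numbers differ) or repeated intervals (the third number is non-zero). For a file of about 41 million lines, the counter holds every distinct interval in memory, so this needs a machine with enough RAM.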

  • OS: Ubuntu 20.04.6 LTS
  • Python version 3.10.14
  • Numpy version 1.26.4
  • MACS Version 3.0.1

PS: I just ran the command on two other BAM files and got differences of 6 and 8 lines, respectively.