Number of de-duplicated reads indicated by filterdup log does not match number of lines in output BED file.
callum-b opened this issue · 0 comments
I'm unsure whether this should be listed as a bug or a question, but it feels like something isn't working properly, so I listed it as a bug.
From the docs:
"The filterdup command takes an input alignment file and produces an output file in BED format with duplicate reads removed according to the setting."
I ran macs3 filterdup -f BAM --keep-dup=1 -i my/file.bam -o my/file_filterdup.bed
The logs indicate that there are 41436468 de-duplicated reads, but the output BED file is 41436476 lines long (counted with wc -l). As I understand it, these two values should match.
BAM file used (expires 5th of July 2024): https://filesender.renater.fr/?s=download&token=4490ed9b-04d3-4e4a-aafa-60afca608c9c
Its index was generated with default params by samtools index.
Where are these 8 extra lines coming from?
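To help narrow this down, here is a rough sketch of two checks that could be run on the output file. The helper names count_non_bed_lines and count_repeated_lines are hypothetical (they are not part of MACS3), and the sketch assumes the output is tab-separated with numeric start and end coordinates in columns 2 and 3. The first helper counts lines that do not look like BED records (for example a stray header or log line), the second counts lines that exactly repeat an earlier line.

```shell
# Hypothetical helper (not part of MACS3): count lines that do not look like
# BED records, i.e. fewer than 3 tab-separated fields or non-numeric start/end.
count_non_bed_lines() {
    awk -F'\t' 'NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { n++ }
                END { print n + 0 }' "$1"
}

# Hypothetical helper (not part of MACS3): count lines that are exact copies
# of an earlier line. This keeps every distinct line in memory, so it may
# need a lot of RAM on a file with tens of millions of lines.
count_repeated_lines() {
    awk 'seen[$0]++ { n++ } END { print n + 0 }' "$1"
}
```

For example, count_non_bed_lines my/file_filterdup.bed followed by count_repeated_lines my/file_filterdup.bed. If both report 0, the extra lines are well-formed, distinct records, which would suggest the discrepancy lies in how the log counts reads rather than in the file contents.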
- OS: Ubuntu 20.04.6 LTS
- Python version 3.10.14
- Numpy version 1.26.4
- MACS Version 3.0.1
PS: I just ran the command on two other BAM files and got differences of 6 and 8 lines respectively.