biod/sambamba

BAM filtered with sambamba -F "not duplicate" still has duplicates marked by sambamba markdup

piyushjo15 opened this issue · 2 comments

Hi,
I was following a ChIP-seq tutorial that recommends using sambamba to remove multimapped, unmapped and duplicate reads from a BAM file with the command below:
sambamba view -h -f bam -F "[XS] == null and not unmapped and not duplicate" in.bam > out.bam

I then checked out.bam for duplicates using sambamba markdup:
sambamba markdup out.bam out2.bam
Surprisingly, out2.bam has reads marked as duplicates. Weren't those reads filtered out in the first step?
Am I misunderstanding something?
Thanks,
Piyush

Hi,

I had posted this issue here earlier, but since it is a support question rather than a bug, I then posted it on Google Groups.
I also found that the -F filter "not duplicate" only removes duplicates after they have been marked. Initially I thought the filter would both mark and remove duplicates, but that is not the case: it only acts on reads that already carry the duplicate flag.
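One way to confirm this behaviour is to count the reads that carry the duplicate flag before and after running markdup. The helper below is only a sketch: the function name count_marked_duplicates is made up for illustration, and it assumes sambamba is on the PATH.

```shell
# Hypothetical helper (the name is not part of sambamba).
# Prints the number of reads in a BAM file that carry the duplicate flag.
# -c makes "sambamba view" print a count instead of the reads,
# -F "duplicate" keeps only reads already flagged as duplicates.
count_marked_duplicates() {
    sambamba view -c -F "duplicate" "$1"
}
```

Running count_marked_duplicates on a BAM that has never been through a duplicate-marking tool would be expected to print 0, which is why "not duplicate" removes nothing from such a file. Running it on the output of sambamba markdup should print a non-zero count whenever duplicates exist.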

Thanks
Piyush

Hi, I have also been using sambamba view -h -f bam -F "[XS] == null and not unmapped and not duplicate" in.bam > out.bam for ChIP-seq filtering. How do you filter BAM files for ChIP-seq now?
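Going by the finding earlier in this thread, one possible workflow is to mark duplicates first and apply the same filter afterwards. This is a sketch rather than a recommended pipeline: the function name filter_chipseq and the intermediate file name are invented for illustration, and it assumes the input BAM is coordinate-sorted, which markdup generally expects.

```shell
# Hypothetical wrapper (the name and the ".marked.bam" suffix are illustrative).
# Usage: filter_chipseq in.bam out.bam
filter_chipseq() {
    local in_bam="$1"
    local out_bam="$2"
    local marked_bam="${out_bam%.bam}.marked.bam"

    # Step 1: set the duplicate flag on duplicate reads (nothing is removed yet).
    sambamba markdup "$in_bam" "$marked_bam" || return 1

    # Step 2: the same filter as in the original post, now applied to a BAM
    # in which duplicates are actually flagged.
    sambamba view -h -f bam \
        -F "[XS] == null and not unmapped and not duplicate" \
        "$marked_bam" > "$out_bam"
}
```

For example, filter_chipseq in.bam out.bam would write out.marked.bam as an intermediate and out.bam as the filtered result. If the marked intermediate is not needed, sambamba markdup -r removes duplicates directly instead of only flagging them.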