Duplicated reads level is high

Question

Jarvis559 opened this issue 8 months ago · 1 comments

Hi @y9c!
I have a question. I ran BID-pipe on my data, and it reports a duplication level of 40%. When I instead estimate the duplication level with 'seqkit rmdup', deduplicating by sequence, I get only 20%. How does BID-pipe calculate the duplication level, and how does that differ from seqkit rmdup? Looking forward to your reply. Thanks a lot!

Answer 1 · 2024-03-07T23:33:25.000Z

Hi @Jarvis559. Deduplicating before mapping is known to overestimate library complexity, which is why seqkit rmdup tends to report a lower duplication level. However, the difference is too large in your case. Could you share more details or upload some example data so I can debug this?
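
To illustrate the general effect, here is a minimal Python sketch comparing a duplication rate computed from exact read sequences (the idea behind deduplicating by sequence before mapping) with one computed from mapped positions. It is not how BID-pipe or seqkit is implemented. The toy reads, the `duplication_rate` helper, and the choice of (chromosome, start, strand) as the position key are all assumptions made for illustration; real pipelines may also use UMIs or fragment ends.

```python
def duplication_rate(keys):
    """Return the fraction of items whose key was already seen (0.0 if empty)."""
    keys = list(keys)
    if not keys:
        return 0.0
    return (len(keys) - len(set(keys))) / len(keys)


# Hypothetical toy reads: (sequence, chromosome, start, strand).
reads = [
    ("ACGTACGT", "chr1", 100, "+"),
    ("ACGTACGT", "chr1", 100, "+"),  # exact copy of the first read
    ("ACGTACGA", "chr1", 100, "+"),  # same position, one sequencing error
    ("ACGAACGT", "chr1", 100, "+"),  # same position, a different error
    ("TTGCAAGC", "chr2", 500, "-"),
]

# Sequence-based: only byte-identical reads count as duplicates.
seq_rate = duplication_rate(seq for seq, _, _, _ in reads)

# Position-based: reads mapped to the same place count as duplicates,
# even when sequencing errors make their sequences differ.
pos_rate = duplication_rate((chrom, start, strand) for _, chrom, start, strand in reads)

print(f"sequence-based duplication: {seq_rate:.0%}")
print(f"position-based duplication: {pos_rate:.0%}")
```

In this toy set the sequence-based rate is lower because reads that differ by a single base are treated as unique, even though they map to the same position.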