Duplicated reads level is high
Jarvis559 opened this issue · 1 comment
Jarvis559 commented
Hi, @y9c !
I have a question. I ran BID-pipe on my data, and the reported duplication level is 40%. I also used 'seqkit rmdup' (by sequence) to calculate the duplication level, and it is only 20%. How does BID-pipe calculate the duplication level, and how does that differ from seqkit rmdup? Looking forward to your reply. Thanks a lot!
y9c commented
Hi @Jarvis559. Deduplicating before mapping is known to overestimate library complexity, which is why seqkit rmdup tends to report a lower duplication level. However, the difference is too large in your case. Could you share more details or upload some example data for debugging?
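To illustrate the general effect, here is a minimal Python sketch on made-up toy reads. It compares exact-sequence deduplication (the idea behind removing duplicates by sequence before mapping) with deduplication by mapped position. The position key `(chrom, pos, strand)` and both helper functions are assumptions for illustration only; how BID-pipe itself computes the figure is not stated in this thread, and it may use other information as well.

```python
# Toy reads: (sequence, hypothetical mapped position). All values are invented.
reads = [
    ("ACGTACGT", ("chr1", 100, "+")),
    ("ACGTACGT", ("chr1", 100, "+")),  # exact duplicate of the first read
    ("ACGTACGA", ("chr1", 100, "+")),  # same position, one base differs
    ("TTGGCCAA", ("chr1", 200, "+")),
    ("GGGGCCCC", ("chr2", 50, "-")),
]


def duplication_rate(keys):
    """Return the fraction of items that share a key with an earlier item."""
    keys = list(keys)
    if not keys:
        return 0.0
    return (len(keys) - len(set(keys))) / len(keys)


# Before mapping: only byte-identical sequences count as duplicates.
seq_rate = duplication_rate(seq for seq, _ in reads)

# After mapping: reads at the same position count as duplicates,
# even when a sequencing error makes their sequences differ.
pos_rate = duplication_rate(pos for _, pos in reads)

print(f"sequence-based duplication: {seq_rate:.0%}")
print(f"position-based duplication: {pos_rate:.0%}")
```

In this toy set the read with a single-base difference is kept by the sequence-based count but collapsed by the position-based one, so the sequence-based rate comes out lower. Real data would need the actual FASTQ and BAM files to tell whether this effect accounts for a given gap.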