yfukasawa/LongQC

Question about masked bases


Hi Yoshinori, first of all, thanks for developing this nice tool.
You explained that the second column of the 'longqc_sdust.txt' table is 'the number of bases masked (MDUST)'. I am wondering how you define a masked base, because I didn't find any lower-case bases in my reads.

Hi @justdx,

Thank you for your interest in our tool.

Regarding masking, LongQC computes the low-complexity regions of your reads from scratch using the symmetric DUST algorithm.
If the given reads contain low-complexity region(s), they are detected regardless of letter case. LongQC is actually a case-insensitive tool.
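For illustration only, here is a simplified sketch of a DUST-style triplet score that ignores letter case. It is not LongQC's code and not the full symmetric DUST algorithm, which searches for "perfect" intervals rather than thresholding fixed sliding windows. The window size (64) and threshold (20) are assumed values, and `dust_score` and `mask_low_complexity` are hypothetical names.

```python
from collections import Counter


def dust_score(window: str) -> float:
    """DUST-style triplet score of one window, computed case-insensitively."""
    seq = window.upper()  # 'acgt' and 'ACGT' are scored identically
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    l = len(triplets)
    if l < 2:
        return 0.0
    counts = Counter(triplets)
    # Sum of c_t * (c_t - 1) / 2 over distinct triplets, normalised by (l - 1)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (l - 1)


def mask_low_complexity(seq: str, window: int = 64,
                        threshold: float = 20.0) -> list[bool]:
    """Flag every base inside a window whose score exceeds the threshold.

    Simplified fixed-window approximation (assumed parameters), not symmetric DUST.
    """
    masked = [False] * len(seq)
    for start in range(max(len(seq) - window, 0) + 1):
        w = seq[start:start + window]
        if dust_score(w) > threshold:
            for i in range(start, start + len(w)):
                masked[i] = True
    return masked
```

A homopolymer such as `"A" * 64` is fully flagged whether it is written in upper or lower case, because the sequence is upper-cased before scoring.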
I hope this answer clarifies your question.

Yoshinori

Hi Yoshinori,
Thanks for the reply.
I think I understand it now: for a given read, the masked-base count is the number of bases in the low-complexity region(s) identified by the DUST algorithm.
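As a rough sketch of that definition, assuming the detected regions are available as half-open `(start, end)` intervals (a hypothetical input format, not necessarily LongQC's internal representation), the count is the number of distinct bases they cover, independent of letter case. `count_masked_bases` is a hypothetical name.

```python
def count_masked_bases(intervals: list[tuple[int, int]]) -> int:
    """Number of distinct bases covered by half-open [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

Overlapping regions are merged first, so `[(10, 20), (15, 30), (40, 45)]` counts 25 bases rather than 35.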
Thanks again for the help.

Yes, exactly.
The tool cannot assume that the given reads have already been masked by some other tool before LongQC (they may or may not be), hence this design.
I hope our tool will help with your projects.