More than 1,000 tandem-repeat reads were assigned to a species: is this reliable?

Question

lingxuan85511 opened this issue 3 years ago · 1 comment

I used Kraken2+Bracken with a prebuilt database (PlusPF) to quantify the microbial composition of my metagenomic data. More than 1,000 reads were assigned to species A. When I extracted the paired-end reads for species A by mapping with bowtie2, I found that all of them are tandem repeats. Since low-complexity reads are less informative, is this result reliable? How can I exclude the effect of tandem-repeat reads when using Kraken2+Bracken?

Answer 1 · 2022-03-14T15:37:25.000Z

You could mask low-complexity sequence in the reads before classification using a DUST-based tool (for example, NCBI's dustmasker). We already mask the database sequences themselves, but that does not prevent all such spurious assignments.
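To illustrate the idea of screening reads for low complexity, here is a minimal Python sketch of a DUST-style triplet score computed over a whole read. It is a simplification, not dustmasker itself (which scores sliding windows and masks subsequences). The function names and the threshold of 2.0 are assumptions made for this example and would need tuning on real data.

```python
from collections import Counter


def dust_score(seq: str, k: int = 3) -> float:
    """Whole-read DUST-style score; higher means more repetitive.

    Simplified sketch: counts pairs of identical k-mers across the
    entire read instead of using sliding windows.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip k-mers that contain ambiguous bases such as N.
    kmers = [m for m in kmers if set(m) <= set("ACGT")]
    if len(kmers) < 2:
        return 0.0
    counts = Counter(kmers)
    # Number of identical k-mer pairs, normalised by (number of k-mers - 1).
    pairs = sum(c * (c - 1) / 2 for c in counts.values())
    return pairs / (len(kmers) - 1)


def is_low_complexity(seq: str, threshold: float = 2.0) -> bool:
    """Flag a read whose score exceeds the (hypothetical) threshold."""
    return dust_score(seq) > threshold


def keep_pair(mate1: str, mate2: str, threshold: float = 2.0) -> bool:
    """Keep a read pair only if neither mate looks low-complexity."""
    return not (is_low_complexity(mate1, threshold)
                or is_low_complexity(mate2, threshold))
```

A dinucleotide repeat such as `"AC" * 50` scores far above the threshold, while a read with varied sequence scores well below it. Dropping the whole pair when either mate is flagged is one possible policy; masking the repeat and keeping the rest of the read is another.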

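You could also rerun Kraken 2 with --report-minimizer-data (used together with --report), which adds per-taxon minimizer counts, including the number of distinct minimizers, to the report. A taxon with many reads but very few distinct minimizers is probably a false positive: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#distinct-minimizer-count-information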
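As a rough sketch of how such a report might be screened, the Python below flags taxa that have many reads but few distinct minimizers. It assumes the tab-separated layout described in the manual for minimizer reports (column 2 is clade read count, column 5 is the distinct minimizer estimate, followed by rank code, taxid, and name). The function name, both thresholds, and the toy report lines are invented for illustration and are not real results.

```python
def flag_suspect_taxa(report_text, min_reads=1000, min_distinct_per_read=0.05):
    """Return (name, clade_reads, distinct_minimizers) for suspicious taxa.

    Assumed column order (0-based): 1 = clade reads,
    4 = distinct minimizers, 7 = scientific name.
    Thresholds are hypothetical and should be tuned per dataset.
    """
    suspects = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a minimizer-style report line
        clade_reads = int(fields[1])
        distinct = int(fields[4])
        name = fields[7].strip()
        if clade_reads < min_reads:
            continue  # only examine taxa with many reads
        if distinct / clade_reads < min_distinct_per_read:
            suspects.append((name, clade_reads, distinct))
    return suspects


# Invented toy lines in the assumed layout, for demonstration only.
TOY_REPORT = (
    "0.50\t1200\t1200\t3000\t12\tS\t99999\t    Species A\n"
    "1.00\t2400\t2400\t90000\t41000\tS\t562\t    Escherichia coli\n"
)

if __name__ == "__main__":
    for name, reads, distinct in flag_suspect_taxa(TOY_REPORT):
        print(f"{name}: {reads} reads, {distinct} distinct minimizers")
```

The ratio of distinct minimizers to reads is only one possible heuristic. Many reads that all hit the same handful of minimizers is the pattern expected from tandem repeats.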