DerrickWood/kraken2

Can I get the mapped region from the output?

jyl-hb opened this issue · 5 comments

Hello all,

According to the documentation, the fifth column of the output file is a space-delimited list indicating the LCA mapping of each k-mer in the sequence(s). For example, "562:13 561:4 A:31 0:1 562:3" would indicate that:
the first 13 k-mers mapped to taxonomy ID #562
the next 4 k-mers mapped to taxonomy ID #561
the next 31 k-mers contained an ambiguous nucleotide
the next k-mer was not in the database
the last 3 k-mers mapped to taxonomy ID #562
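The question in the title (recovering the mapped region) can be sketched by walking the hit list and giving each run a k-mer range, then the bases those k-mers touch. This is a minimal Python sketch under stated assumptions: k = 35 (Kraken 2's default; the real value is whatever your database was built with), a single-end read (no `|:|` separator between mates), and one hit-list entry per k-mer position with k-mer i starting at base i. `parse_hitlist` and `Run` are hypothetical helpers, not part of Kraken 2.

```python
from typing import List, NamedTuple

K = 35  # assumed k-mer length: Kraken 2's default, but use your database's value

class Run(NamedTuple):
    label: str       # taxid, "0" (not in the database) or "A" (ambiguous)
    first_kmer: int  # 1-based index of the run's first k-mer
    last_kmer: int   # 1-based index of the run's last k-mer
    first_base: int  # 1-based first base touched by the run's k-mers
    last_base: int   # 1-based last base touched by the run's k-mers

def parse_hitlist(hitlist: str, k: int = K) -> List[Run]:
    """Split a single-end hit list into runs with k-mer and base coordinates.

    Assumes one entry per k-mer position, k-mer i starting at base i,
    and no paired-end "|:|" separator.
    """
    runs: List[Run] = []
    next_kmer = 1
    for token in hitlist.split():
        label, count = token.rsplit(":", 1)
        first = next_kmer
        last = first + int(count) - 1
        # the run's last k-mer starts at base `last` and ends k - 1 bases later
        runs.append(Run(label, first, last, first, last + k - 1))
        next_kmer = last + 1
    return runs

runs = parse_hitlist("562:13 561:4 A:31 0:1 562:3")
for r in runs:
    print(f"{r.label:>3}  k-mers {r.first_kmer}-{r.last_kmer}  bases {r.first_base}-{r.last_base}")
n_kmers = runs[-1].last_kmer
print("k-mers:", n_kmers, "implied read length:", n_kmers + K - 1)
```

Under these assumptions the example hit list covers 52 k-mers, which implies an 86-base read. The base ranges of neighboring runs overlap by up to k - 1 bases (the 562 run touches bases 1-47, the 561 run bases 14-51), so a run tells you which bases its k-mers touch rather than a base-exact mapped region.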

Can this result be interpreted as follows?
the input sequence is 51 bases long
bases 1-13 mapped to 562
bases 14-17 mapped to 561
bases 18-48 did not map to any sequence in the database
bases 49-51 mapped to 562
finally, the input sequence is classified as 562, because the largest share of its bases mapped to 562
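On the last step: as I understand the algorithm described in the Kraken papers and manual, a read is not assigned by the largest share of bases. Each taxon hit by the read's k-mers is scored by the k-mers assigned to it plus those assigned to all of its ancestors, and the read goes to the taxon with the highest root-to-leaf path score (ties resolved to the LCA of the tied taxa). Kraken 2 also applies its `--confidence` threshold (default 0), which this sketch omits. A minimal Python sketch of that rule, assuming a hypothetical, heavily trimmed taxonomy in which 561 is the parent of 562; `PARENT`, `hit_counts`, and `classify` are illustrative names only.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical, heavily trimmed taxonomy (child -> parent, 1 = root).
# Every taxid that appears in a hit list must be present here.
PARENT: Dict[int, int] = {562: 561, 561: 1, 1: 1}

def lineage(taxid: int) -> List[int]:
    """Return [taxid, parent, ..., root]."""
    path = [taxid]
    while PARENT[path[-1]] != path[-1]:
        path.append(PARENT[path[-1]])
    return path

def lca(a: int, b: int) -> int:
    """Lowest common ancestor of two taxa in PARENT."""
    ancestors_a = set(lineage(a))
    for t in lineage(b):
        if t in ancestors_a:
            return t
    return 1

def hit_counts(hitlist: str) -> Counter:
    """Count k-mers per taxid, skipping "0" (not in database) and "A" (ambiguous)."""
    counts: Counter = Counter()
    for token in hitlist.split():
        label, n = token.rsplit(":", 1)
        if label not in ("0", "A"):
            counts[int(label)] += int(n)
    return counts

def classify(counts: Counter) -> int:
    """Return the taxon whose root-to-leaf path collects the most k-mer hits (0 if none)."""
    if not counts:
        return 0  # unclassified
    # score of a taxon = hits on the taxon itself plus hits on all of its ancestors
    scores = {t: sum(counts[a] for a in lineage(t)) for t in counts}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    winner = tied[0]
    for t in tied[1:]:
        winner = lca(winner, t)  # ties resolve to their LCA
    return winner

counts = hit_counts("562:13 561:4 A:31 0:1 562:3")
print(dict(counts), classify(counts))
```

Here 562's path collects 16 + 4 = 20 k-mers against 4 for 561, so the result matches the guess above, but the 4 k-mers assigned to 561 count in favor of 562 rather than against it.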

Thank you.

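No. It means the first 13 k-mers mapped to 562 (each k-mer is 35 bp at the default length, so the first 48 bp align to 562), not the first 13 bp of the sequence. The rest follows the same way.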

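Thanks for your reply, but I don't quite follow. Is the 48 bp here 13 + 35? My understanding was that the first 13 k-mers, each 35 bp long by default, means 13 × 35 bp should align to 562?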
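The arithmetic behind this question: k-mers slide along the read one base at a time, so consecutive k-mers share k - 1 bases. n consecutive k-mers therefore span n + k - 1 bases, not n × k. With k = 35, the first 13 k-mers cover 13 + 35 - 1 = 47 bases, while 13 × 35 = 455 bp would only hold if the k-mers did not overlap. A small Python check, assuming k = 35 and a hypothetical sequence built just for the demonstration; `bases_covered` is an illustrative helper.

```python
def bases_covered(n_kmers: int, k: int) -> int:
    """Bases spanned by n consecutive k-mers, each shifted one base from the last."""
    return n_kmers + k - 1 if n_kmers > 0 else 0

k = 35
seq = ("ACGT" * 12)[:bases_covered(13, k)]  # hypothetical 47-base sequence
# every k-mer of the sequence: one per start position
kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
print(len(seq), len(kmers))  # 47 13
print(13 * k)                # 455, the span only if k-mers did not overlap
```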

You can take a look at this link, https://blog.csdn.net/u010608296/article/details/114134044, which explains the concept of k-mers quite clearly.

OK, thanks.

Hi, I assume that @hedy-ella was able to answer your question?