bbuchfink/diamond

Inconsistent taxonomy assignment results for the same sequences

XHe20 opened this issue · 1 comment

I used DIAMOND and MEGAN to assign taxonomy to my contigs.

diamond blastx -d nrdb.dmnd -q final.contigs.part_001.fa \
-o final.contigs_graham_01.daa -F 15 --range-culling -f 100 \
-t ./ --threads 32 --fast --max-target-seqs 100

daa-meganizer -i final.contigs_graham_01.daa \
-mdb megan-map-Feb2022.db --longReads

I exported the taxonomy assignments at the Class level from MEGAN, and 1306 contigs were assigned to Mammalia. I then re-ran the commands above on only those 1306 sequences, and just 97.6% of them were assigned to Mammalia. I expected 100% of them to be assigned to Mammalia again.

Then I set --masking 0 and ran the analysis again.

diamond blastx -d nrdb.dmnd -q final.contigs.part_001.fa \
-o final.contigs_graham_01_2.daa -F 15 --range-culling -f 100 \
-t ./ --threads 32 --fast --max-target-seqs 100 --masking 0

daa-meganizer -i final.contigs_graham_01_2.daa \
-mdb megan-map-Feb2022.db --longReads

I then re-ran the commands above on only the contigs assigned to Mammalia, and this time only 82.2% of the sequences were assigned to Mammalia.

I am wondering what caused the inconsistency, and which parameters could make the results more consistent between runs.
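
One way to narrow this down is to list exactly which contigs changed between the full run and the subset run (roughly 31 of the 1306 in the first experiment). The following is a minimal sketch, not part of DIAMOND or MEGAN. It assumes each run's assignments have been exported as a two-column, tab-separated file (contig name, taxon name), for example a read-name-to-taxon-name export from MEGAN; the function name and file layout are hypothetical.

```shell
# changed_contigs FIRST.tsv SECOND.tsv   (hypothetical helper)
# Assumption: each input has two tab-separated columns: contig name, taxon.
# Prints contig, taxon in FIRST, taxon in SECOND for every contig whose
# assignment differs, or "unassigned" if it is missing from SECOND.
changed_contigs() {
    awk -F'\t' '
        NR == FNR { first[$1] = $2; next }   # load first-run assignments
        { second[$1] = $2 }                  # load second-run assignments
        END {
            for (c in first) {
                if (!(c in second))
                    print c "\t" first[c] "\tunassigned"
                else if (second[c] != first[c])
                    print c "\t" first[c] "\t" second[c]
            }
        }
    ' "$1" "$2" | sort
}
```

Called as changed_contigs full_run.tsv subset_run.tsv (both file names are placeholders), this gives a short list of contigs to inspect rather than a percentage.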

I'm not really sure what's happening here; you would have to look at all the alignments in detail.
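
As a starting point for that kind of inspection, the sketch below compares the best-scoring hit per contig between two runs. It assumes both DAA files have first been converted to BLAST tabular format with the 12 default columns (for example with diamond view --daa run1.daa --outfmt 6 --out run1.tsv; the file names are placeholders). The function name is hypothetical, and the best hit alone is a simplification: with --range-culling and --longReads, MEGAN's assignment depends on hits along the whole contig, not just the top one.

```shell
# diff_best_hits RUN1.tsv RUN2.tsv   (hypothetical helper)
# Assumption: inputs are BLAST tabular files where column 1 is the query,
# column 2 the subject, and column 12 the bit score.
# Prints query, best subject in RUN1, best subject in RUN2 for every query
# whose best subject differs ("none" if the query has no hit in that run).
diff_best_hits() {
    awk -F'\t' '
        FNR == 1 { run++ }                   # track which input file we are in
        !((run, $1) in score) || $12 + 0 > score[run, $1] {
            score[run, $1] = $12 + 0         # keep the highest bit score
            subj[run, $1] = $2               # and the subject it belongs to
            seen[$1] = 1
        }
        END {
            for (q in seen) {
                s1 = ((1, q) in subj) ? subj[1, q] : "none"
                s2 = ((2, q) in subj) ? subj[2, q] : "none"
                if (s1 != s2) print q "\t" s1 "\t" s2
            }
        }
    ' "$1" "$2" | sort
}
```

Contigs reported here are the ones where the alignments themselves changed between runs; contigs that are not reported but still changed taxon would point to differences further down the hit list.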