leylabmpi/Struo2

GTDB to NCBI?


I'm using the GTDB with kraken2 and get a .txt and a .report file out, but I don't understand how to convert the GTDB IDs in the .txt file to NCBI IDs. There are several tools to convert between GTDB and NCBI, but none of them seem to work with GTDB IDs alone?

I created a mapping tool for that purpose: https://github.com/nick-youngblut/gtdb_to_taxdump (see ncbi-gtdb_map.py). There are others (e.g., https://gtdb.ecogenomic.org/tools).
At the most basic level, the GTDB metadata maps the GTDB taxonomy to the NCBI taxonomy for each reference genome in the GTDB database.
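As a rough illustration of that per-genome mapping, here is a minimal Python sketch. It assumes the metadata is a tab-separated table with `accession`, `gtdb_taxonomy`, and `ncbi_taxonomy` columns; check the header of the file you actually downloaded. The rows and the `load_genome_map` helper below are made-up placeholders for the example, not real GTDB records or part of any tool.

```python
import csv
import io

# Made-up stand-in for a GTDB metadata table (placeholder accessions and lineages).
# Assumption: the real file is tab-separated and has these three column names.
SAMPLE_METADATA = (
    "accession\tgtdb_taxonomy\tncbi_taxonomy\n"
    "GENOME_A\td__Bacteria;g__Examplea;s__Examplea alpha\t"
    "d__Bacteria;g__Examplea;s__Examplea alpha\n"
    "GENOME_B\td__Bacteria;g__Examplea_A;s__Examplea_A beta\t"
    "d__Bacteria;g__Examplea;s__Examplea beta\n"
)


def load_genome_map(handle):
    """Return {accession: (gtdb_taxonomy, ncbi_taxonomy)} from a TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["accession"]: (row["gtdb_taxonomy"], row["ncbi_taxonomy"])
        for row in reader
    }


# For a real file, pass open(path, newline="") instead of the StringIO object.
genome_map = load_genome_map(io.StringIO(SAMPLE_METADATA))

for accession, (gtdb_lineage, ncbi_lineage) in genome_map.items():
    print(accession, gtdb_lineage, ncbi_lineage, sep="\t")
```

Each reference genome carries both lineages, so any GTDB-to-NCBI conversion ultimately goes through a table like this.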

Sorry, but I don't understand how either gtdb_to_taxdump or the other tools can handle a list of GTDB IDs from a kraken2 .txt file. I wouldn't really need to convert to NCBI taxonomy at all if I could use a GTDB taxonomy database with Krona to display the kraken2 results. Is that possible?

ncbi-gtdb_map.py maps GTDB taxonomy to NCBI taxonomy, which is what you need if you want NCBI taxonomic names for an existing set of GTDB taxonomic names.
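To make the idea of a name-to-name mapping concrete, here is a small Python sketch. It is not the implementation of ncbi-gtdb_map.py; it only illustrates one plausible approach, a majority vote over the reference genomes that share a GTDB name. The `majority_map` helper and the example names are hypothetical.

```python
from collections import Counter, defaultdict


def majority_map(pairs):
    """Map each GTDB name to the NCBI name most often paired with it.

    `pairs` is an iterable of (gtdb_name, ncbi_name) tuples, one per
    reference genome. This is an illustrative assumption about how such
    a mapping could be built, not a description of any existing tool.
    """
    votes = defaultdict(Counter)
    for gtdb_name, ncbi_name in pairs:
        votes[gtdb_name][ncbi_name] += 1
    # most_common(1) returns a list holding the single top (name, count) pair.
    return {gtdb: counts.most_common(1)[0][0] for gtdb, counts in votes.items()}


# Hypothetical per-genome name pairs (placeholders, not real taxa).
pairs = [
    ("s__Examplea alpha", "s__Examplea alpha"),
    ("s__Examplea alpha", "s__Examplea alpha"),
    ("s__Examplea alpha", "s__Otherus beta"),
    ("s__Examplea_A gamma", "s__Examplea gamma"),
]

name_map = majority_map(pairs)
print(name_map["s__Examplea alpha"])
```

A GTDB name can correspond to several NCBI names (and vice versa), so any such mapping is a best guess rather than a one-to-one conversion.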