Get all bacterial species
Hi,
would it be possible to extract a table of all species of the kingdom Bacteria?
getTaxonomy("2") returns only one entry for bacteria. I am looking for a table of approx. 524229 entries.
Best, Michael
Hmm, I haven't really implemented functions to mess around with the names and nodes here. I guess somebody must have implemented something elsewhere, but I should add it in if it doesn't exist. I'll look around.
But in the meantime you should be able to pull that with a sqlite query. I think it'd be something like (perhaps not the most efficient but should be good enough to get done in a few minutes):
#connect to the taxonomizr database and pull every node ranked as a species
db<-RSQLite::dbConnect(RSQLite::SQLite(), dbname = 'accessionTaxa.sql')
allSpecies<-RSQLite::dbGetQuery(db,"SELECT * FROM nodes WHERE rank='species'")
RSQLite::dbDisconnect(db)
#this one will probably take a couple minutes
allSpeciesTaxa<-taxonomizr::getTaxonomy(allSpecies$id,'accessionTaxa.sql')
#which() drops species whose superkingdom is NA instead of returning NA entries
bactSpecies<-allSpeciesTaxa[which(allSpeciesTaxa[,'superkingdom']=='Bacteria'),'species']
The output looks bacteriaish at first glance:
> head(bactSpecies)
                             7                              9
    "Azorhizobium caulinodans"          "Buchnera aphidicola"
                            11                             14
         "Cellulomonas gilvus"    "Dictyoglomus thermophilum"
                            17                             19
"Methylophilus methylotrophus"   "Syntrophotalea carbinolica"
But it looks like the numbers might be slightly lower than you were anticipating:
> table(allSpeciesTaxa[,'superkingdom'])
  Archaea  Bacteria Eukaryota   Viruses
    12844    477748   1456925     53334
Perhaps the 524229 figure defines species more loosely than nodes with rank='species'?
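One detail of the subsetting step is worth spelling out: if any species has no superkingdom assigned (NA), a bare `==` comparison carries that NA into the index and the result contains NA entries, which would also inflate `length()` relative to `table()`. Here is a minimal sketch on a made-up matrix standing in for the `getTaxonomy` output. The rows and the object names `taxa`, `isBact`, `withNA` and `noNA` are only for illustration.

```r
# Toy stand-in for the matrix returned by getTaxonomy(); rows are made up
taxa <- matrix(
  c("Bacteria",  "Buchnera aphidicola",
    NA,          "synthetic construct",
    "Eukaryota", "Homo sapiens",
    "Bacteria",  "Cellulomonas gilvus"),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("9", "32630", "9606", "11"), c("superkingdom", "species"))
)

# The comparison is NA wherever superkingdom is NA
isBact <- taxa[, "superkingdom"] == "Bacteria"

# Plain logical indexing keeps an NA placeholder for the NA row
withNA <- taxa[isBact, "species"]

# which() keeps only the rows where the comparison is TRUE
noNA <- taxa[which(isBact), "species"]

length(withNA)  # 3, one of them NA
length(noNA)    # 2
```

Wrapping the comparison in `which()` (or adding `!is.na(...)`) is a cheap way to make sure the count of bacterial species is not padded with NA entries.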
I added a findDescendants function for this in v0.10.1. So it'd be:
bact<-findDescendants(2,'accessionTaxa.sql')
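For intuition only, here is a rough base R sketch of what a descendant lookup does conceptually: start from one taxon ID and keep collecting children until none are left, then keep the nodes ranked as species. This is not the package's implementation. The tiny `nodes` table is invented, the `parent` column is an assumption about how parent links are stored, and `descendantIds` is a hypothetical helper.

```r
# Hypothetical miniature nodes table (ids, parents and ranks are invented)
nodes <- data.frame(
  id     = c(1, 2, 10, 11, 12, 20, 21),
  parent = c(1, 1, 2, 10, 10, 1, 20),
  rank   = c("no rank", "superkingdom", "genus", "species", "species",
             "superkingdom", "species"),
  stringsAsFactors = FALSE
)

# Collect every node below `root` by repeatedly looking up children
descendantIds <- function(root, nodes) {
  found <- c()
  frontier <- root
  while (length(frontier) > 0) {
    # children of the current frontier, skipping anything already seen
    # (guards against a root node that lists itself as its own parent)
    isChild <- nodes$parent %in% frontier & !nodes$id %in% c(found, root)
    children <- nodes$id[isChild]
    found <- c(found, children)
    frontier <- children
  }
  found
}

below2 <- descendantIds(2, nodes)
speciesBelow2 <- nodes$id[nodes$id %in% below2 & nodes$rank == "species"]
speciesBelow2  # 11 12
```

Walking down from a single ID like this avoids computing the full taxonomy for every species in the database just to filter on superkingdom afterwards.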