sherrillmix/taxonomizr

Get all bacterial species


Hi,
would it be possible to extract a table of all species of the kingdom Bacteria?
getTaxonomy("2") returns only one entry for bacteria. I am looking for a table of approx. 524229 entries.

Best, Michael

Hmm, I haven't really implemented functions to mess around with the names and nodes here. I guess somebody must have implemented something elsewhere, but I should add it in if it doesn't exist. I'll look around.

But in the meantime you should be able to pull that with a sqlite query. I think it'd be something like (perhaps not the most efficient but should be good enough to get done in a few minutes):

#connect to the taxonomizr SQLite database and pull every node ranked as a species
db<-RSQLite::dbConnect(RSQLite::SQLite(), dbname = 'accessionTaxa.sql')
allSpecies<-RSQLite::dbGetQuery(db,"SELECT * FROM nodes WHERE rank='species'")
RSQLite::dbDisconnect(db)
#this one will probably take a couple minutes
allSpeciesTaxa<-taxonomizr::getTaxonomy(allSpecies$id,'accessionTaxa.sql')
#which() drops taxa with no superkingdom (NA) instead of returning them as NA entries
bactSpecies<-allSpeciesTaxa[which(allSpeciesTaxa[,'superkingdom']=='Bacteria'),'species']

The output looks bacteriaish at first glance:

> head(bactSpecies)
                             7                              9 
    "Azorhizobium caulinodans"          "Buchnera aphidicola" 
                            11                             14 
         "Cellulomonas gilvus"    "Dictyoglomus thermophilum" 
                            17                             19 
"Methylophilus methylotrophus"   "Syntrophotalea carbinolica"

But it looks like the numbers might be slightly lower than you were anticipating:

> table(allSpeciesTaxa[,'superkingdom'])
  Archaea  Bacteria Eukaryota   Viruses 
    12844    477748   1456925     53334

Perhaps species is defined more loosely than node rank='species' in the 524229 entries?
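
One way to check that guess is to tabulate the ranks of everything below Bacteria and see whether subspecies, strains or unranked nodes account for the gap. Below is a minimal sketch of that idea using a recursive SQLite query. It runs against a tiny made-up in-memory table, not the real database: the toy ids and ranks are invented for illustration, and the `id`, `rank` and `parent` column names are an assumption that the real `nodes` table is laid out this way.

```r
# Toy stand-in for the nodes table (assumed columns: id, rank, parent)
db<-RSQLite::dbConnect(RSQLite::SQLite(), ":memory:")
toyNodes<-data.frame(
  id=c(1L,2L,10L,11L,12L,13L,20L,21L),
  rank=c("no rank","superkingdom","genus","species","species","strain","superkingdom","species"),
  parent=c(1L,1L,2L,10L,10L,11L,1L,20L),
  stringsAsFactors=FALSE
)
RSQLite::dbWriteTable(db,"nodes",toyNodes)

# Walk down the tree from taxon 2 with a recursive common table expression.
# UNION (rather than UNION ALL) stops the walk from looping on repeated ids.
query<-"
  WITH RECURSIVE descendants(id) AS (
    SELECT id FROM nodes WHERE parent = 2 AND id != 2
    UNION
    SELECT nodes.id FROM nodes JOIN descendants ON nodes.parent = descendants.id
  )
  SELECT nodes.id, nodes.rank
  FROM nodes JOIN descendants ON nodes.id = descendants.id
"
bactNodes<-RSQLite::dbGetQuery(db,query)
RSQLite::dbDisconnect(db)

# Count how many descendants fall under each rank
rankCounts<-table(bactNodes$rank)
print(rankCounts)
```

If the real table does follow this layout, swapping `":memory:"` for `dbname = 'accessionTaxa.sql'` and dropping the toy table should give a per-rank count for Bacteria, though the recursive walk may be slow on the full taxonomy.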

I added a getDescendants function for this in v0.10.1. So it'd be:

bact<-getDescendants(2,'accessionTaxa.sql')