Classification as data frame
tomjwebb opened this issue · 6 comments
Hi @sckott - I've been using this package and found I wanted the output of wm_classification() to run over a list of species and end up as a data frame - the output of this is somewhat different from what you get from wm_record(), in particular the classification function returns 'non-standard' taxonomic groups (superfamily etc.) which I happen to need for this application. Anyway, I've written a couple of functions, which I thought I'd upload here in case you felt there is any more general use for them? The first just turns the output from wm_classification() into a data frame (and attempts to do something sensible with errors), the second runs this over a list of species and then binds all the resulting data frames together into a single tbl_df of classifications for the whole species list:
library(worrms)

get_sp_classif <- function(sp){
  # try to get WoRMS aphia ID from name
  aphia <- try(wm_name2id(sp), silent = TRUE)
  # check if this worked (catches unrecognised names and instances where AphiaID of -999 is returned)
  # inherits() plus || avoids comparing an error object with 0
  if(inherits(aphia, "try-error") || aphia < 0){
    # try using genus
    aphia <- try(wm_name2id(stringr::word(sp, 1)), silent = TRUE)
    if(inherits(aphia, "try-error") || aphia < 0){
      # no aphia ID was found: return a one-row data frame holding only the name
      classif_df <- data.frame(sciname = sp, stringsAsFactors = FALSE)
      aphia <- NA
    }
  }
  if(!is.na(aphia)){
    # if aphia ID was found, get full classification
    classif <- wm_classification(aphia)
    # convert into data frame: start from an empty frame with one column per rank
    classif_df <- read.csv(text = "",
      col.names = c("sciname", "AphiaID", classif$rank),
      colClasses = c("character", "numeric", rep("character", length(classif$rank))),
      stringsAsFactors = FALSE)
    if("Species" %in% classif$rank){
      classif_df[1,] <- cbind(sp, classif$AphiaID[classif$rank == "Species"], t(classif$scientificname))
    } else {
      # genus-level match only, so there is no species AphiaID to report
      classif_df[1,] <- cbind(sp, NA, t(classif$scientificname))
    }
  }
  classif_df
}
sp_list_classif <- function(sp_list){
  # run the classification function over the whole list
  # (lapply always returns a list of data frames; sapply may try to simplify it)
  classifs <- lapply(sp_list, get_sp_classif)
  # return as dataframe
  classifs <- dplyr::bind_rows(classifs)
  dplyr::tbl_df(classifs)
}
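For anyone who wants to see the reshaping step without hitting the WoRMS API, here is a minimal offline sketch. It assumes the classification table has the AphiaID, rank and scientificname columns used in the functions above; mock_classif and classif_to_row are made-up names for illustration, and the ID values are placeholders rather than real AphiaIDs.

```r
# Mock of the long-format table returned by wm_classification(): one row per rank.
# The AphiaID values are placeholders, not real WoRMS identifiers.
mock_classif <- data.frame(
  AphiaID = c(1, 2, 3),
  rank = c("Kingdom", "Family", "Species"),
  scientificname = c("Animalia", "Platanistidae", "Platanista gangetica"),
  stringsAsFactors = FALSE
)

# Reshape a long classification table into a one-row data frame with one
# column per rank (hypothetical helper, not part of worrms).
classif_to_row <- function(sp, classif) {
  is_species <- classif$rank == "Species"
  # make.unique() keeps repeated ranks (e.g. two superclasses) as separate columns
  ranks <- as.list(setNames(classif$scientificname, make.unique(classif$rank)))
  data.frame(
    sciname = sp,
    AphiaID = if (any(is_species)) classif$AphiaID[is_species][1] else NA_real_,
    ranks,
    stringsAsFactors = FALSE
  )
}

row <- classif_to_row("Platanista gangetica", mock_classif)
print(row)
```

Building the row with data.frame() keeps AphiaID numeric, whereas assigning a cbind() result into the row passes everything through a character matrix first.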
thanks for the issue @tomjwebb 😸
edited your code above just a bit so it can run without errors (namespace calls to dplyr fxns)
Might make sense to add a function like this where you can input >1 taxonomic names and get classifications in a data.frame
Note that this is in taxize
library(taxize)
xx <- get_wormsid(c('Platanista gangetica', 'Leucophaeus scoresbii'))
dplyr::tbl_df(cbind(classification(xx)))
# A tibble: 2 x 29
kingdom phylum subphylum superclass superclass.1 class subclass order suborder infraorder
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Animalia Chordata Vertebrata Gnathostomata Tetrapoda Mammalia Theria Cetartiodactyla Cetancodonta Cetacea
2 Animalia Chordata Vertebrata Gnathostomata Tetrapoda Aves <NA> Charadriiformes <NA> <NA>
# ... with 19 more variables: superfamily <chr>, family <chr>, genus <chr>, species <chr>, kingdom_id <chr>,
# phylum_id <chr>, subphylum_id <chr>, superclass_id <chr>, superclass_id.1 <chr>, class_id <chr>, subclass_id <chr>,
# order_id <chr>, suborder_id <chr>, infraorder_id <chr>, superfamily_id <chr>, family_id <chr>, genus_id <chr>,
# species_id <chr>, query <chr>
even though that's in taxize, maybe there is still reason to include similar functionality here
thoughts?
Ah thanks @sckott - I should have checked taxize first! I should probably switch to using that, have just found worrms convenient. I also added a bit more to the first function (edited above), which now tries to get the classification of a genus if it can't find an AphiaID for the species (this is useful for what I'm doing at the moment).
Anyway if you envisage others using worrms standalone then I think this functionality is useful - both returning the classification as a data frame, and being able to run it over a list of species. But if you want to point people to taxize instead that works too. Feel free to close this issue anyway!
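The genus fallback mentioned above can be sketched on its own, separate from the API call. This is only an illustration: lookup_id is a hypothetical stand-in for wm_name2id() backed by a small local table, resolve_id is a made-up helper name, and the IDs are placeholders.

```r
# Hypothetical stand-in for wm_name2id(): looks names up in a small local
# table and signals an error for unknown names. IDs are placeholders.
known_ids <- c("Platanista" = 10, "Leucophaeus scoresbii" = 20)
lookup_id <- function(name) {
  if (!name %in% names(known_ids)) stop("no match for ", name)
  known_ids[[name]]
}

# Try the full name first, then the genus (first word); NA if both fail.
resolve_id <- function(sp, lookup = lookup_id) {
  genus <- strsplit(sp, " ", fixed = TRUE)[[1]][1]
  for (name in unique(c(sp, genus))) {
    # treat an error the same as a missing ID
    id <- tryCatch(lookup(name), error = function(e) NA_real_)
    # negative IDs (such as -999) are also treated as "not found"
    if (!is.na(id) && id >= 0) return(id)
  }
  NA_real_
}

resolve_id("Platanista gangetica")   # falls back to the genus entry
resolve_id("Leucophaeus scoresbii")  # matches the full name
resolve_id("Notareal name")          # no match at either level
```

Passing the lookup function as an argument means the same logic could be pointed at wm_name2id() without changing the fallback code.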
thinking about this
@tomjwebb okay, reinstall like devtools::install_github("ropensci/worrms@changes") and look at docs for wm_children and wm_classification - new fxns for those two (just to demo the concepts, then pkg wide later maybe) - and egs added
thoughts?
i don't want to break current functionality of fxns in pkg, so this makes it so that new functionality will be easy to find as they are on the same man pages as their sister fxn
done, we can reopen or open new issue to discuss anything further related to this