ropensci/worrms

Classification as data frame

tomjwebb opened this issue · 6 comments

Hi @sckott - I've been using this package and wanted to run wm_classification() over a list of species and get the result as a data frame. Its output differs somewhat from what you get from wm_record(): in particular, the classification function returns 'non-standard' taxonomic ranks (superfamily etc.), which I happen to need for this application. Anyway, I've written a couple of functions, which I thought I'd upload here in case you feel they are of more general use. The first turns the output of wm_classification() into a data frame (and attempts to do something sensible with errors); the second runs this over a list of species and binds the resulting data frames into a single tbl_df of classifications for the whole species list:

library(worrms)

get_sp_classif <- function(sp){

	# try to get the WoRMS AphiaID from the name
	aphia <- try(wm_name2id(sp), silent = TRUE)
	# check whether this worked (catches unrecognised names and cases where an AphiaID of -999 is returned)
	if(inherits(aphia, "try-error") || aphia < 0){
		# fall back to the genus (first word of the name)
		aphia <- try(wm_name2id(stringr::word(sp, 1)), silent = TRUE)
		if(inherits(aphia, "try-error") || aphia < 0){
			# no AphiaID found: return a data frame holding only the name
			classif_df <- data.frame(sciname = sp, stringsAsFactors = FALSE)
			aphia <- NA
		}
	}
	if(!is.na(aphia)){
		# if an AphiaID was found, get the full classification
		classif <- wm_classification(aphia)
		# build an empty data frame with one column per rank
		classif_df <- read.csv(text = "",
			col.names = c("sciname", "AphiaID", classif$rank),
			colClasses = c("character", "numeric", rep("character", length(classif$rank))), stringsAsFactors = FALSE)
		# fill the single row; AphiaID is NA when only the genus was matched
		if("Species" %in% classif$rank){
			classif_df[1,] <- cbind(sp, classif$AphiaID[classif$rank == "Species"], t(classif$scientificname))
		} else {
			classif_df[1,] <- cbind(sp, NA, t(classif$scientificname))
		}
	}

	classif_df
}

sp_list_classif <- function(sp_list){

	# run the classification function over the whole list
	# (lapply always returns a list of data frames; sapply may simplify it into a matrix)
	classifs <- lapply(sp_list, get_sp_classif)

	# bind into a single data frame, filling ranks a species lacks with NA
	classifs <- dplyr::bind_rows(classifs)

	dplyr::as_tibble(classifs)
}
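
The reshaping step in the two functions above can be tried offline. The following is only a rough sketch in base R: it assumes classifications arrive as data frames with AphiaID, rank and scientificname columns (the columns used above), and the taxon names, IDs and the helpers classif_to_row() and bind_classifs() are all made up for illustration, not part of worrms.

```r
# Mock classifications shaped like the wm_classification() output used above.
# All names and IDs are placeholders, not real WoRMS records.
mock_classifs <- list(
  "Genusa speciesa" = data.frame(
    AphiaID = c(1, 2, 3, 4, 5),
    rank = c("Kingdom", "Superfamily", "Family", "Genus", "Species"),
    scientificname = c("Kingdoma", "Superfamilya", "Familya", "Genusa", "Genusa speciesa"),
    stringsAsFactors = FALSE
  ),
  "Genusb speciesb" = data.frame(
    AphiaID = c(1, 6, 7, 8),
    rank = c("Kingdom", "Family", "Genus", "Species"),
    scientificname = c("Kingdoma", "Familyb", "Genusb", "Genusb speciesb"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: turn one long classification into a one-row data frame
classif_to_row <- function(sciname, classif) {
  # one column per rank, holding the taxon name at that rank
  ranks <- as.data.frame(
    as.list(stats::setNames(classif$scientificname, classif$rank)),
    stringsAsFactors = FALSE
  )
  species_id <- classif$AphiaID[classif$rank == "Species"]
  cbind(
    data.frame(
      sciname = sciname,
      AphiaID = if (length(species_id) == 1) species_id else NA_real_,
      stringsAsFactors = FALSE
    ),
    ranks
  )
}

# Hypothetical helper: row-bind data frames whose rank columns differ
bind_classifs <- function(rows) {
  # union of all columns, in order of first appearance
  all_cols <- unique(unlist(lapply(rows, names)))
  filled <- lapply(rows, function(r) {
    # add any rank this species lacks as an NA column
    for (col in setdiff(all_cols, names(r))) r[[col]] <- NA_character_
    r[all_cols]
  })
  out <- do.call(rbind, unname(filled))
  rownames(out) <- NULL
  out
}

rows <- Map(classif_to_row, names(mock_classifs), mock_classifs)
classif_table <- bind_classifs(rows)
print(classif_table)
```

The second mock species has no Superfamily rank, so that column is NA in its row, which is the same filling that dplyr::bind_rows() does in sp_list_classif().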

thanks for the issue @tomjwebb 😸

edited your code above just a bit so it can run without errors (namespace calls to dplyr fxns)

Might make sense to add a function like this where you can input >1 taxonomic names and get classifications in a data.frame

Note that this is in taxize

library(taxize)
xx <- get_wormsid(c('Platanista gangetica', 'Leucophaeus scoresbii'))
dplyr::tbl_df(cbind(classification(xx)))
# A tibble: 2 x 29
   kingdom   phylum  subphylum    superclass superclass.1    class subclass           order     suborder infraorder
     <chr>    <chr>      <chr>         <chr>        <chr>    <chr>    <chr>           <chr>        <chr>      <chr>
1 Animalia Chordata Vertebrata Gnathostomata    Tetrapoda Mammalia   Theria Cetartiodactyla Cetancodonta    Cetacea
2 Animalia Chordata Vertebrata Gnathostomata    Tetrapoda     Aves     <NA> Charadriiformes         <NA>       <NA>
# ... with 19 more variables: superfamily <chr>, family <chr>, genus <chr>, species <chr>, kingdom_id <chr>,
#   phylum_id <chr>, subphylum_id <chr>, superclass_id <chr>, superclass_id.1 <chr>, class_id <chr>, subclass_id <chr>,
#   order_id <chr>, suborder_id <chr>, infraorder_id <chr>, superfamily_id <chr>, family_id <chr>, genus_id <chr>,
#   species_id <chr>, query <chr>

even though that's in taxize, maybe there is still reason to include similar functionality here

thoughts?

Ah thanks @sckott - I should have checked taxize first! I should probably switch to using that, have just found worrms convenient. I also added a bit more to the first function (edited above), which now tries to get the classification of a genus if it can't find an AphiaID for the species (this is useful for what I'm doing at the moment).

Anyway if you envisage others using worrms standalone then I think this functionality is useful - both returning the classification as a data frame, and being able to run it over a list of species. But if you want to point people to taxize instead that works too. Feel free to close this issue anyway!
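
For anyone reading later, the genus fallback mentioned above boils down to "try the full name, then the first word". A minimal offline sketch, assuming a lookup that errors on unknown names: lookup_id() is a hypothetical stand-in for wm_name2id(), and resolve_id() and all names and IDs are made up for illustration.

```r
# Hypothetical stand-in for wm_name2id(): errors on names it does not know.
# The names and IDs are placeholders, not real WoRMS records.
lookup_id <- function(name) {
  known <- c("Genusa" = 101, "Genusb speciesb" = 202)
  if (!name %in% names(known)) stop("name not found: ", name)
  known[[name]]
}

# Try the full name first, then fall back to the genus (first word)
resolve_id <- function(sp, lookup = lookup_id) {
  candidates <- unique(c(sp, sub("\\s.*$", "", sp)))
  for (nm in candidates) {
    # treat errors and negative IDs (such as -999) as "not found"
    id <- tryCatch(lookup(nm), error = function(e) NA_real_)
    if (!is.na(id) && id >= 0) {
      return(list(query = sp, matched = nm, id = id))
    }
  }
  list(query = sp, matched = NA_character_, id = NA_real_)
}

species_hit <- resolve_id("Genusb speciesb")  # matched on the full name
genus_hit <- resolve_id("Genusa speciesa")    # matched on the genus only
no_hit <- resolve_id("Unknown name")          # nothing matched
```

Returning the matched name alongside the ID makes it possible to tell afterwards which rows were resolved only to genus level.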

thinking about this

@tomjwebb okay, reinstall like devtools::install_github("ropensci/worrms@changes")

and look at the docs for wm_children and wm_classification: there are new fxns for those two (just to demo the concept, maybe pkg-wide later), with egs added

thoughts?

i don't want to break current functionality of fxns in the pkg, so the new fxns are separate, and they should be easy to find as they are on the same man pages as their sister fxns

@tomjwebb thoughts?

done, we can reopen or open new issue to discuss anything further related to this