ncss-tech/SoilTaxonomy

parseFamily

dylanbeaudette opened this issue · 7 comments

EDIT by AGB:

  • Goal: this issue covers discussions related to extending #22, #23, ... to make a nice interface to family-level taxa (presumably a structure that will also be useful at the series level).

[Image: Dylan's concept drawing from #8]

A minor addition to the current functionality: split the family differentiae into both:

  • the pieces used in the current taxon, as a named list (texture class, mineralogy class, reaction, etc.)
  • the same pieces in a data.frame containing all possible components (i.e. including unused ones)

This would make it possible to access, by name, the elements used by any given taxon (ragged, named list), or to combine multiple taxa into a single data.frame padded with NA wherever a component is unused.
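A minimal base R sketch of the two shapes. The component names (particle_size, mineralogy, etc.), the values, and the pad() helper are hypothetical placeholders chosen for illustration; they are not the columns or functions SoilTaxonomy actually provides.

```r
# Hypothetical parsed differentiae for two family-level taxa: each element is
# a ragged, named list holding only the components that taxon uses.
parsed <- list(
  list(particle_size = "fine-loamy", mineralogy = "mixed",
       cec_activity = "superactive", temperature = "mesic"),
  list(particle_size = "sandy", mineralogy = "mixed",
       reaction = "nonacid", temperature = "frigid")
)

# Access a component by name for one taxon.
parsed[[1]]$mineralogy

# Full set of possible components (assumed for this sketch).
all_components <- c("particle_size", "mineralogy", "cec_activity",
                    "reaction", "temperature")

# Hypothetical helper: pad one taxon to the full component set with NA.
pad <- function(x, cols) {
  out <- setNames(as.list(rep(NA_character_, length(cols))), cols)
  out[names(x)] <- x
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Stack all padded taxa into a single data.frame (one row per taxon).
families <- do.call(rbind, lapply(parsed, pad, cols = all_components))
families
```

The list form keeps each taxon compact; the padded data.frame trades some NA cells for a rectangular table that is easy to filter and compare across many taxa.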

I think I get what you mean. I'll implement a couple of things that come to mind that will need to be in place to support this.
Notably, the family class info parsed from the Keys should get linked to the appropriate NASIS domains.
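One way such linking could work, sketched in base R under loud assumptions: the choices table is a hypothetical stand-in for NASIS domain metadata, and its domain names and labels are illustrative, not a NASIS export. "thermic" is deliberately left out of the toy choice list so that one value fails to match; unmatched values come back as NA, which makes edge cases easy to surface.

```r
# Hypothetical stand-in for NASIS domain metadata: domain + allowed labels.
# Illustrative rows only; real domains come from NASIS metadata.
choices <- data.frame(
  domain = c("mineralogy", "mineralogy", "mineralogy",
             "temperature", "temperature"),
  label  = c("mixed", "isotic", "smectitic", "mesic", "frigid"),
  stringsAsFactors = FALSE
)

# Parsed family classes for one taxon, keyed by (hypothetical) domain name.
parsed <- c(mineralogy = "mixed", temperature = "thermic")

# Link each parsed value to its row in the choice list; NA means not found.
linked <- match(paste(names(parsed), parsed, sep = "|"),
                paste(choices$domain, choices$label, sep = "|"))
linked

# Values that need attention (edge cases or missing choices).
parsed[is.na(linked)]
```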

I started sketching this out in parseFamily.R, but commented out or temporarily removed the bits that add new dependencies or rely on soilDB 2.7.3+. I was going to submit a release today, but I'm going to hold off until early next week, probably after doing some more noodling.

I've made several updates and merged the changes that lower the soilDB dependency into the master branch.

Caught some bugs, found edge cases and cleaned up the output tables using this example code as a starting point:

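```r
library(soilDB)
library(SoilTaxonomy)

# soil series records from a local NASIS database (requires a NASIS connection)
sc <- get_soilseries_from_NASIS()
# keep only series with "established" status
scx <- subset(sc, sc$soilseriesstatus == "established")
# time parsing of every family-level taxonomic class name
system.time({x <- parse_family(scx$taxclname)})
```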

Several new tests have been added for complicated family-level taxonomic classes.

I am still thinking about how to deal with "child"-type family classes, where more than one comma-separated class can be valid (e.g. taxminalogy and taxfamother), such as "shallow, ortstein".

Currently this is handled, but it puts non-standard values into standard column names. I may want to use a list column that keeps the official column name, and append "_concat" or similar to the name of the "flat" (concatenated) column.
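A rough base R sketch of that list-column idea, assuming hypothetical taxfamother input values: the list column keeps the official name and holds one element per class, while a taxfamother_concat column (the "_concat" suffix floated above) keeps the flat, comma-joined text.

```r
# Hypothetical taxfamother values for three taxa; the first holds two classes.
taxfamother <- c("shallow, ortstein", NA, "shallow")

# Split on commas; an NA input becomes an empty character vector.
split_classes <- lapply(strsplit(taxfamother, ",\\s*"),
                        function(x) x[!is.na(x)])

families <- data.frame(id = seq_along(taxfamother))
# List column under the official name: one element per class.
families$taxfamother <- split_classes
# Flat column with a "_concat" suffix: classes re-joined into one string.
families$taxfamother_concat <- vapply(
  split_classes,
  function(x) if (length(x) > 0) paste(x, collapse = ", ") else NA_character_,
  character(1)
)
families$taxfamother[[1]]
```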

Also still unhandled (TBD) are the mineralogy classes associated with "strongly contrasting" family classes, e.g. "amorphic over isotic". However, our choice lists currently do include the strongly contrasting particle-size classes (PSCs), for the combinations defined on p. 322 of the Keys.

An update: strongly contrasting particle-size classes are in the domain choice lists, but the associated possible combinations of, e.g., mineralogy classes are not. This makes sense: the latter is a combinatorial explosion, while the former is constrained to a (fairly large) list of specific conditions.
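To illustrate the scale argument, a small base R sketch. It uses only five illustrative mineralogy labels (not the full list) and assumes any two different classes could be paired; under that assumption there are already n * (n - 1) = 20 ordered "upper over lower" combinations, and the count grows roughly with the square of the list length.

```r
# Illustrative subset of mineralogy class labels (not the full list).
mineralogy <- c("amorphic", "isotic", "mixed", "smectitic", "siliceous")

# Every ordered "upper over lower" pairing of two different classes.
pairs <- expand.grid(upper = mineralogy, lower = mineralogy,
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$upper != pairs$lower, ]
combos <- paste(pairs$upper, "over", pairs$lower)

# n * (n - 1) ordered pairs for n labels.
length(combos)
```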

Regarding concatenation of the family "other" class, mineralogy, etc.: I am going to add a new argument, flat, defaulting to TRUE, which uses the standard NASIS physical column names with concatenated choice-list items. When flat = FALSE, a list column will be returned instead of the concatenated result. This will include parsing combination classes joined with " over ", so that the results map 1:1 to choice lists. The implied vertical order is ascending order, i.e. the first element is over the second.
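A hypothetical helper sketching the proposed flat behaviour for the mineralogy column (taxminalogy); it is an illustration under assumptions, not the parse_family() implementation. With flat = FALSE, each " over " combination is split so that every element maps to one choice-list item, with element 1 lying over element 2.

```r
# Hypothetical helper sketching the proposed `flat` argument; not the actual
# parse_family() code. x: mineralogy strings taken from family names.
mineralogy_column <- function(x, flat = TRUE) {
  out <- data.frame(row = seq_along(x))
  if (flat) {
    # flat = TRUE: one concatenated string per taxon, standard column name
    out$taxminalogy <- x
  } else {
    # flat = FALSE: list column; each element maps 1:1 to choice-list items,
    # in the implied vertical order (element 1 lies over element 2)
    parts <- strsplit(x, " over ", fixed = TRUE)
    out$taxminalogy <- lapply(parts, function(p) p[!is.na(p)])
  }
  out
}

x <- c("amorphic over isotic", "mixed")
mineralogy_column(x)                # character column
mineralogy_column(x, flat = FALSE)  # list column
```

Defaulting to the flat form keeps a plain rectangular table with standard column names, while the list form is the one to reach for when each piece must be matched against a choice list.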

Nice. I like this approach. I'd like to build the SoilWeb seriesTree application from these new tools, rather than with the current approach.

All items in this issue have been resolved.

As part of #38, parse_family() may be refactored, with its functionality split to support higher taxonomic classes as input and return analogous data.frame output.