`parse_family()` with taxa above family
brownag opened this issue · 1 comments
Consider the following example using `parse_family()` on a taxonomic class field of mixed levels. Note where `taxclname` is:
- a suborder-level name
- a subgroup-level name
- a great group-level name with some family-level classes specified
library(SoilTaxonomy)
suppressPackageStartupMessages(library(soilDB))
x <- data.frame(
  taxonname = c("Alberti", "Aquents", "Lithic Xeric Torriorthents", "Stagy Family", "Haplodurids"),
  taxonkind = c("series", "taxon above family", "taxon above family", "family", "taxon above family"),
  taxclname = c(
    "Clayey, smectitic, thermic, shallow Vertic Rhodoxeralfs",
    "Aquents",
    "Lithic Xeric Torriorthents",
    "Coarse-loamy, mixed, mesic Duric Haploxerolls",
    "Mixed, superactive, thermic Haplodurids"
  )
)
parse_family(x$taxclname)
#>                                                    family
#> 1 Clayey, smectitic, thermic, shallow Vertic Rhodoxeralfs
#> 2                                                 Aquents
#> 3                              Lithic Xeric Torriorthents
#> 4           Coarse-loamy, mixed, mesic Duric Haploxerolls
#> 5                 Mixed, superactive, thermic Haplodurids
#>                     subgroup subgroup_code                        class_string
#> 1        vertic rhodoxeralfs          JDEB Clayey, smectitic, thermic, shallow
#> 2                       <NA>          <NA>                                <NA>
#> 3 lithic xeric torriorthents          LECB
#> 4         duric haploxerolls          IFFZ          Coarse-loamy, mixed, mesic
#> 5                       <NA>          <NA>                                <NA>
#>   classes_split  taxpartsize taxpartsizemod taxminalogy taxceactcl taxreaction
#> 1  Clayey, ....       clayey             NA   smectitic         NA          NA
#> 2            NA         <NA>             NA        <NA>         NA          NA
#> 3                       <NA>             NA        <NA>         NA          NA
#> 4  Coarse-l.... coarse-loamy             NA       mixed         NA          NA
#> 5            NA         <NA>             NA        <NA>         NA          NA
#>   taxtempcl taxfamhahatmatcl taxfamother                  taxsubgrp
#> 1   thermic               NA     shallow        Vertic Rhodoxeralfs
#> 2      <NA>               NA        <NA>                       <NA>
#> 3      <NA>               NA        <NA> Lithic Xeric Torriorthents
#> 4     mesic               NA        <NA>         Duric Haploxerolls
#> 5      <NA>               NA        <NA>                       <NA>
#>     taxgrtgroup taxsuborder  taxorder
#> 1  Rhodoxeralfs     Xeralfs  Alfisols
#> 2          <NA>        <NA>      <NA>
#> 3 Torriorthents    Orthents  Entisols
#> 4  Haploxerolls     Xerolls Mollisols
#> 5          <NA>        <NA>      <NA>
Should this be handled differently? Currently the derived NASIS-like columns (e.g. `taxsuborder`) come from decomposing a valid (current taxonomy) subgroup-level name, so they return `NA` for taxa above family that aren't subgroup-level.
Questions:
- Is it "valid" to apply family-level classes to taxa above subgroup?
- If there is a detectable taxon above subgroup should it be split out?
- Should family-level classes also be returned (even if not "valid")?
- How often are family-level taxa combined with taxa above subgroup in SSURGO?
In practice, SSURGO components that are taxa above subgroup are usually constrained to one or more family classes (e.g. PSC, temperature regime), which can sometimes be expressed cleanly using something like the family-level class format. I suppose these can be interpreted as specifications about groups of related families. That may be confusing without some sort of wildcard character, though, and it may be that splits of taxa above subgroup should be based on phases (outside the scope of the package).
This has been addressed in #46
Should this be handled differently? Currently the derived NASIS-like columns (e.g. `taxsuborder`) come from decomposing a valid (current taxonomy) subgroup-level name, so they return `NA` for taxa above family that aren't subgroup-level.
Now taxa at any level are returned. Two columns have been added, `taxclname` and `code`: the input taxonomic class and the letter code of the lowest level present (order, suborder, great group or subgroup).
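A minimal sketch of how the new columns could be inspected. It assumes the SoilTaxonomy package is installed and that the column names match the description above; `wanted` is just an illustrative selection, and no output is shown because it depends on the installed version.

```r
# Assumption: SoilTaxonomy is installed; the call is skipped otherwise.
res <- NULL
if (requireNamespace("SoilTaxonomy", quietly = TRUE)) {
  taxclname <- c("Aquents",
                 "Lithic Xeric Torriorthents",
                 "Mixed, superactive, thermic Haplodurids")
  res <- SoilTaxonomy::parse_family(taxclname)

  # Column names are taken from the description above; intersect() guards
  # against package versions where they are absent or named differently.
  wanted <- c("taxclname", "code", "subgroup_code", "taxsuborder", "taxorder")
  print(res[, intersect(wanted, names(res)), drop = FALSE])
}
```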
- Is it "valid" to apply family-level classes to taxa above subgroup?
Yes, it is common for higher taxonomic concepts to have specific family-level classes associated with them, for instance the temperature regime or particle size class.
- If there is a detectable taxon above subgroup should it be split out?
Yes, and to avoid confusion both `subgroup_code` and the lowest-level (not necessarily subgroup) code are returned. For taxa above subgroup, the value is `NA` for any levels that are not defined in the input.
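To illustrate what "splitting out" a taxon from its family-level classes involves, here is a rough base R sketch. `split_class_name()` is a hypothetical helper, not part of SoilTaxonomy and not how `parse_family()` works; it relies on a capitalization heuristic (the taxon is assumed to start at the first capitalized word after the last comma) and will misfire on a name with a single family class and no comma.

```r
# Hypothetical helper for illustration only; not the package's implementation.
split_class_name <- function(x) {
  # Pieces separated by commas; all but the last are family-level classes
  pieces <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  n <- length(pieces)

  # The last piece may hold a trailing family class plus the taxon name
  words <- strsplit(pieces[n], "[[:space:]]+")[[1]]
  first_cap <- which(grepl("^[A-Z]", words))[1]
  if (is.na(first_cap)) first_cap <- 1L  # no capitalized word: treat all as taxon

  taxon <- paste(words[first_cap:length(words)], collapse = " ")
  trailing <- if (first_cap > 1) {
    paste(words[seq_len(first_cap - 1)], collapse = " ")
  } else {
    character(0)
  }

  list(classes = c(pieces[-n], trailing), taxon = taxon)
}

split_class_name("Mixed, superactive, thermic Haplodurids")
split_class_name("Aquents")
```

A lookup against the known taxa at each level would be more robust than this heuristic, since the words of a taxon name alone do not say whether it is a suborder or a great group.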
- Should family level classes also be returned (even if not "valid")?
We don't currently have the logic to determine which family-level classes are required for particular taxa. Validating whether the classes used are appropriate for a particular subgroup or higher-level taxon could be within the purview of a new function, `validate_family()` or similar.
- How often are family-level classes combined with taxa above subgroup in SSURGO?
Some quick queries indicate that, more often than not, a taxon above family is associated with one or more family-level classes. About 71% of taxon above family components (26,621 of 37,312) have `taxpartsize` and/or `taxtempregime`.
suppressPackageStartupMessages(library(soilDB))
SDA_query("SELECT COUNT(DISTINCT cokey) FROM component
           WHERE compkind = 'taxon above family'")
#> single result set, returning a data.frame
#>      V1
#> 1 37312
SDA_query("SELECT COUNT(DISTINCT cokey) FROM component
           WHERE compkind = 'taxon above family'
           AND taxpartsize IS NOT NULL")
#> single result set, returning a data.frame
#>      V1
#> 1 20240
SDA_query("SELECT COUNT(DISTINCT cokey) FROM component
           WHERE compkind = 'taxon above family'
           AND taxtempregime IS NOT NULL")
#> single result set, returning a data.frame
#>      V1
#> 1 23840
SDA_query("SELECT COUNT(DISTINCT cokey) FROM component
           WHERE compkind = 'taxon above family'
           AND (taxpartsize IS NOT NULL OR taxtempregime IS NOT NULL)")
#> single result set, returning a data.frame
#>      V1
#> 1 26621
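For reference, the shares implied by those counts can be worked out directly. This is plain arithmetic on the four numbers returned above; the variable names are illustrative, and the count with both columns populated is derived by inclusion-exclusion rather than queried.

```r
# Counts copied from the SDA_query() results above
n_total    <- 37312  # all 'taxon above family' components
n_partsize <- 20240  # taxpartsize populated
n_tempreg  <- 23840  # taxtempregime populated
n_either   <- 26621  # taxpartsize and/or taxtempregime populated

# Components with both populated, by inclusion-exclusion
n_both <- n_partsize + n_tempreg - n_either

pct <- round(100 * c(taxpartsize   = n_partsize,
                     taxtempregime = n_tempreg,
                     either        = n_either,
                     both          = n_both) / n_total, 1)
print(pct)
#>   taxpartsize taxtempregime        either          both
#>          54.2          63.9          71.3          46.8
```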