ncss-tech/SoilTaxonomy

Refine procedures for building internal datasets

brownag opened this issue · 2 comments

As I have tacked more things onto the "rebuild R package datasets" script, it has become more complicated.

There are currently several steps involved in rebuilding the full set of datasets. For instance, extracting/preparing dictionaries for formative elements from NASIS domains is done in a separate script that uses CSVs as an intermediate. A couple of my datasets are pulled/derived from SoilKnowledgeBase. Some of the logic currently stored in the dataset-building script might be better offloaded to the KST parser. I am also curious whether parsing could be improved using information pulled from NASIS domains...

Also, I think I want to standardize on having raw (flat-file) data for all internal datasets, and use something similar to the usethis::use_data_raw setup.

#23 shows some of the effects (in gory detail) of the current implementation of parsing class names from NASIS domains and definitions from SKB.

The data-raw scripts have been set up. There is no immediate issue here. Some of the parsing will need to be updated when KST 13th edition is eventually released, but that can be dealt with when those changes are implemented and published.