ncss-tech/SoilTaxonomy

data.tree wrapper functions

brownag opened this issue · 7 comments

  • Add {data.tree} to suggests
  • Develop a few standard path strings that can be used with standard datasets (or subsets thereof)
  • Functions for writing a formatted tree to:
    • TXT
    • HTML
    • CSV

I've noticed it's not easy to get the trees back out in a clean text-based format, and it would be good to extend @dylanbeaudette's old examples that were in the README.


Here is a quick sample (modified version of second old example) based on 13th edition keys and an order->subgroup path string.

library(SoilTaxonomy)
library(data.tree)
data("ST_higher_taxa_codes_13th", package = "SoilTaxonomy")
# create ST-style dataset from higher taxa codes
ST13 <- getTaxonAtLevel(ST_higher_taxa_codes_13th$taxon,
                        level = c("order", "suborder", "greatgroup", "subgroup"))
ST13 <- ST13[order(ST_higher_taxa_codes_13th$code),]
ST13 <- ST13[complete.cases(ST13),]
ST13$root <- "Soil Taxonomy (13th Edition)"
ST13$pathString <- with(ST13, paste0(root, "/", order, "/", suborder, "/", greatgroup, "/", subgroup))
n <- as.Node(ST13)
print(n, limit = NULL)

Ideas for output:

  • the text output appears to include some unicode markup that should be stripped out.
  • knitr::kable() and results="asis" result in less-than-ideal HTML formatting (needs fixed-width font, other styling)
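For the first idea, one option is to transliterate non-ASCII connector characters instead of dropping them, so the indentation structure survives in plain text. The following is a minimal base-R sketch: the helper name `asciiTree`, its `from`/`to` mapping, and the hand-written sample lines are all assumptions for illustration, not part of {SoilTaxonomy} or {data.tree}.

```r
# Hypothetical helper (not in SoilTaxonomy or data.tree):
# replace box-drawing characters with ASCII stand-ins.
asciiTree <- function(x,
                      from = c("\u2500", "\u2502", "\u2514", "\u251c"),
                      to   = c("-", "|", "`", "|")) {
  stopifnot(length(from) == length(to))
  x <- enc2utf8(x)
  for (i in seq_along(from)) {
    # fixed = TRUE: treat each character literally, not as a regex
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

# Hand-written sample lines (not real package output)
tree_lines <- c("Soil Taxonomy",
                " \u2514\u2500 alfisols",
                "     \u2514\u2500 xeralfs",
                "         \u251c\u2500 rhodoxeralfs",
                "         \u2514\u2500 palexeralfs")

ascii_lines <- asciiTree(tree_lines)
cat(ascii_lines, sep = "\n")
```

Any character not listed in `from` passes through unchanged, so the mapping would need extending if the tree uses other connectors.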

Good ideas. I remember struggling with trying to balance intuitive vs. compact displays of these data. The hierarchy is "wide" and "shallow", so many of the standard methods for displaying trees break down. It might be best to display each soil order in its own tree. Added some slightly updated examples to misc/.

I wonder if the author of the data.tree package would be open to alternative tree listing styles.

Since I've had such good luck with the maintainer of data.tree in the past, I tried posting some questions / ideas over there:

gluc/data.tree#167

Cool, I have some ideas on that that won't require changes to data.tree.

To remove line numbers I am thinking I can make a subclass of {data.tree} Node that I can dispatch a custom S3 print method on. The print method could most simply cat() out the levelName contents (no line numbers) e.g.:

taxonTree <- function(...) {
# ...
  attr(n, "class") <- c("SoilTaxonNode", attr(n, "class"))
  invisible(n)
}

#' @export
print.SoilTaxonNode <- function(x, ...) {
  # print the tree without rownames
  res <- as.data.frame(x)
  cat(res$levelName, sep = "\n")
  # print methods conventionally return their argument invisibly
  invisible(x)
}
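To illustrate the dispatch mechanism in isolation, here is a small base-R sketch that does not need {data.tree}. The `ToyNode` class and its `as.data.frame()` method are stand-ins invented for this example; a real Node is an R6 object, and its actual `as.data.frame()` output may differ from what is assumed here.

```r
# Stand-in for a data.tree Node: an environment holding pre-rendered lines.
# (ToyNode is hypothetical; it only mimics a `levelName` column.)
toy <- new.env()
toy$lines <- c("Soil Taxonomy",
               " |--alfisols",
               " |   |--xeralfs",
               " |--entisols")
class(toy) <- "ToyNode"

as.data.frame.ToyNode <- function(x, ...) {
  data.frame(levelName = x$lines, stringsAsFactors = FALSE)
}

# "Parent" print method: goes through the data.frame printer, so row names show
print.ToyNode <- function(x, ...) {
  print(as.data.frame(x))
  invisible(x)
}

# Prepend the subclass so its print method is found first;
# as.data.frame() still falls through to the ToyNode method.
class(toy) <- c("SoilTaxonNode", class(toy))

print.SoilTaxonNode <- function(x, ...) {
  res <- as.data.frame(x)
  cat(res$levelName, sep = "\n")
  invisible(x)
}

out <- capture.output(print(toy))
```

Because the object is an environment, changing its class affects every reference to it, which is worth keeping in mind if the same Node is reused elsewhere.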

Getting a little closer to the output from fs::dir_tree() with:

taxonTree(c('palexeralfs', 'rhodoxeralfs'), special.chars = c("\u2502", "\u2514", "\u2500 "))

However, we can't get the exact output without using an additional character (tree.R):

"h" = "\u2500",                   # horizontal
"v" = "\u2502",                   # vertical
"l" = "\u2514",                   # up and right (corner, last child)
"j" = "\u251C"                    # vertical and right (junction)

Not sure, but this might require changes in data.tree.

I don't think this particular request requires changes to data.tree. Just a minor change to the print method.

Now this works well. Thanks for the suggestion to emulate fs::dir_tree(); I originally was not really going for a direct clone:

library(SoilTaxonomy)
taxonTree(c('palexeralfs', 'rhodoxeralfs'), special.chars = c("\u251c","\u2502", "\u2514", "\u2500 "))
#> Loading required namespace: data.tree
#> Soil Taxonomy                           
#>  └─ alfisols                            
#>      └─ xeralfs                         
#>          ├─ rhodoxeralfs                
#>          │   ├─ lithic rhodoxeralfs     
#>          │   ├─ vertic rhodoxeralfs     
#>          │   ├─ petrocalcic rhodoxeralfs
#>          │   ├─ calcic rhodoxeralfs     
#>          │   ├─ inceptic rhodoxeralfs   
#>          │   └─ typic rhodoxeralfs      
#>          └─ palexeralfs                 
#>              ├─ vertic palexeralfs      
#>              ├─ aquandic palexeralfs    
#>              ├─ andic palexeralfs       
#>              ├─ vitrandic palexeralfs   
#>              ├─ fragiaquic palexeralfs  
#>              ├─ aquic palexeralfs       
#>              ├─ petrocalcic palexeralfs 
#>              ├─ lamellic palexeralfs    
#>              ├─ psammentic palexeralfs  
#>              ├─ arenic palexeralfs      
#>              ├─ natric palexeralfs      
#>              ├─ fragic palexeralfs      
#>              ├─ calcic palexeralfs      
#>              ├─ plinthic palexeralfs    
#>              ├─ ultic palexeralfs       
#>              ├─ haplic palexeralfs      
#>              ├─ mollic palexeralfs      
#>              └─ typic palexeralfs

Very cool, thanks. I kind of like this incantation:

taxonTree(c('xerorthents', 'rhodoxeralfs', 'endoaqualfs'), special.chars = c("\u251c","\u2502", "\u2570", "\u2500 "))

It might be nice to pick a Unicode output we like as the default.

I was thinking ASCII might be a better default, but the package does use UTF-8 encoding per the DESCRIPTION, so there's no reason we couldn't have that. I like the above suggestion.

To finish up this issue I will also need to abstract out the contents of the print() method to capture our transformed result for writing out as CSV and/or HTML.
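A rough sketch of what that abstraction could look like, assuming the transformed tree is available as a character vector of lines. All three writer names (`writeTreeTXT`, `writeTreeCSV`, `writeTreeHTML`) are hypothetical and not part of the package; the sample lines are hand-written. The HTML writer wraps the lines in `<pre>`, which is one simple way to get a fixed-width font.

```r
# Hypothetical writers operating on pre-rendered tree lines (base R only).
writeTreeTXT <- function(lines, file) {
  # write UTF-8 bytes as-is, regardless of the session locale
  writeLines(enc2utf8(lines), file, useBytes = TRUE)
}

writeTreeCSV <- function(lines, file) {
  write.csv(data.frame(levelName = lines, stringsAsFactors = FALSE),
            file, row.names = FALSE, fileEncoding = "UTF-8")
}

writeTreeHTML <- function(lines, file) {
  # escape HTML special characters; "&" must be handled first
  esc <- gsub("&", "&amp;", lines, fixed = TRUE)
  esc <- gsub("<", "&lt;", esc, fixed = TRUE)
  esc <- gsub(">", "&gt;", esc, fixed = TRUE)
  writeLines(enc2utf8(c("<pre>", esc, "</pre>")), file, useBytes = TRUE)
}

# Hand-written sample lines (not real package output)
tree_lines <- c("Soil Taxonomy",
                " \u2514\u2500 alfisols",
                "     \u2514\u2500 xeralfs",
                "         \u251c\u2500 rhodoxeralfs",
                "         \u2514\u2500 palexeralfs")

txt_file  <- tempfile(fileext = ".txt")
csv_file  <- tempfile(fileext = ".csv")
html_file <- tempfile(fileext = ".html")

writeTreeTXT(tree_lines, txt_file)
writeTreeCSV(tree_lines, csv_file)
writeTreeHTML(tree_lines, html_file)
```

With this split, the print() method would only need to cat() the same lines, and each output format stays a thin wrapper around one shared representation.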