grunwaldlab/metacoder

parse_phyloseq issues

Closed this issue · 2 comments

Hello! I've been having some issues correctly parsing my phyloseq object so that the node size and color are representative of the abundance of that taxon in my dataset.

The issue is reproducible based on the compressed GlobalPatterns example in issue # 141.

From back then:
#load sample Phyloseq dataset
data("GlobalPatterns")
GP <- GlobalPatterns
GP <- prune_taxa(taxa_sums(GP) > 500, GP)
humantypes <- c("Feces", "Skin") #make it a bit simpler
sample_data(GP)$human <- get_variable(GP, "SampleType") %in% humantypes
mergedGP <- merge_samples(GP, "SampleType")
mergedGP <- rarefy_even_depth(mergedGP,rngseed=394582)
mergedGP <- tax_glom(mergedGP,"Order") #make the tree smaller for the purposes of this example
print(mergedGP) #final phyloseq object

But, if you parse and plot

obj <- parse_phyloseq(mergedGP)
heat_tree(obj,node_label = taxon_names,
node_size = n_obs,
node_color = n_obs )

n_obs only goes up to 121, when there should be taxa with over 20K observations. This leads me to believe that the parsing isn't happening correctly. For my dataset, the same happens. The n_obs is just the number of taxa within a node, but not the abundance. How can I fix this?

Hello!

The issue is that n_obs measures the number of items assigned to a taxon, which in this case is OTUs, so you were plotting the number of OTUs, not the number of reads as you seemed to be expecting. To plot the number of reads, you first need to sum them per-taxon (only per-taxon data can be plotted). Here is a minimal example of how to do it:

library(phyloseq)
library(metacoder)
#> This is metacoder version 0.3.6 (stable)
#> 
#> Attaching package: 'metacoder'
#> The following object is masked from 'package:phyloseq':
#> 
#>     filter_taxa

data("GlobalPatterns")
GP <- GlobalPatterns
GP <- prune_taxa(taxa_sums(GP) > 500, GP)
humantypes <- c("Feces", "Skin") #make it a bit simpler
sample_data(GP)$human <- get_variable(GP, "SampleType") %in% humantypes
mergedGP <- merge_samples(GP, "SampleType")
mergedGP <- rarefy_even_depth(mergedGP,rngseed=394582)
#> `set.seed(394582)` was used to initialize repeatable random subsampling.
#> Please record this for your records so others can reproduce.
#> Try `set.seed(394582); .Random.seed` for the full vector
#> ...
mergedGP <- tax_glom(mergedGP,"Order") #make the tree smaller for the purposes of this example
print(mergedGP) #final phyloseq object
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 131 taxa and 9 samples ]
#> sample_data() Sample Data:       [ 9 samples by 8 sample variables ]
#> tax_table()   Taxonomy Table:    [ 131 taxa by 7 taxonomic ranks ]
#> phy_tree()    Phylogenetic Tree: [ 131 tips and 130 internal nodes ]

obj <- parse_phyloseq(mergedGP)
#> The following 131 of 131 (100%) input indexes have `NA` in their classifications:
#>    1, 2, 3, 4, 5, 6, 7, 8 ... 125, 126, 127, 128, 129, 130, 131


# Sum OTU counts per taxon
obj$data$tax_abund <- calc_taxon_abund(obj, data = 'otu_table')
#> No `cols` specified, so using all numeric columns:
#>    Feces, Freshwater, Freshwater (creek) ... Soil, Tongue
#> Summing per-taxon counts from 9 columns for 212 taxa


heat_tree(obj,node_label = taxon_names,
          node_size = obj$data$tax_abund$Ocean,
          node_color = obj$data$tax_abund$Ocean)

Created on 2023-04-10 with reprex v2.0.2

Note that we needed to use obj$data$tax_abund$Ocean instead of just Ocean because Ocean occurs in multiple tables and heat_tree would not know which to use. Deleting the otu_table table prior to plotting or renaming columns could avoid this issue.

Wonderful response, this is really helpful!

Best, Jose.