mikemc/speedyseq

`tax_glom()` behaves incorrectly if a tax table or matrix named `new_tax_mat` exists

mikemc opened this issue

When an object named `new_tax_mat` exists in the user's environment, `tax_glom()` uses it to overwrite the correct tax table. If `new_tax_mat` is a matrix or `tax_table` that shares at least some taxa names with the new merged taxa, no error is raised and the resulting tax table is silently overwritten. Worse, the set of taxa may be pruned to the intersection.

```r
library(speedyseq)
#> Loading required package: phyloseq
#> 
#> Attaching package: 'speedyseq'
#> The following objects are masked from 'package:phyloseq':
#> 
#>     plot_bar, plot_heatmap, plot_tree, psmelt, tax_glom, tip_glom
library(magrittr)

data(GlobalPatterns)

tax_glom(GlobalPatterns, "Phylum") %>% tax_table %>% head
#> Taxonomy Table:     [6 taxa by 7 taxonomic ranks]:
#>        Kingdom    Phylum           Class Order Family Genus Species
#> 245697 "Archaea"  "Crenarchaeota"  NA    NA    NA     NA    NA     
#> 250392 "Archaea"  "Euryarchaeota"  NA    NA    NA     NA    NA     
#> 329744 "Bacteria" "Actinobacteria" NA    NA    NA     NA    NA     
#> 212910 "Bacteria" "Spirochaetes"   NA    NA    NA     NA    NA     
#> 454145 "Bacteria" "MVP-15"         NA    NA    NA     NA    NA     
#> 203274 "Bacteria" "SBR1093"        NA    NA    NA     NA    NA

new_tax_mat <- "asdf"
tax_glom(GlobalPatterns, "Phylum")
#> Error in access(object, "tax_table", errorIfNULL): tax_table slot is empty.

new_tax_mat <- tax_table(GlobalPatterns)
new_tax_mat["245697",] <- "ASDF"
tax_glom(GlobalPatterns, "Phylum") %>% tax_table %>% head
#> Taxonomy Table:     [6 taxa by 7 taxonomic ranks]:
#>        Kingdom    Phylum           Class Order Family Genus Species
#> 245697 "ASDF"     "ASDF"           NA    NA    NA     NA    NA     
#> 250392 "Archaea"  "Euryarchaeota"  NA    NA    NA     NA    NA     
#> 329744 "Bacteria" "Actinobacteria" NA    NA    NA     NA    NA     
#> 212910 "Bacteria" "Spirochaetes"   NA    NA    NA     NA    NA     
#> 454145 "Bacteria" "MVP-15"         NA    NA    NA     NA    NA     
#> 203274 "Bacteria" "SBR1093"        NA    NA    NA     NA    NA

new_tax_mat <- tax_table(GlobalPatterns) %>% as("matrix")
new_tax_mat["245697",] <- "ASDF"
tax_glom(GlobalPatterns, "Phylum") %>% tax_table %>% head
#> Taxonomy Table:     [6 taxa by 7 taxonomic ranks]:
#>        Kingdom    Phylum           Class Order Family Genus Species
#> 245697 "ASDF"     "ASDF"           NA    NA    NA     NA    NA     
#> 250392 "Archaea"  "Euryarchaeota"  NA    NA    NA     NA    NA     
#> 329744 "Bacteria" "Actinobacteria" NA    NA    NA     NA    NA     
#> 212910 "Bacteria" "Spirochaetes"   NA    NA    NA     NA    NA     
#> 454145 "Bacteria" "MVP-15"         NA    NA    NA     NA    NA     
#> 203274 "Bacteria" "SBR1093"        NA    NA    NA     NA    NA
```

Created on 2020-05-28 by the reprex package (v0.3.0)

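The bug is due to this line in `merge_taxa_vec()`:

```r
if (exists("new_tax_mat"))
```

As written, `exists()` returns `TRUE` whenever `new_tax_mat` exists in any environment enclosing the function, including the user's global environment, because `exists()` searches enclosing environments by default (`inherits = TRUE`). The check therefore cannot tell a local `new_tax_mat` apart from an unrelated object of the same name defined by the user.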
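The scoping behavior can be reproduced without speedyseq. The minimal sketch below uses hypothetical stand-in functions (`merge_buggy()`, `merge_fixed()`, `merge_fixed_null()`), not speedyseq's actual code. The two fixes shown are common base-R idioms and are not necessarily the change made in 0.1.2.9008.

```r
# Hypothetical stand-ins illustrating the exists() pitfall; not speedyseq code.

# Buggy pattern: exists() searches enclosing environments by default
# (inherits = TRUE), so a global `new_tax_mat` is found even when the
# function never created a local one.
merge_buggy <- function(make_local = FALSE) {
  if (make_local) new_tax_mat <- "local value"
  if (exists("new_tax_mat")) {
    return(new_tax_mat)  # silently picks up the global object
  }
  "correct value"
}

# Possible fix 1: restrict the lookup to the function's own environment.
merge_fixed <- function(make_local = FALSE) {
  if (make_local) new_tax_mat <- "local value"
  if (exists("new_tax_mat", inherits = FALSE)) {
    return(new_tax_mat)
  }
  "correct value"
}

# Possible fix 2: always create the variable locally and test for NULL,
# avoiding exists() entirely.
merge_fixed_null <- function(make_local = FALSE) {
  new_tax_mat <- NULL
  if (make_local) new_tax_mat <- "local value"
  if (!is.null(new_tax_mat)) {
    return(new_tax_mat)
  }
  "correct value"
}

new_tax_mat <- "asdf"  # stray object in the global environment
merge_buggy()          # returns "asdf"
merge_fixed()          # returns "correct value"
merge_fixed_null()     # returns "correct value"
```

Either fix keeps the check local to the function. The `NULL` initialization has the added benefit of making the variable's existence explicit rather than depending on which branches happened to run.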

This bug also affects `merge_taxa_vec()`, `tip_glom()`, and `tree_glom()` when the `tax_adjust = 0` option is used.

This bug was introduced on March 8 in commit 576f3a6 and affects versions 0.1.2.9000 through 0.1.2.9007; it is fixed as of version 0.1.2.9008.