qza_to_phyloseq doesn't handle taxonomy.qza generated using Silva
Hi @jbisanz
`qza_to_phyloseq()` only imports `taxonomy.qza` correctly when it was generated with the Greengenes database, not Silva. Sorry, I don't have the object for you, but in the Silva case the `Taxon` field is written like this, for example:
"D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Ruminococcaceae;D_5__Ruminococcaceae NK4A214 group;D_6__uncultured bacterium"
So lines 38-41 in `qza_to_phyloseq.R` are not adapted to this format. I can open a PR if you like.
That would be great if you could! It needs a more elegant way of detecting the separator and handling variable-length taxonomic strings, but I have not had the time to address this myself.
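One possible shape for such handling is sketched below. It is only an illustration: `parse_taxonomy_strings()` is a hypothetical helper, not part of qiime2R or phyloseq. Rather than detecting the separator, it sidesteps the problem by splitting on `;` and trimming whitespace, which covers both the Greengenes `"; "` and the Silva `";"` forms. It then strips rank prefixes and pads shorter lineages with `NA`. The rank names and the prefix pattern are assumptions based on the two formats quoted in this thread.

```r
# Hypothetical helper, not part of qiime2R or phyloseq: turns taxonomy strings
# into a character matrix with one column per rank.
parse_taxonomy_strings <- function(taxon,
                                   ranks = c("Kingdom", "Phylum", "Class", "Order",
                                             "Family", "Genus", "Species")) {
  lineages <- lapply(strsplit(as.character(taxon), ";", fixed = TRUE), function(x) {
    # Splitting on ";" and trimming whitespace covers both "; " (Greengenes)
    # and ";" (Silva) separators without having to detect which one is used
    x <- trimws(x)
    # Drop rank prefixes such as "k__"/"p__" (Greengenes) or "D_0__" (Silva)
    x <- sub("^(D_[0-9]+|[a-z])__", "", x)
    # Treat empty levels (e.g. a bare "s__") as missing
    x[!is.na(x) & x == ""] <- NA_character_
    # Pad shorter lineages with NA (or truncate longer ones) to the rank count
    length(x) <- length(ranks)
    x
  })
  out <- do.call(rbind, lineages)
  colnames(out) <- ranks
  out
}
```

With qiime2R, the input would typically be `taxonomy$data$Taxon` from `read_qza("taxonomy.qza")`. The resulting matrix still needs feature IDs as row names before it can serve as a phyloseq taxonomy table.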
Jordan
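If you're looking for a simple manual fix, you can run:
library(dplyr)  # provides %>% and mutate()
# Replace `ps` with the name of your phyloseq object. This assumes the full
# Silva string landed unsplit in the first (Kingdom) column; strip the
# "D_<n>__" rank prefixes from it.
tax <- data.frame(phyloseq::tax_table(ps)[, 1]) %>%
  mutate(Kingdom = stringr::str_replace_all(Kingdom, "D_\\d__", ""))
# Split the cleaned string on ";" into one column per rank
tax <- tax %>%
  tidyr::separate(Kingdom, c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"), sep = ";")
tax_mat <- as.matrix(tax)
rownames(tax_mat) <- phyloseq::taxa_names(ps)
phyloseq::tax_table(ps) <- tax_mat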
Hi @nebfield
I'm having the same issue. I've tried using your script, but I can't get it to work; I get the following error:
> tax <- data.frame(phyloseq::tax_table(ps)[, 1]) %>%
+ mutate(Kingdom = stringr::str_replace_all(Kingdom, "D_\\d__", ""))
Error in phyloseq::tax_table(ps) : object 'ps' not found
Is there something I'm doing wrong, or am I missing a package?
Hi, ps should be replaced with the name of your phyloseq object. Sorry for the confusion.
Thanks @nebfield, and sorry, I'm new to this.
Running the script:
tax <- data.frame(phyloseq::tax_table(ps)[, 1]) %>%
  mutate(Kingdom = stringr::str_replace_all(Kingdom, "D_\\d__", ""))
I get the following error:
Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
object 'Kingdom' not found
Do you know what is causing this?
It's hard to say. What's the output of
tax <- data.frame(phyloseq::tax_table(ps)[, 1])
colnames(tax)
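The `object 'Kingdom' not found` error means `mutate()` could not find a column named `Kingdom` in `tax`, and `colnames(tax)` is the way to check that. Below is a minimal base R sketch of the check and one possible workaround. It uses a made-up one-row table, and the starting column name `ta1` is an assumption standing in for whatever `colnames(tax)` actually reports on your object.

```r
# Toy stand-in for data.frame(phyloseq::tax_table(ps)[, 1]); the column name
# "ta1" is an assumption, not necessarily what your object contains
tax <- data.frame(
  ta1 = "D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia",
  stringsAsFactors = FALSE
)

# mutate(Kingdom = ...) only works if a column literally named "Kingdom" exists
print(colnames(tax))
print("Kingdom" %in% colnames(tax))  # FALSE here, which would trigger the error

# Rename the first column so the rest of the recipe can refer to it as Kingdom
colnames(tax)[1] <- "Kingdom"

# Same clean-up as the str_replace_all() step, written with base R gsub()
tax$Kingdom <- gsub("D_[0-9]+__", "", tax$Kingdom)
print(tax$Kingdom)
```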
Hi @nebfield
The output is:
Error in is.data.frame(x) : object 'tax' not found
Is it correct to use the following script?
taxonomy <- read_qza("taxonomy.qza")
tax_table<-do.call(rbind, strsplit(as.character(taxonomy$data$Taxon), "; "))
tax <- data.frame(phyloseq::tax_table(tax_table)[, 1]) %>%
  mutate(Kingdom = stringr::str_replace_all(Kingdom, "D_\\d__", ""))
Or have I done something wrong there?